Academic interests
AMBIENT project
Thesis (working) title: Machine Synchresis: Investigating Immersive Audio-Visual Rhythms and Environments with Multi-Modal Information Retrieval
Background
- M.Mus in Music Technology, Steinhardt School, New York University, New York, NY, USA
- B.Sc in Information Engineering, Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen, China
Tags: Music Information Retrieval, Machine Learning, Multimodal Learning, Neural Audio Synthesis
Publications
- Riaz, Maham; Guo, Jinyue; Erdem, Cagri & Jensenius, Alexander Refsum (2025). Where to Put That Microphone? A Study of Sound Localization in Ambisonics Recordings. In McArthur, Angela; Matthews, Emma-Kate & Holberton, Tom (Ed.), Proceedings of the 17th International Symposium on Computer Music Multidisciplinary Research, p. 455–466. The Laboratory PRISM “Perception, Representations, Image, Sound, Music”. ISBN 9791097498061. doi: 10.5281/zenodo.17497086.
Summary:
This paper examines the effects of microphone placement on sound localization in first-order Ambisonics recordings. Two microphone setups were used to capture a moving audio source in a lab environment. Array A, a tetrahedral microphone, was placed in the centre of the recording space. Array B consisted of four similar tetrahedral microphones charting a rectangular perimeter surrounding the space. Motion capture data of the moving sound source shows that anglegrams calculated from the Ambisonics recordings can be effectively used for sound localization. An additional perceptual listening study with binaural renders of the audio signals showed that the centrally-placed Array A provided superior localization. However, the corner-placed Array B performed better than expected.
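To give a concrete idea of the anglegram analysis, here is a minimal sketch (not the study's actual code) that estimates frame-wise direction of arrival from a first-order Ambisonics recording using the pseudo-intensity vector. It assumes a 4-channel WAV ordered W, X, Y, Z; real recordings may use ACN/SN3D ordering and require channel remapping and gain normalisation.

```python
# Minimal sketch: frame-wise direction-of-arrival estimates ("anglegram")
# from a first-order Ambisonics (B-format) recording.
# Assumption: 4-channel WAV ordered W, X, Y, Z.
import numpy as np
import soundfile as sf

def anglegram(path, frame=2048, hop=1024):
    audio, sr = sf.read(path)           # shape: (samples, 4)
    w, x, y, z = audio.T
    times, azimuths, elevations = [], [], []
    for start in range(0, len(w) - frame, hop):
        s = slice(start, start + frame)
        # Pseudo-intensity vector: pressure (W) times velocity (X, Y, Z),
        # averaged over the frame.
        ix = np.mean(w[s] * x[s])
        iy = np.mean(w[s] * y[s])
        iz = np.mean(w[s] * z[s])
        azimuths.append(np.degrees(np.arctan2(iy, ix)))
        elevations.append(np.degrees(np.arctan2(iz, np.hypot(ix, iy))))
        times.append(start / sr)
    return np.array(times), np.array(azimuths), np.array(elevations)
```

Plotting azimuth and elevation against time gives the anglegram used for localization.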
- Guo, Jinyue; Tørresen, Jim & Jensenius, Alexander Refsum (2025). Cross-modal Analysis of Spatial-Temporal Auditory Stimuli and Human Micromotion when Standing Still in Indoor Environments. In McArthur, Angela; Matthews, Emma-Kate & Holberton, Tom (Ed.), Proceedings of the 17th International Symposium on Computer Music Multidisciplinary Research, p. 871–882. The Laboratory PRISM “Perception, Representations, Image, Sound, Music”. ISBN 9791097498061. doi: 10.5281/zenodo.17502603.
Summary:
This paper examines how a soundscape influences human stillness. We are particularly interested in how spatial and temporal features of a soundscape influence human micromotion and swaying patterns. The analysis is based on 345 Ambisonics audio recordings of different indoor environments and corresponding accelerometer data captured at the chest of a person standing still for ten minutes. We calculated the temporal and spatial correlation between the person's quantity of motion and the sound energy of the Ambisonic recordings. While no clear temporal correlations were found, we discovered a correlation between the spatial directionality of the micromotion and the sound direction of arrival. The results suggest a potential entrainment between the directionality of environmental sounds and human swaying patterns, which have not been thoroughly studied previously compared to the temporal or spectral features of indoor soundscapes.
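A minimal sketch of the kind of cross-modal correlation described above, assuming the accelerometer samples and the omnidirectional (W) channel have already been loaded and time-aligned; the window length and function names are illustrative, not the study's pipeline.

```python
# Sketch: correlate quantity of motion (chest accelerometer) with the
# sound energy envelope of the recording's omnidirectional channel.
import numpy as np
from scipy.stats import pearsonr

def quantity_of_motion(acc, fs, win_s=1.0):
    """Mean magnitude of frame-to-frame acceleration change per window."""
    mag = np.linalg.norm(np.diff(acc, axis=0), axis=1)
    win = int(win_s * fs)
    n = len(mag) // win
    return mag[: n * win].reshape(n, win).mean(axis=1)

def sound_energy(w_channel, fs, win_s=1.0):
    """RMS energy of the W channel per window."""
    win = int(win_s * fs)
    n = len(w_channel) // win
    frames = w_channel[: n * win].reshape(n, win)
    return np.sqrt((frames ** 2).mean(axis=1))

# Usage (hypothetical, pre-aligned data):
# qom = quantity_of_motion(acc_data, acc_fs)
# energy = sound_energy(w, audio_fs)
# n = min(len(qom), len(energy))
# r, p = pearsonr(qom[:n], energy[:n])
```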
- Riaz, Maham; Guo, Jinyue; Göksülük, Bilge Serdar & Jensenius, Alexander Refsum (2025). Where is That Bird? The Impact of Artificial Birdsong in Public Indoor Environments. In Seiça, Mariana & Wirfs-Brock, Jordan (Ed.), AM '25: Proceedings of the 20th International Audio Mostly Conference. Association for Computing Machinery (ACM). ISBN 9798400708183.
Summary:
This paper explores the effects of nature sounds, specifically bird sounds, on human experience and behavior in indoor public environments. We report on an intervention study where we introduced an interactive sound device to alter the soundscape. Phenomenological observations and a survey showed that participants noticed and engaged with the bird sounds primarily through causal listening; that is, they attempted to identify the sound source. Participants generally responded positively to the bird sounds, appreciating the calmness and surprise it brought to the environment. The analyses revealed that relative loudness was a key factor influencing the experience. A too-high sound level may feel unpleasant, while a too-low sound level makes it unnoticeable due to background noise. These findings highlight the importance of automatic level adjustments and considering acoustic conditions in soundscape interventions. Our study contributes to a broader discourse on sound perception, human interaction with sonic spaces, and the potential of auditory design in public indoor environments.
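The automatic level adjustment suggested in the conclusion could look roughly like the following sketch, which keeps playback a fixed margin above the measured background level; the margin and bounds are illustrative values, not parameters from the study.

```python
# Sketch of adaptive playback level for a soundscape intervention:
# aim a fixed margin above the background level, within safe bounds.
import numpy as np

def rms_dbfs(buffer):
    """RMS level of an audio buffer in dBFS."""
    rms = np.sqrt(np.mean(np.square(buffer))) + 1e-12
    return 20 * np.log10(rms)

def playback_gain_db(background_buffer, margin_db=6.0,
                     min_db=-40.0, max_db=-15.0):
    """Target playback level: background level plus a margin, clamped."""
    target = rms_dbfs(background_buffer) + margin_db
    return float(np.clip(target, min_db, max_db))
```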
- Riaz, Maham; Guo, Jinyue & Jensenius, Alexander Refsum (2025). Comparing Spatial Audio Recordings from Commercially Available 360-degree Video Cameras. In Brooks, Anthony L.; Banakou, Domna & Ceperkovic, Slavica (Ed.), Proceedings of the 13th EAI International Conference on ArtsIT, Interactivity and Game Creation, ArtsIT 2024, p. 160–172. Springer. ISBN 9783031972546. doi: 10.1007/978-3-031-97254-6_12.
Summary:
This paper investigates the spatial audio recording capabilities of various commercially available 360-degree cameras (GoPro MAX, Insta360 X3, Garmin VIRB 360, and Ricoh Theta S). A dedicated ambisonics audio recorder (Zoom H3VR) was used for comparison. Six action sequences were performed around the recording setup, including impulsive and continuous vocal and non-vocal stimuli. The audio streams were extracted from the videos and compared using spectrograms and anglegrams. The anglegrams show adequate localization in ambisonic recordings from the GoPro MAX and Zoom H3VR. All cameras feature undocumented noise reduction and audio enhancement algorithms, use different types of audio compression, and have limited audio export options. This makes it challenging to use the spatial audio data reliably for research purposes.
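As a rough illustration of the extraction-and-comparison workflow (not the paper's tooling), the sketch below pulls the audio stream out of a camera file with FFmpeg and computes a spectrogram; per-camera channel layouts, codecs, and enhancement settings need extra handling in practice.

```python
# Sketch: extract uncompressed audio from a 360-degree video with FFmpeg,
# then compute a spectrogram for visual comparison between devices.
import subprocess
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

def extract_audio(video_path, wav_path):
    # -vn drops the video stream; pcm_s16le keeps uncompressed audio.
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-vn",
         "-acodec", "pcm_s16le", wav_path],
        check=True,
    )

def audio_spectrogram(wav_path):
    sr, samples = wavfile.read(wav_path)
    mono = samples.mean(axis=1) if samples.ndim > 1 else samples
    freqs, times, sxx = spectrogram(mono.astype(float), fs=sr, nperseg=2048)
    return freqs, times, 10 * np.log10(sxx + 1e-12)   # dB scale
```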
- Guo, Jinyue; Christodoulou, Anna-Maria; Laczko, Balint & Glette, Kyrre (2024). LVNS-RAVE: Diversified audio generation with RAVE and Latent Vector Novelty Search. In Li, Xiaodong & Handl, Julia (Ed.), GECCO '24 Companion: Proceedings of the Genetic and Evolutionary Computation Conference Companion, p. 667–670. Association for Computing Machinery (ACM). ISBN 9798400704956. doi: 10.1145/3638530.3654432.
Summary:
Evolutionary Algorithms and Generative Deep Learning have been two of the most powerful tools for sound generation tasks. However, they have limitations: Evolutionary Algorithms require complicated designs, posing challenges in control and achieving realistic sound generation. Generative Deep Learning models often copy from the dataset and lack creativity. In this paper, we propose LVNS-RAVE, a method to combine Evolutionary Algorithms and Generative Deep Learning to produce realistic and novel sounds. We use the RAVE model as the sound generator and the VGGish model as a novelty evaluator in the Latent Vector Novelty Search (LVNS) algorithm. The reported experiments show that the method can successfully generate diversified, novel audio samples under different mutation setups using different pre-trained RAVE models. The characteristics of the generation process can be easily controlled with the mutation parameters. The proposed algorithm can be a creative tool for sound artists and musicians.
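A minimal sketch of the latent-vector novelty search loop outlined above, with the RAVE decoder and VGGish embedder abstracted as plain callables so that no specific model API is assumed; population size, mutation strength, and archiving policy are illustrative, not the paper's settings.

```python
# Sketch of Latent Vector Novelty Search: mutate latent vectors, score each
# offspring by how far its audio embedding is from previously archived ones,
# and keep the most novel. decode() and embed() are placeholder callables
# (e.g. wrappers around a pre-trained RAVE decoder and a VGGish embedder).
import numpy as np

def novelty(embedding, archive, k=5):
    """Mean distance to the k nearest archived embeddings."""
    if not archive:
        return float("inf")
    dists = sorted(np.linalg.norm(embedding - a) for a in archive)
    return float(np.mean(dists[:k]))

def lvns(decode, embed, latent_dim, pop_size=16, sigma=0.1, generations=50):
    population = [np.random.randn(latent_dim) for _ in range(pop_size)]
    archive = []
    for _ in range(generations):
        scored = []
        for z in population:
            for _ in range(2):                         # two offspring per parent
                child = z + sigma * np.random.randn(latent_dim)  # Gaussian mutation
                audio = decode(child)                  # latent vector -> audio
                emb = embed(audio)                     # audio -> embedding
                scored.append((novelty(emb, archive), child, emb))
        scored.sort(key=lambda item: item[0], reverse=True)
        population = [child for _, child, _ in scored[:pop_size]]
        archive.extend(emb for _, _, emb in scored[:pop_size])
    return population, archive
```

The mutation strength sigma plays the role of the controllable mutation parameters mentioned in the abstract.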
- Guo, Jinyue; Riaz, Maham & Jensenius, Alexander Refsum (2024). Comparing Four 360-Degree Cameras for Spatial Video Recording and Analysis. Proceedings of the Sound and Music Computing Conference 2024. SMC Network. ISBN 9789893520758. Full text in Research Archive.
Summary:
This paper reports on a desktop investigation and a lab experiment comparing the video recording capabilities of four commercially available 360-degree cameras: GoPro MAX, Insta360 X3, Garmin VIRB 360, and Ricoh Theta S. The four cameras all use different recording formats and settings and have varying video quality and software support. This makes it difficult to conduct analyses and compare between devices. We have implemented new functions in the Musical Gestures Toolbox (MGT) for reading and merging files from the different platforms. Using the capabilities of FFmpeg, we have also made a new function for converting between different 360-degree video projections and formats. This allows (music) researchers to exploit 360-degree video recordings using regular video-based analysis pipelines.
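The projection conversion can be illustrated with FFmpeg's v360 filter; the sketch below is a simplified stand-in for the MGT function, and the projection codes and view angles are only an example.

```python
# Sketch: convert a 360-degree video between projections with FFmpeg's v360
# filter ("e" = equirectangular input, "flat" = rectilinear output view).
import subprocess

def reproject_360(in_path, out_path, in_proj="e", out_proj="flat",
                  yaw=0, pitch=0):
    vf = f"v360={in_proj}:{out_proj}:yaw={yaw}:pitch={pitch}"
    subprocess.run(["ffmpeg", "-y", "-i", in_path, "-vf", vf, out_path],
                   check=True)
```

Converting to a flat view in this way lets regular video-based analysis pipelines operate on 360-degree material.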
- Guo, Jinyue & McFee, Brian (2023). Automatic Recognition of Cascaded Guitar Effects. In Serafin, Stefania; Fontana, Federico & Willemsen, Silvin (Ed.), Proceedings of the 26th International Conference on Digital Audio Effects, p. 189–195. Aalborg University Copenhagen. doi: 10.5281/zenodo.7973536.
Summary:
This paper reports on a new multi-label classification task for guitar effect recognition that is closer to the actual use case of guitar effect pedals. To generate the dataset, we used multiple clean guitar audio datasets and applied various combinations of 13 commonly used guitar effects. We compared four neural network structures: a simple Multi-Layer Perceptron as a baseline, ResNet models, a CRNN model, and a sample-level CNN model. The ResNet models achieved the best performance in terms of accuracy and robustness under various setups (with or without clean audio, seen or unseen dataset), with a micro F1 of 0.876 and Macro F1 of 0.906 in the hardest setup. An ablation study on the ResNet models further indicates the necessary model complexity for the task.
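A minimal sketch of the multi-label setup (not the paper's ResNet/CRNN models): each of the 13 effects gets an independent sigmoid output trained with binary cross-entropy, and predictions are scored with micro and macro F1. The tiny network and input size below are placeholders.

```python
# Sketch: multi-label guitar effect recognition head and micro/macro F1.
import torch
import torch.nn as nn
from sklearn.metrics import f1_score

NUM_EFFECTS = 13  # number of effect classes in the multi-label task

# Placeholder network standing in for the ResNet/CRNN/sample-level CNN models.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(128 * 128, 256),    # assumes 128x128 spectrogram-like inputs
    nn.ReLU(),
    nn.Linear(256, NUM_EFFECTS),  # one independent logit per effect
)
criterion = nn.BCEWithLogitsLoss()  # multi-label: sigmoid + binary cross-entropy

def evaluate(logits, targets, threshold=0.5):
    """Micro and macro F1 over thresholded sigmoid outputs."""
    preds = (torch.sigmoid(logits.detach()) > threshold).int().numpy()
    micro = f1_score(targets, preds, average="micro", zero_division=0)
    macro = f1_score(targets, preds, average="macro", zero_division=0)
    return micro, macro
```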