Few-Shot Bioacoustic Event Detection with Frame-Level Embedding Learning System
This technical report presents our frame-level embedding learning system for the DCASE2024 challenge for few-shot bioacoustic event detection (Task 5). In this work, we used log-mel and PCEN for feature extraction of the input audio, Netmamba Encoder as ...
Lu, ChengWei, Zhao, PengYuan, Zou, Liang
core
The development of acoustic simulation workflows in the time-domain description is essential for predicting the sound of aeroacoustic or other transient acoustic effects. A common practice for noise mitigation is using absorbers.
Maurerlehner, Paul, Schoder, Stefan
core
Explainability Paths for Sustained Artistic Practice with AI
The development of AI-driven generative audio mirrors broader AI trends, often prioritizing immediate accessibility at the expense of explainability. Consequently, integrating such tools into sustained artistic practice remains a significant challenge ...
Peschlow, Thomas +2 more
core
The Solution for Temporal Sound Localisation Task of ICCV 1st Perception Test Challenge 2023
In this paper, we propose a solution for improving the quality of temporal sound localization. We employ a multimodal fusion approach to combine visual and audio features.
Chen, Qingguo +5 more
core
A Detailed Audio-Text Data Simulation Pipeline using Single-Event Sounds
Recently, there has been an increasing focus on audio-text cross-modal learning. However, most of the existing audio-text datasets contain only simple descriptions of sound events.
Wu, Mengyue +5 more
core
Microphone Conversion: Mitigating Device Variability in Sound Event Classification
In this study, we introduce a new augmentation technique to enhance the resilience of sound event classification (SEC) systems against device variability through the use of CycleGAN. We also present a unique dataset to evaluate this method.
Lee, Suji +3 more
core
GTR-Voice: Articulatory Phonetics Informed Controllable Expressive Speech Synthesis
Expressive speech synthesis aims to generate speech that captures a wide range of para-linguistic features, including emotion and articulation, though current research primarily emphasizes emotional aspects over the nuanced articulatory features mastered
Chen, Meiying Melissa +4 more
core
Contrastive Loss Based Frame-wise Feature disentanglement for Polyphonic Sound Event Detection
Overlapping sound events are ubiquitous in real-world environments, but existing end-to-end sound event detection (SED) methods still struggle to detect them effectively.
Guan, Yadong +6 more
core
We present SoundLoCD, a novel text-to-sound generation framework, which incorporates a LoRA-based conditional discrete contrastive latent diffusion model.
Martin, Charles Patrick +3 more
core
Towards Privacy-Preserving Audio Classification Systems
Audio signals can reveal intimate details about a person's life, including their conversations, health status, emotions, location, and personal preferences.
Chhaglani, Bhawana +2 more
core

