Few-Shot Bioacoustic Event Detection with Frame-Level Embedding Learning System

open access: yes
This technical report presents our frame-level embedding learning system for the DCASE2024 challenge for few-shot bioacoustic event detection (Task 5). In this work, we used log-mel and PCEN for feature extraction of the input audio, Netmamba Encoder as ...
Lu, ChengWei, Zhao, PengYuan, Zou, Liang
core  

METAMAT 01: A semi-analytic Solution for Benchmarking Wave Propagation Simulations of homogeneous Absorbers in 1D/3D and 2D

open access: yes
The development of acoustic simulation workflows in the time-domain description is essential for predicting the sound of aeroacoustic or other transient acoustic effects. A common practice for noise mitigation is using absorbers.
Maurerlehner, Paul, Schoder, Stefan
core  

Explainability Paths for Sustained Artistic Practice with AI

open access: yes
The development of AI-driven generative audio mirrors broader AI trends, often prioritizing immediate accessibility at the expense of explainability. Consequently, integrating such tools into sustained artistic practice remains a significant challenge ...
Peschlow, Thomas   +2 more
core  

The Solution for Temporal Sound Localisation Task of ICCV 1st Perception Test Challenge 2023

open access: yes
In this paper, we propose a solution for improving the quality of temporal sound localization. We employ a multimodal fusion approach to combine visual and audio features.
Chen, Qingguo   +5 more
core  

A Detailed Audio-Text Data Simulation Pipeline using Single-Event Sounds

open access: yes
Recently, there has been an increasing focus on audio-text cross-modal learning. However, most of the existing audio-text datasets contain only simple descriptions of sound events.
Wu, Mengyue   +5 more
core  

Microphone Conversion: Mitigating Device Variability in Sound Event Classification

open access: yes
In this study, we introduce a new augmentation technique to enhance the resilience of sound event classification (SEC) systems against device variability through the use of CycleGAN. We also present a unique dataset to evaluate this method.
Lee, Suji   +3 more
core  

GTR-Voice: Articulatory Phonetics Informed Controllable Expressive Speech Synthesis

open access: yes
Expressive speech synthesis aims to generate speech that captures a wide range of para-linguistic features, including emotion and articulation, though current research primarily emphasizes emotional aspects over the nuanced articulatory features mastered ...
Chen, Meiying Melissa   +4 more
core  

Contrastive Loss Based Frame-wise Feature disentanglement for Polyphonic Sound Event Detection

open access: yes
Overlapping sound events are ubiquitous in real-world environments, but existing end-to-end sound event detection (SED) methods still struggle to detect them effectively.
Guan, Yadong   +6 more
core  

SoundLoCD: An Efficient Conditional Discrete Contrastive Latent Diffusion Model for Text-to-Sound Generation

open access: yes
We present SoundLoCD, a novel text-to-sound generation framework, which incorporates a LoRA-based conditional discrete contrastive latent diffusion model.
Martin, Charles Patrick   +3 more
core  

Towards Privacy-Preserving Audio Classification Systems

open access: yes
Audio signals can reveal intimate details about a person's life, including their conversations, health status, emotions, location, and personal preferences.
Chhaglani, Bhawana   +2 more
core  