Sound cs.sd - Open Access .click


Few-Shot Bioacoustic Event Detection with Frame-Level Embedding Learning System

This technical report presents our frame-level embedding learning system for the DCASE2024 challenge for few-shot bioacoustic event detection (Task 5). In this work, we used log-mel and PCEN for feature extraction of the input audio, Netmamba Encoder as ...
Lu, ChengWei, Zhao, PengYuan, Zou, Liang
core

METAMAT 01: A semi-analytic Solution for Benchmarking Wave Propagation Simulations of homogeneous Absorbers in 1D/3D and 2D

The development of acoustic simulation workflows in the time-domain description is essential for predicting the sound of aeroacoustic or other transient acoustic effects. A common practice for noise mitigation is using absorbers.
Maurerlehner, Paul, Schoder, Stefan
core

Explainability Paths for Sustained Artistic Practice with AI

The development of AI-driven generative audio mirrors broader AI trends, often prioritizing immediate accessibility at the expense of explainability. Consequently, integrating such tools into sustained artistic practice remains a significant challenge ...
Peschlow, Thomas, Tecks, Austin, Vigliensoni, Gabriel +2 more
core

The Solution for Temporal Sound Localisation Task of ICCV 1st Perception Test Challenge 2023

In this paper, we propose a solution for improving the quality of temporal sound localization. We employ a multimodal fusion approach to combine visual and audio features.
Chen, Qingguo +5 more
core

A Detailed Audio-Text Data Simulation Pipeline using Single-Event Sounds

Recently, there has been an increasing focus on audio-text cross-modal learning. However, most of the existing audio-text datasets contain only simple descriptions of sound events.
Wu, Mengyue +5 more
core

Microphone Conversion: Mitigating Device Variability in Sound Event Classification

In this study, we introduce a new augmentation technique to enhance the resilience of sound event classification (SEC) systems against device variability through the use of CycleGAN. We also present a unique dataset to evaluate this method.
Lee, Suji +3 more
core

GTR-Voice: Articulatory Phonetics Informed Controllable Expressive Speech Synthesis

Expressive speech synthesis aims to generate speech that captures a wide range of para-linguistic features, including emotion and articulation, though current research primarily emphasizes emotional aspects over the nuanced articulatory features mastered ...
Chen, Meiying Melissa +4 more
core

Contrastive Loss Based Frame-wise Feature disentanglement for Polyphonic Sound Event Detection

Overlapping sound events are ubiquitous in real-world environments, but existing end-to-end sound event detection (SED) methods still struggle to detect them effectively.
Guan, Yadong +6 more
core

SoundLoCD: An Efficient Conditional Discrete Contrastive Latent Diffusion Model for Text-to-Sound Generation

We present SoundLoCD, a novel text-to-sound generation framework, which incorporates a LoRA-based conditional discrete contrastive latent diffusion model.
Martin, Charles Patrick +3 more
core

Towards Privacy-Preserving Audio Classification Systems

Audio signals can reveal intimate details about a person's life, including their conversations, health status, emotions, location, and personal preferences.
Chhaglani, Bhawana, Gummeson, Jeremy, Shenoy, Prashant +2 more
core
