Beyond Equal-Length Snippets: How Long is Sufficient to Recognize an Audio Scene? [PDF]
Because audio scenes differ in their characteristics, some scenes can naturally be recognized earlier than others. In this work, rather than using equal-length snippets for all scene categories, as is common in the literature, we study the temporal extent to which an audio scene can be reliably recognized by state-of-the-art models.
arxiv
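A minimal sketch of the kind of evaluation the abstract describes: score a classifier on progressively shorter snippet prefixes, per class, to see how early each scene becomes recognizable. The classifier, the class names, and the synthetic clips below are placeholders, not the paper's models or data.

```python
# Minimal sketch: measure how recognition accuracy varies with snippet length.
# `toy_classifier` is a placeholder standing in for a trained scene classifier;
# the data here is synthetic and only illustrates the evaluation loop.
import numpy as np

SR = 16_000                       # sample rate (Hz)
LENGTHS_S = [0.25, 0.5, 1, 2, 5]  # candidate snippet lengths in seconds
CLASSES = ["park", "metro", "street"]

def toy_classifier(snippet: np.ndarray) -> str:
    """Placeholder: predicts a class from crude band energies."""
    spectrum = np.abs(np.fft.rfft(snippet))
    bands = np.array_split(spectrum, len(CLASSES))
    return CLASSES[int(np.argmax([b.mean() for b in bands]))]

def accuracy_vs_length(clips, labels):
    """For each class, report accuracy when only the first L seconds are used."""
    results = {}
    for length in LENGTHS_S:
        n = int(length * SR)
        for clip, label in zip(clips, labels):
            pred = toy_classifier(clip[:n])
            hits, total = results.get((label, length), (0, 0))
            results[(label, length)] = (hits + (pred == label), total + 1)
    return {k: hits / total for k, (hits, total) in results.items()}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clips = [rng.standard_normal(10 * SR) for _ in range(30)]
    labels = [rng.choice(CLASSES) for _ in clips]
    for (label, length), acc in sorted(accuracy_vs_length(clips, labels).items()):
        print(f"{label:>8s} @ {length:4.2f}s : {acc:.2f}")
```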
Auto-adaptive Resonance Equalization using Dilated Residual Networks [PDF]
In music and audio production, attenuation of spectral resonances is an important step towards a technically correct result. In this paper we present a two-component system to automate the task of resonance equalization. The first component is a dynamic equalizer that automatically detects resonances and offers to attenuate them by a user-specified ...
arxiv
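To make the two-stage idea concrete, here is a rough sketch: detect narrow peaks in the long-term spectrum and cut them with peaking-EQ biquads at a user-specified gain. The peak-picking thresholds, Q, and filter choices are illustrative assumptions, not the paper's dilated-residual-network components.

```python
# Sketch: detect spectral resonances, then attenuate them with peaking-EQ biquads
# (RBJ Audio EQ Cookbook). Thresholds and Q are illustrative, not the paper's values.
import numpy as np
from scipy.signal import welch, find_peaks, lfilter

def peaking_cut(f0, gain_db, q, fs):
    """Biquad coefficients for a peaking filter (negative gain_db attenuates)."""
    a_gain = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    b = np.array([1 + alpha * a_gain, -2 * np.cos(w0), 1 - alpha * a_gain])
    a = np.array([1 + alpha / a_gain, -2 * np.cos(w0), 1 - alpha / a_gain])
    return b / a[0], a / a[0]

def detect_resonances(x, fs, prominence_db=6.0, max_peaks=3):
    """Find narrow peaks that stand out from the long-term average spectrum."""
    freqs, psd = welch(x, fs=fs, nperseg=4096)
    mag_db = 10 * np.log10(psd + 1e-12)
    peaks, props = find_peaks(mag_db, prominence=prominence_db)
    order = np.argsort(props["prominences"])[::-1][:max_peaks]
    return freqs[peaks[order]]

def attenuate_resonances(x, fs, gain_db=-6.0, q=8.0):
    """Apply a user-specified cut at each detected resonance."""
    y = x.copy()
    for f0 in detect_resonances(x, fs):
        if 20 < f0 < fs / 2 - 20:             # stay inside the usable band
            b, a = peaking_cut(f0, gain_db, q, fs)
            y = lfilter(b, a, y)
    return y

if __name__ == "__main__":
    fs = 44_100
    t = np.arange(fs * 2) / fs
    x = 0.5 * np.sin(2 * np.pi * 900 * t) + 0.05 * np.random.default_rng(0).standard_normal(t.size)
    print("detected resonances (Hz):", detect_resonances(x, fs))
    _ = attenuate_resonances(x, fs, gain_db=-9.0)
```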
Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio? [PDF]
Currently, Audio Language Models (ALMs) are rapidly advancing due to the developments in large language models and audio neural codecs. These ALMs have significantly lowered the barrier to creating deepfake audio, generating highly realistic and diverse types of deepfake audio, which pose severe threats to society.
arxiv
Text2FX: Harnessing CLAP Embeddings for Text-Guided Audio Effects [PDF]
This work introduces Text2FX, a method that leverages CLAP embeddings and differentiable digital signal processing to control audio effects, such as equalization and reverberation, using open-vocabulary natural language prompts (e.g., "make this sound in-your-face and bold").
arxiv
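The core loop described in the abstract can be sketched as gradient-based optimization of a differentiable effect toward a text prompt in a shared embedding space. In the sketch below, the "equalizer" is a learnable per-band STFT gain and `embed_audio` / `embed_text` are random-projection placeholders standing in for CLAP; swapping in a real CLAP model would be needed to approximate the actual method.

```python
# Sketch of text-guided effect control: optimize differentiable EQ gains so the
# processed audio's embedding moves toward the text prompt's embedding.
# embed_audio / embed_text are placeholders for CLAP (assumption, not the paper's code).
import torch

N_FFT, HOP, SR = 1024, 256, 48_000
EMB_DIM = 512
torch.manual_seed(0)
WINDOW = torch.hann_window(N_FFT)
_AUDIO_PROJ = torch.randn(N_FFT // 2 + 1, EMB_DIM)   # placeholder "audio encoder"

def embed_audio(wave: torch.Tensor) -> torch.Tensor:
    spec = torch.stft(wave, N_FFT, hop_length=HOP, window=WINDOW, return_complex=True).abs()
    return torch.nn.functional.normalize(spec.mean(dim=-1) @ _AUDIO_PROJ, dim=-1)

def embed_text(prompt: str) -> torch.Tensor:
    g = torch.Generator().manual_seed(hash(prompt) % (2 ** 31))  # placeholder "text encoder"
    return torch.nn.functional.normalize(torch.randn(EMB_DIM, generator=g), dim=-1)

def apply_eq(wave: torch.Tensor, band_gains_db: torch.Tensor) -> torch.Tensor:
    spec = torch.stft(wave, N_FFT, hop_length=HOP, window=WINDOW, return_complex=True)
    gains = (10 ** (band_gains_db / 20)).unsqueeze(-1)           # (freq_bins, 1)
    return torch.istft(spec * gains, N_FFT, hop_length=HOP, window=WINDOW, length=wave.shape[-1])

def text2fx_sketch(wave: torch.Tensor, prompt: str, steps: int = 200, lr: float = 0.05):
    gains_db = torch.zeros(N_FFT // 2 + 1, requires_grad=True)
    target = embed_text(prompt)
    opt = torch.optim.Adam([gains_db], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = 1 - torch.cosine_similarity(embed_audio(apply_eq(wave, gains_db)), target, dim=-1)
        loss.backward()
        opt.step()
    return apply_eq(wave, gains_db).detach(), gains_db.detach()

if __name__ == "__main__":
    audio = torch.randn(SR)   # 1 s of noise as a stand-in for real audio
    processed, learned_gains = text2fx_sketch(audio, "make this sound in-your-face and bold")
    print(processed.shape, learned_gains.abs().max().item())
```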
An RFP dataset for Real, Fake, and Partially fake audio detection [PDF]
Recent advances in deep learning have enabled the creation of natural-sounding synthesised speech. However, attackers have also utilised these technologies to conduct attacks such as phishing. Numerous public datasets have been created to facilitate the development of effective detection models.
arxiv
A Joint Detection-Classification Model for Audio Tagging of Weakly Labelled Data [PDF]
Audio tagging aims to assign one or several tags to an audio clip. Most datasets are weakly labelled, meaning only the clip-level tags are known, not the times at which they occur. The labeling of an audio clip is often based on the audio events in the clip, and no event-level label is provided to the user.
arxiv
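A minimal sketch of the joint detection-classification idea for weakly labelled data: a classifier head scores each frame per tag, a detector head weights the frames, and the weighted average yields a clip-level probability trained against clip-level labels only. Layer sizes and the feature front-end are placeholders, not the paper's architecture.

```python
# Sketch of a joint detection-classification (JDC) model for weakly labelled tagging.
import torch
import torch.nn as nn

class JointDetectionClassification(nn.Module):
    def __init__(self, n_features: int = 64, n_tags: int = 10, hidden: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        self.classifier = nn.Linear(hidden, n_tags)   # per-frame tag scores
        self.detector = nn.Linear(hidden, n_tags)     # per-frame relevance weights

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, frames, n_features), e.g. log-mel frames of a clip
        h = self.encoder(feats)
        cls = torch.sigmoid(self.classifier(h))         # (B, T, n_tags)
        det = torch.softmax(self.detector(h), dim=1)     # weights normalized over frames
        return (cls * det).sum(dim=1)                    # clip-level tag probabilities

if __name__ == "__main__":
    model = JointDetectionClassification()
    feats = torch.randn(8, 200, 64)                      # 8 clips x 200 frames x 64 features
    clip_labels = torch.randint(0, 2, (8, 10)).float()   # weak (clip-level) labels only
    probs = model(feats)
    loss = nn.functional.binary_cross_entropy(probs, clip_labels)
    loss.backward()
    print(probs.shape, float(loss))
```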
AudioEditor: A Training-Free Diffusion-Based Audio Editing Framework [PDF]
Diffusion-based text-to-audio (TTA) generation has made substantial progress, leveraging latent diffusion models (LDMs) to produce high-quality, diverse, and instruction-relevant audio. However, beyond generation, the task of audio editing remains equally important but has received comparatively little attention.
arxiv
Synthetic Audio Helps for Cognitive State Tasks [PDF]
The NLP community has broadly focused on text-only approaches of cognitive state tasks, but audio can provide vital missing cues through prosody. We posit that text-to-speech models learn to track aspects of cognitive state in order to produce naturalistic audio, and that the signal audio models implicitly identify is orthogonal to the information that ...
arxiv
Design And Implementation Of A Linear-phase Equalizer In Digital Audio Signal Processing [PDF]
Cornelis H. Slump +8 more
openalex +1 more source
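No abstract is shown for this entry, but the design problem it names is standard: a linear-phase equalizer can be realized as a symmetric FIR filter built from a target magnitude response, so every frequency is delayed by the same (numtaps - 1) / 2 samples. The sketch below uses frequency sampling via `scipy.signal.firwin2` with illustrative band edges and gains; it is a generic textbook construction, not the paper's implementation.

```python
# Sketch of a linear-phase FIR equalizer from a target magnitude response.
import numpy as np
from scipy.signal import firwin2, lfilter

def design_linear_phase_eq(fs: float, numtaps: int = 511):
    """FIR taps approximating a gentle low-shelf cut and a presence boost (illustrative)."""
    freq = [0.0, 100.0, 300.0, 3000.0, 6000.0, fs / 2]   # Hz, must span 0..fs/2
    gain_db = [-6.0, -6.0, 0.0, 3.0, 0.0, 0.0]
    gain = [10 ** (g / 20) for g in gain_db]
    return firwin2(numtaps, freq, gain, fs=fs)           # symmetric taps -> linear phase

if __name__ == "__main__":
    fs = 48_000
    taps = design_linear_phase_eq(fs)
    x = np.random.default_rng(0).standard_normal(fs)     # 1 s of noise
    y = lfilter(taps, [1.0], x)
    group_delay_samples = (len(taps) - 1) / 2             # constant across all frequencies
    print(len(taps), group_delay_samples, y.shape)
```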
The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio [PDF]
With the proliferation of Audio Language Model (ALM) based deepfake audio, there is an urgent need for generalized detection methods. ALM-based deepfake audio is currently widespread, highly deceptive, and versatile in type, posing a significant challenge to current audio deepfake detection (ADD) models trained solely on vocoded data. To effectively ...
arxiv