A joint separation-classification model for sound event detection of weakly labelled data [PDF]
Source separation (SS) aims to separate individual sources from an audio recording. Sound event detection (SED) aims to detect sound events from an audio recording.
Kong, Qiuqiang +3 more
core +2 more sources
Pengi: An Audio Language Model for Audio Tasks [PDF]
In the domain of audio processing, Transfer Learning has facilitated the rise of Self-Supervised Learning and Zero-Shot Learning techniques. These approaches have led to the development of versatile models capable of tackling a wide array of tasks, while
Soham Deshmukh +3 more
semanticscholar +1 more source
SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation [PDF]
Generating talking head videos from a face image and a piece of speech audio still presents many challenges, i.e., unnatural head movement, distorted expression, and identity modification.
Wenxuan Zhang +7 more
semanticscholar +1 more source
Crowdsourcing Ecologically-Valid Dialogue Data for German
Despite their increasing success, user interactions with smart speech assistants (SAs) are still very limited compared to human-human dialogue. One way to make SA interactions more natural is to train the underlying natural language processing modules on
Yannick Frommherz, Alessandra Zarcone
doaj +1 more source
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models [PDF]
Large-scale multimodal generative modeling has created milestones in text-to-image and text-to-video generation. Its application to audio still lags behind for two main reasons: the lack of large-scale datasets with high-quality text-audio pairs, and the
Rongjie Huang +9 more
semanticscholar +1 more source
Pilot study on the influence of spatial resolution of human voice directivity on speech perception
A perceptual threshold related to spatial resolution of the human voice directivity was determined through a listening test of similarity (MUSHRA). Directivity data of an artificial talking head measured at high spatial resolution (spherical harmonics ...
Quélennec Aurian, Luizard Paul
doaj +1 more source
AudioGen: Textually Guided Audio Generation [PDF]
We tackle the problem of generating audio samples conditioned on descriptive text captions. In this work, we propose AudioGen, an auto-regressive generative model that generates audio samples conditioned on text inputs.
F. Kreuk +8 more
semanticscholar +1 more source
Does Audio Deepfake Detection Generalize? [PDF]
Current text-to-speech algorithms produce realistic fakes of human voices, making deepfake detection a much-needed area of research. While researchers have presented various techniques for detecting audio spoofs, it is often unclear exactly why these ...
N. Müller +4 more
semanticscholar +1 more source
19th International Technical Meeting of the Satellite Division of the Institute of Navigation; Fort Worth ...
Woo, DT +4 more
openaire +3 more sources
Time-frequency diffraction acoustic modeling of the Epidaurus ancient theatre
This work provides an in-depth investigation of the effect of sound diffraction on the acoustics of ancient theatres, with reference to the theatre of Epidaurus.
Kaleris Konstantinos +3 more
doaj +1 more source

