
A joint separation-classification model for sound event detection of weakly labelled data [PDF]

open access: yes, 2018
Source separation (SS) aims to separate individual sources from an audio recording. Sound event detection (SED) aims to detect sound events from an audio recording.
Kong, Qiuqiang   +3 more
core   +2 more sources

Pengi: An Audio Language Model for Audio Tasks [PDF]

open access: yes, Neural Information Processing Systems, 2023
In the domain of audio processing, Transfer Learning has facilitated the rise of Self-Supervised Learning and Zero-Shot Learning techniques. These approaches have led to the development of versatile models capable of tackling a wide array of tasks, while ...
Soham Deshmukh   +3 more
semanticscholar   +1 more source

SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation [PDF]

open access: yes, Computer Vision and Pattern Recognition, 2022
Generating talking head videos from a face image and a piece of speech audio still poses many challenges, i.e., unnatural head movement, distorted expression, and identity modification.
Wenxuan Zhang   +7 more
semanticscholar   +1 more source

Crowdsourcing Ecologically-Valid Dialogue Data for German

open access: yes, Frontiers in Computer Science, 2021
Despite their increasing success, user interactions with smart speech assistants (SAs) are still very limited compared to human-human dialogue. One way to make SA interactions more natural is to train the underlying natural language processing modules on ...
Yannick Frommherz, Alessandra Zarcone
doaj   +1 more source

Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models [PDF]

open access: yes, International Conference on Machine Learning, 2023
Large-scale multimodal generative modeling has created milestones in text-to-image and text-to-video generation. Its application to audio still lags behind for two main reasons: the lack of large-scale datasets with high-quality text-audio pairs, and the ...
Rongjie Huang   +9 more
semanticscholar   +1 more source

Pilot study on the influence of spatial resolution of human voice directivity on speech perception

open access: yes, Acta Acustica, 2022
A perceptual threshold related to spatial resolution of the human voice directivity was determined through a listening test of similarity (MUSHRA). Directivity data of an artificial talking head measured at high spatial resolution (spherical harmonics ...
Quélennec Aurian, Luizard Paul
doaj   +1 more source

AudioGen: Textually Guided Audio Generation [PDF]

open access: yes, International Conference on Learning Representations, 2022
We tackle the problem of generating audio samples conditioned on descriptive text captions. In this work, we propose AudioGen, an auto-regressive generative model that generates audio samples conditioned on text inputs.
F. Kreuk   +8 more
semanticscholar   +1 more source

Does Audio Deepfake Detection Generalize? [PDF]

open access: yes, Interspeech, 2022
Current text-to-speech algorithms produce realistic fakes of human voices, making deepfake detection a much-needed area of research. While researchers have presented various techniques for detecting audio spoofs, it is often unclear exactly why these ...
N. Müller   +4 more
semanticscholar   +1 more source

Audio Nomad

open access: yes, 2006
19th International Technical Meeting of the Satellite Division of the Institute of Navigation; Fort Worth ...
Woo, DT   +4 more
openaire   +3 more sources

Time-frequency diffraction acoustic modeling of the Epidaurus ancient theatre

open access: yes, Acta Acustica, 2023
This work provides an in-depth investigation of the effect of sound diffraction on the acoustics of ancient theatres, with reference to the theatre of Epidaurus.
Kaleris Konstantinos   +3 more
doaj   +1 more source
