Some of the following articles may not be open access.
Fast Timing-Conditioned Latent Audio Diffusion
International Conference on Machine Learning
Generating long-form 44.1kHz stereo audio from text prompts can be computationally demanding. Further, most previous works do not tackle that music and sound effects naturally vary in their duration.
Zach Evans +4 more
semanticscholar +1 more source
IEEE Transactions on Audio, Speech, and Language Processing
We introduce DeSTA2.5-Audio, a general-purpose Large Audio Language Model (LALM) designed for robust auditory perception and instruction-following. Recent LALMs augment Large Language Models (LLMs) with auditory capabilities by training on large-scale ...
Ke-Han Lu +27 more
semanticscholar +1 more source
2017
Review of: Lumina audio. By Ursula Blank-Sangmeister, Hubert Müller, Helmut Schlüter, and Kurt Steinicke (ISBN 3-525-71045-3). Litora audio. By Ursula Blank-Sangmeister and Hubert Müller (ISBN 3-525- 7175-5). Audio CDs of textbook texts read in the restored pronunciation (pronuntiatus restitutus). Speakers: Julia Hansen and Michael Jackenkroll.
openaire +1 more source
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
International Conference on Machine Learning
Augmenting large language models (LLMs) to understand audio -- including non-speech sounds and non-verbal speech -- is critically important for diverse real-world applications of LLMs.
Zhifeng Kong +5 more
semanticscholar +1 more source
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
Conference on Empirical Methods in Natural Language Processing
Perceiving and understanding non-speech sounds and non-verbal speech is essential to making decisions that help us interact with our surroundings. In this paper, we propose GAMA, a novel General-purpose Large Audio-Language Model (LALM) with Advanced ...
Sreyan Ghosh +8 more
semanticscholar +1 more source
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
Computer Vision and Pattern Recognition
We propose to synthesize high-quality and synchronized audio, given video and optional text conditions, using a novel multimodal joint training framework (MMAudio). In contrast to single-modality training conditioned on (limited) video data only, MMAudio
H. Cheng +5 more
semanticscholar +1 more source
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency
International Conference on Learning Representations
With the introduction of diffusion-based video generation techniques, audio-conditioned human video generation has recently achieved significant breakthroughs in both the naturalness of motion and the synthesis of portrait details.
Jianwen Jiang +5 more
semanticscholar +1 more source
Audio Set: An ontology and human-labeled dataset for audio events
IEEE International Conference on Acoustics, Speech, and Signal Processing, 2017
J. Gemmeke +7 more
semanticscholar +1 more source

