Some of the following articles may not be open access.

Fast Timing-Conditioned Latent Audio Diffusion

International Conference on Machine Learning
Generating long-form 44.1kHz stereo audio from text prompts can be computationally demanding. Further, most previous works do not tackle that music and sound effects naturally vary in their duration.
Zach Evans   +4 more
semanticscholar   +1 more source

DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model With Self-Generated Cross-Modal Alignment

IEEE Transactions on Audio, Speech, and Language Processing
We introduce DeSTA2.5-Audio, a general-purpose Large Audio Language Model (LALM) designed for robust auditory perception and instruction-following. Recent LALMs augment Large Language Models (LLMs) with auditory capabilities by training on large-scale ...
Ke-Han Lu   +27 more
semanticscholar   +1 more source

Lumina audio / Litora audio

2017
Review of: Lumina audio. By Ursula Blank-Sangmeister, Hubert Müller, Helmut Schlüter, and Kurt Steinicke (ISBN 3-525-71045-3). Litora audio. By Ursula Blank-Sangmeister and Hubert Müller (ISBN3-525- 7175-5). Audio CDs of textbook texts read in restored pronunciation (pronuntiatus restitutus). Speakers: Julia Hansen and Michael Jackenkroll.
openaire   +1 more source

Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities

International Conference on Machine Learning
Augmenting large language models (LLMs) to understand audio -- including non-speech sounds and non-verbal speech -- is critically important for diverse real-world applications of LLMs.
Zhifeng Kong   +5 more
semanticscholar   +1 more source

GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities

Conference on Empirical Methods in Natural Language Processing
Perceiving and understanding non-speech sounds and non-verbal speech is essential to making decisions that help us interact with our surroundings. In this paper, we propose GAMA, a novel General-purpose Large Audio-Language Model (LALM) with Advanced ...
Sreyan Ghosh   +8 more
semanticscholar   +1 more source

MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Computer Vision and Pattern Recognition
We propose to synthesize high-quality and synchronized audio, given video and optional text conditions, using a novel multimodal joint training framework (MMAudio). In contrast to single-modality training conditioned on (limited) video data only, MMAudio ...
H. Cheng   +5 more
semanticscholar   +1 more source

Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency

International Conference on Learning Representations
With the introduction of diffusion-based video generation techniques, audio-conditioned human video generation has recently achieved significant breakthroughs in both the naturalness of motion and the synthesis of portrait details.
Jianwen Jiang   +5 more
semanticscholar   +1 more source

Audio Set: An ontology and human-labeled dataset for audio events

IEEE International Conference on Acoustics, Speech, and Signal Processing, 2017
J. Gemmeke   +7 more
semanticscholar   +1 more source
