Some of the following articles may not be open access.

Qwen2-Audio Technical Report

arXiv.org
We introduce the latest progress of Qwen-Audio, a large-scale audio-language model called Qwen2-Audio, which is capable of accepting various audio signal inputs and performing audio analysis or direct textual responses with regard to speech instructions.
Yunfei Chu   +11 more
semanticscholar   +1 more source

Audio-Reasoner: Improving Reasoning Capability in Large Audio Language Models

Conference on Empirical Methods in Natural Language Processing
Recent advancements in multimodal reasoning have largely overlooked the audio modality. We introduce Audio-Reasoner, a large-scale audio language model for deep reasoning in audio tasks.
Zhifei Xie   +5 more
semanticscholar   +1 more source

Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation

arXiv.org
The field of portrait image animation, driven by speech audio input, has experienced significant advancements in the generation of realistic and dynamic portraits.
Mingwang Xu   +8 more
semanticscholar   +1 more source

Stable Audio Open

IEEE International Conference on Acoustics, Speech, and Signal Processing
Open generative models are vitally important for the community, allowing for fine-tunes and serving as baselines when presenting new models. However, most current text-to-audio models are private and not accessible for artists and researchers to build ...
Zach Evans   +5 more
semanticscholar   +1 more source

MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark

International Conference on Learning Representations
The ability to comprehend audio--which includes speech, non-speech sounds, and music--is crucial for AI agents to interact effectively with the world. We present MMAU, a novel benchmark designed to evaluate multimodal audio understanding models on tasks ...
S. Sakshi   +8 more
semanticscholar   +1 more source

EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions

AAAI Conference on Artificial Intelligence
The area of portrait image animation, propelled by audio input, has witnessed notable progress in the generation of lifelike and dynamic portraits. Conventional methods are limited to utilizing either audios or facial key points to drive images into ...
Zhiyuan Chen   +4 more
semanticscholar   +1 more source

MiMo-Audio: Audio Language Models are Few-Shot Learners

arXiv.org
Existing audio language models typically rely on task-specific fine-tuning to accomplish particular audio tasks. In contrast, humans are able to generalize to new audio tasks with only a few examples or simple instructions.
X. Zhang   +98 more
semanticscholar   +1 more source

Audio/visual reviews

Journal of Sex & Marital Therapy, 1979
With this issue, we introduce a slightly different format for reviews of some commercial films. Mommie Dearest is reviewed by Drs. Stevan Cressitt, a practicing psychiatrist in Lynn, Mass.; John Garrison, a psychologist at Harvard Medical School; and Jenny Phillips, a practicing nurse practitioner in Salem, Mass.
openaire   +1 more source

Audio encoder and audio decoder

The Journal of the Acoustical Society of America, 2010
An audio encoder for producing stereo signals based on multi-channel signals, wherein a downmix part (100) downmixes multi-channel signals of more than two channels to two-channel stereo signals. A first encoding part (101) encodes a downmixed stereo signal to produce a first encoded signal.
openaire   +1 more source

AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension

Annual Meeting of the Association for Computational Linguistics
Recently, instruction-following audio-language models have received broad attention for human-audio interaction. However, the absence of benchmarks capable of evaluating audio-centric interaction capabilities has impeded advancements in this field ...
Qian Yang   +10 more
semanticscholar   +1 more source
