Some of the following articles may not be open access.
arXiv.org
We introduce the latest progress of Qwen-Audio, a large-scale audio-language model called Qwen2-Audio, which is capable of accepting various audio signal inputs and performing audio analysis or direct textual responses with regard to speech instructions.
Yunfei Chu +11 more
semanticscholar +1 more source
Audio-Reasoner: Improving Reasoning Capability in Large Audio Language Models
Conference on Empirical Methods in Natural Language Processing
Recent advancements in multimodal reasoning have largely overlooked the audio modality. We introduce Audio-Reasoner, a large-scale audio language model for deep reasoning in audio tasks.
Zhifei Xie +5 more
semanticscholar +1 more source
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
arXiv.org
The field of portrait image animation, driven by speech audio input, has experienced significant advancements in the generation of realistic and dynamic portraits.
Mingwang Xu +8 more
semanticscholar +1 more source
IEEE International Conference on Acoustics, Speech, and Signal Processing
Open generative models are vitally important for the community, allowing for fine-tunes and serving as baselines when presenting new models. However, most current text-to-audio models are private and not accessible for artists and researchers to build ...
Zach Evans +5 more
semanticscholar +1 more source
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
International Conference on Learning Representations
The ability to comprehend audio--which includes speech, non-speech sounds, and music--is crucial for AI agents to interact effectively with the world. We present MMAU, a novel benchmark designed to evaluate multimodal audio understanding models on tasks ...
S. Sakshi +8 more
semanticscholar +1 more source
EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions
AAAI Conference on Artificial Intelligence
The area of portrait image animation, propelled by audio input, has witnessed notable progress in the generation of lifelike and dynamic portraits. Conventional methods are limited to utilizing either audios or facial key points to drive images into ...
Zhiyuan Chen +4 more
semanticscholar +1 more source
MiMo-Audio: Audio Language Models are Few-Shot Learners
arXiv.org
Existing audio language models typically rely on task-specific fine-tuning to accomplish particular audio tasks. In contrast, humans are able to generalize to new audio tasks with only a few examples or simple instructions.
X. Zhang +98 more
semanticscholar +1 more source
Journal of Sex & Marital Therapy, 1979
With this issue, we introduce a slightly different format for reviews of some commercial films. Mommie Dearest is reviewed by Drs. Stevan Cressitt, a practicing psychiatrist in Lynn, Mass., John Garrison, a Psychologist at Harvard Medical School, and Jenny Phillips, a practicing nurse practitioner in Salem, Mass.
openaire +1 more source
Audio encoder and audio decoder
The Journal of the Acoustical Society of America, 2010
An audio encoder for producing stereo signals based on multi-channel signals, wherein a downmix part (100) downmixes multi-channel signals, which are of greater-than-two channels, to two-channel stereo signals. A first encoding part (101) encodes a downmixed stereo signal to produce a first encoded signal.
openaire +1 more source
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension
Annual Meeting of the Association for Computational Linguistics
Recently, instruction-following audio-language models have received broad attention for human-audio interaction. However, the absence of benchmarks capable of evaluating audio-centric interaction capabilities has impeded advancements in this field ...
Qian Yang +10 more
semanticscholar +1 more source

