Some of the following articles may not be open access.

Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

arXiv.org, 2023
Recently, instruction-following audio-language models have received broad attention for audio interaction with humans. However, the absence of pre-trained audio models capable of handling diverse audio types and tasks has hindered progress in this field.
Yunfei Chu   +7 more
semanticscholar   +1 more source

WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research

IEEE/ACM Transactions on Audio Speech and Language Processing, 2023
The advancement of audio-language (AL) multimodal learning tasks has been significant in recent years, yet the limited size of existing audio-language datasets poses challenges for researchers due to the costly and time-consuming collection process.
Xinhao Mei   +8 more
semanticscholar   +1 more source

Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models

arXiv.org
We present Audio Flamingo 3 (AF3), a fully open state-of-the-art (SOTA) large audio-language model that advances reasoning and understanding across speech, sound, and music.
Arushi Goel   +10 more
semanticscholar   +1 more source

Kimi-Audio Technical Report

arXiv.org
We present Kimi-Audio, an open-source audio foundation model that excels in audio understanding, generation, and conversation. We detail the practices in building Kimi-Audio, including model architecture, data curation, training recipe, inference ...
KimiTeam   +39 more
semanticscholar   +1 more source

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

arXiv.org
In this paper, we present the VideoLLaMA 2, a set of Video Large Language Models (Video-LLMs) designed to enhance spatial-temporal modeling and audio understanding in video and audio-oriented tasks.
Zesen Cheng   +10 more
semanticscholar   +1 more source

Audio authenticity: Duplicated audio segment detection in waveform audio file

Journal of Shanghai Jiaotong University (Science), 2014
The waveform audio (WAV) file is a widely used format for uncompressed audio. With the rapid development of digital media technology, one can easily insert duplicated segments using powerful audio editing software, e.g., inserting a segment of audio with negative meaning into an existing audio file.
Ji-nian Xiao   +5 more
openaire   +1 more source

Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities

International Conference on Machine Learning
Understanding and reasoning over non-speech sounds and music are crucial for both humans and AI agents to interact effectively with their environments. In this paper, we introduce Audio Flamingo 2 (AF2), an Audio-Language Model (ALM) with advanced audio ...
Sreyan Ghosh   +9 more
semanticscholar   +1 more source
