Some of the following articles may not be open access.
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
arXiv.org, 2023
Recently, instruction-following audio-language models have received broad attention for audio interaction with humans. However, the absence of pre-trained audio models capable of handling diverse audio types and tasks has hindered progress in this field.
Yunfei Chu +7 more
semanticscholar +1 more source
IEEE/ACM Transactions on Audio Speech and Language Processing, 2023
The advancement of audio-language (AL) multimodal learning tasks has been significant in recent years, yet the limited size of existing audio-language datasets poses challenges for researchers due to the costly and time-consuming collection process.
Xinhao Mei +8 more
semanticscholar +1 more source
Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models
arXiv.org
We present Audio Flamingo 3 (AF3), a fully open state-of-the-art (SOTA) large audio-language model that advances reasoning and understanding across speech, sound, and music.
Arushi Goel +10 more
semanticscholar +1 more source
arXiv.org
We present Kimi-Audio, an open-source audio foundation model that excels in audio understanding, generation, and conversation. We detail the practices in building Kimi-Audio, including model architecture, data curation, training recipe, inference ...
KimiTeam +39 more
semanticscholar +1 more source
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
arXiv.org
In this paper, we present the VideoLLaMA 2, a set of Video Large Language Models (Video-LLMs) designed to enhance spatial-temporal modeling and audio understanding in video and audio-oriented tasks.
Zesen Cheng +10 more
semanticscholar +1 more source
Audio authenticity: Duplicated audio segment detection in waveform audio file
Journal of Shanghai Jiaotong University (Science), 2014
Waveform audio (WAV) file is a widely used file format of uncompressed audio. With the rapid development of digital media technology, one can easily insert duplicated segments with powerful audio editing software, e.g. inserting a segment of audio with negative meaning into the existing audio file.
Ji-nian Xiao +5 more
openaire +1 more source
International Conference on Machine Learning
Understanding and reasoning over non-speech sounds and music are crucial for both humans and AI agents to interact effectively with their environments. In this paper, we introduce Audio Flamingo 2 (AF2), an Audio-Language Model (ALM) with advanced audio ...
Sreyan Ghosh +9 more
semanticscholar +1 more source

