Audio - Open Access .click

A Real-Time Beat Tracking System with Zero Latency and Enhanced Controllability

Transactions of the International Society for Music Information Retrieval
Identifying beat positions in music recordings, a central task in Music Information Retrieval (MIR), is commonly referred to as beat tracking. Typically, this involves computing an activation function to reveal frame-wise beat likelihood and then ...
Peter Meier, Ching-Yu Chiu, Meinard Müller +2 more
doaj +1 more source

MPEG-1 bitstreams processing for audio content analysis [PDF]

2002
In this paper, we present the MPEG-1 Audio bitstreams processing work which our research group is involved in. This work is primarily based on the processing of the encoded bitstream, and the extraction of useful audio features for the purposes of ...
Duffner, Orla +4 more
core +1 more source

Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization [PDF]

ACM Multimedia
Generative multimodal content is increasingly prevalent in much of the content creation arena, as it has the potential to allow artists and media personnel to create pre-production mockups by quickly bringing their ideas to life.
Navonil Majumder +5 more
semanticscholar +1 more source

Continual Learning for Multimodal Data Fusion of a Soft Gripper

Advanced Robotics Research, EarlyView.
Models trained on a single data modality often struggle to generalize when exposed to a different modality. This work introduces a continual learning algorithm capable of incrementally learning different data modalities by leveraging both class‐incremental and domain‐incremental learning scenarios in an artificial environment where labeled data is ...
Nilay Kushawaha, Egidio Falotico
wiley +1 more source

20th Anniversary of the Polish Section of the Audio Engineering Society

Archives of Acoustics, 2013
In June 2011, the Polish Section of the Audio Engineering Society will celebrate its 20th anniversary. On this occasion, the society officers, ending their second two-year term, present a short summary of the Section’s activity during the past four years.
Zbigniew KULKA +3 more
doaj

Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners [PDF]

Computer Vision and Pattern Recognition
Video and audio content creation serves as the core technique for the movie industry and professional users. Recently, existing diffusion-based methods tackle video and audio generation separately, which hinders the technique transfer from academia to ...
Yazhou Xing +4 more
semanticscholar +1 more source

Auditory–Tactile Congruence for Synthesis of Adaptive Pain Expressions in RoboPatients

Advanced Robotics Research, EarlyView.
In this work, we explore auditory–tactile congruence for synthesizing adaptive vocal pain expressions in robopatients. Using a robopatient platform that integrates vocal pain sounds with palpation forces, we conducted 7680 trials across 20 participants.
Saitarun Nadipineni +4 more
wiley +1 more source

A Review of Time-Scale Modification of Music Signals

Applied Sciences, 2016
Time-scale modification (TSM) is the task of speeding up or slowing down an audio signal’s playback speed without changing its pitch. In digital music production, TSM has become an indispensable tool, which is nowadays integrated in a wide range of music ...
Jonathan Driedger, Meinard Müller
doaj +1 more source

Postoperative audiovestibular assessment of obstructive sleep apnea patients

The Egyptian Journal of Otolaryngology, 2020
Background: In obstructive sleep apnea syndrome, the impact of hypoxia on different body systems is of utmost importance. The brainstem, including the auditory and vestibular nuclei, is highly sensitive to the effects of hypoxia. Our aim in the current study is to ...
Hesham Saad Kouzo +4 more
doaj +1 more source

Multimodal Human–Robot Interaction Using Human Pose Estimation and Local Large Language Models

Advanced Robotics Research, EarlyView.
A multimodal human–robot interaction framework integrates human pose estimation (HPE) and a large language model (LLM) for gesture‐ and voice‐based robot control. Speech‐to‐text (STT) enables voice command interpretation, while a safety‐aware arbitration mechanism prioritizes gesture input for rapid intervention.
Nasiru Aboki, Ilche Georgievski, Marco Aiello +2 more
wiley +1 more source
