A Real-Time Beat Tracking System with Zero Latency and Enhanced Controllability
Identifying beat positions in music recordings, a central task in Music Information Retrieval (MIR), is commonly referred to as beat tracking. Typically, this involves computing an activation function to reveal frame-wise beat likelihood and then ...
Peter Meier +2 more
doaj +1 more source
MPEG-1 bitstreams processing for audio content analysis [PDF]
In this paper, we present the MPEG-1 Audio bitstreams processing work which our research group is involved in. This work is primarily based on the processing of the encoded bitstream, and the extraction of useful audio features for the purposes of ...
Duffner, Orla +4 more
core +1 more source
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization [PDF]
Generative multimodal content is increasingly prevalent in much of the content creation arena, as it has the potential to allow artists and media personnel to create pre-production mockups by quickly bringing their ideas to life.
Navonil Majumder +5 more
semanticscholar +1 more source
Continual Learning for Multimodal Data Fusion of a Soft Gripper
Models trained on a single data modality often struggle to generalize when exposed to a different modality. This work introduces a continual learning algorithm capable of incrementally learning different data modalities by leveraging both class‐incremental and domain‐incremental learning scenarios in an artificial environment where labeled data is ...
Nilay Kushawaha, Egidio Falotico
wiley +1 more source
20th Anniversary of the Polish Section of the Audio Engineering Society
In June 2011, the Polish Section of the Audio Engineering Society will celebrate its 20th anniversary. On this occasion, the society officers, ending their second two-year term, present a short summary of the Section’s activity during the past four years.
Zbigniew Kulka +3 more
doaj
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners [PDF]
Video and audio content creation serves as the core technique for the movie industry and professional users. Recently, existing diffusion-based methods tackle video and audio generation separately, which hinders the technique transfer from academia to ...
Yazhou Xing +4 more
semanticscholar +1 more source
Auditory–Tactile Congruence for Synthesis of Adaptive Pain Expressions in RoboPatients
In this work, we explore auditory–tactile congruence for synthesizing adaptive vocal pain expressions in robopatients. Using a robopatient platform that integrates vocal pain sounds with palpation forces, we conducted 7680 trials across 20 participants.
Saitarun Nadipineni +4 more
wiley +1 more source
A Review of Time-Scale Modification of Music Signals
Time-scale modification (TSM) is the task of speeding up or slowing down an audio signal’s playback speed without changing its pitch. In digital music production, TSM has become an indispensable tool, which is nowadays integrated in a wide range of music
Jonathan Driedger, Meinard Müller
doaj +1 more source
Postoperative audiovestibular assessment of obstructive sleep apnea patients
Background: In obstructive sleep apnea syndrome, the impact of hypoxia on different body systems is of utmost importance. The brainstem, including the auditory and vestibular nuclei, is highly sensitive to the effects of hypoxia. Our aim in the current study is to
Hesham Saad Kouzo +4 more
doaj +1 more source
Multimodal Human–Robot Interaction Using Human Pose Estimation and Local Large Language Models
A multimodal human–robot interaction framework integrates human pose estimation (HPE) and a large language model (LLM) for gesture‐ and voice‐based robot control. Speech‐to‐text (STT) enables voice command interpretation, while a safety‐aware arbitration mechanism prioritizes gesture input for rapid intervention.
Nasiru Aboki +2 more
wiley +1 more source

