
A Real-Time Beat Tracking System with Zero Latency and Enhanced Controllability

open access: yes, Transactions of the International Society for Music Information Retrieval
Identifying beat positions in music recordings, a central task in Music Information Retrieval (MIR), is commonly referred to as beat tracking. Typically, this involves computing an activation function to reveal frame-wise beat likelihood and then ...
Peter Meier   +2 more
doaj   +1 more source

MPEG-1 bitstreams processing for audio content analysis [PDF]

open access: yes, 2002
In this paper, we present the MPEG-1 Audio bitstreams processing work which our research group is involved in. This work is primarily based on the processing of the encoded bitstream, and the extraction of useful audio features for the purposes of ...
Duffner, Orla   +4 more
core   +1 more source

Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization [PDF]

open access: yes, ACM Multimedia
Generative multimodal content is increasingly prevalent in much of the content creation arena, as it has the potential to allow artists and media personnel to create pre-production mockups by quickly bringing their ideas to life.
Navonil Majumder   +5 more
semanticscholar   +1 more source

Continual Learning for Multimodal Data Fusion of a Soft Gripper

open access: yes, Advanced Robotics Research, EarlyView.
Models trained on a single data modality often struggle to generalize when exposed to a different modality. This work introduces a continual learning algorithm capable of incrementally learning different data modalities by leveraging both class‐incremental and domain‐incremental learning scenarios in an artificial environment where labeled data is ...
Nilay Kushawaha, Egidio Falotico
wiley   +1 more source

20th Anniversary of the Polish Section of the Audio Engineering Society

open access: yes, Archives of Acoustics, 2013
In June 2011, the Polish Section of the Audio Engineering Society will celebrate its 20th anniversary. On this occasion, the society officers, ending their second two-year term, present a short summary of the Section’s activity during the past four years.
Zbigniew KULKA   +3 more
doaj  

Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners [PDF]

open access: yes, Computer Vision and Pattern Recognition
Video and audio content creation serves as the core technique for the movie industry and professional users. Recently, existing diffusion-based methods tackle video and audio generation separately, which hinders the technique transfer from academia to ...
Yazhou Xing   +4 more
semanticscholar   +1 more source

Auditory–Tactile Congruence for Synthesis of Adaptive Pain Expressions in RoboPatients

open access: yes, Advanced Robotics Research, EarlyView.
In this work, we explore auditory–tactile congruence for synthesizing adaptive vocal pain expressions in robopatients. Using a robopatient platform that integrates vocal pain sounds with palpation forces, we conducted 7680 trials across 20 participants.
Saitarun Nadipineni   +4 more
wiley   +1 more source

A Review of Time-Scale Modification of Music Signals

open access: yes, Applied Sciences, 2016
Time-scale modification (TSM) is the task of speeding up or slowing down an audio signal’s playback speed without changing its pitch. In digital music production, TSM has become an indispensable tool, which is nowadays integrated in a wide range of music
Jonathan Driedger, Meinard Müller
doaj   +1 more source

Postoperative audiovestibular assessment of obstructive sleep apnea patients

open access: yes, The Egyptian Journal of Otolaryngology, 2020
Background: In obstructive sleep apnea syndrome, the impact of hypoxia on different body systems is of utmost importance. The brainstem, including the auditory and vestibular nuclei, is highly sensitive to the effects of hypoxia. Our aim in the current study is to
Hesham Saad Kouzo   +4 more
doaj   +1 more source

Multimodal Human–Robot Interaction Using Human Pose Estimation and Local Large Language Models

open access: yes, Advanced Robotics Research, EarlyView.
A multimodal human–robot interaction framework integrates human pose estimation (HPE) and a large language model (LLM) for gesture‐ and voice‐based robot control. Speech‐to‐text (STT) enables voice command interpretation, while a safety‐aware arbitration mechanism prioritizes gesture input for rapid intervention.
Nasiru Aboki   +2 more
wiley   +1 more source
