Audio signal processing - Open Access .click

Deep convolutional neural networks for double compressed AMR audio detection

IET Signal Processing, 2021
Detection of double compressed (DC) adaptive multi‐rate (AMR) audio recordings is a challenging audio forensic problem and has received great attention in recent years. Here, the authors propose to use convolutional neural networks (CNN) for DC AMR audio
Aykut Büker, Cemal Hanilçi
doaj +1 more source

Audio-visual speech recognition with background music using single-channel source separation [PDF]

2012
In this paper, we consider audio-visual speech recognition with background music. The proposed algorithm is an integration of audio-visual speech recognition and single channel source separation (SCSS). We apply the proposed algorithm to recognize spoken
Erdogan, Hakan +4 more
core +1 more source

GestureVLAD: Combining Unsupervised Features Representation and Spatio-Temporal Aggregation for Doppler-Radar Gesture Recognition

IEEE Access, 2019
In this paper we propose a novel framework to process Doppler-radar signals for hand gesture recognition. Doppler-radar sensors provide many advantages over other emerging sensing modalities, including low development costs and high sensitivity to ...
Abel Diaz Berenguer +5 more
doaj +1 more source

An Audio-Visual Separation Model Integrating Dual-Channel Attention Mechanism

IEEE Access, 2023
Sound source separation is the separation of targeted sounds from a noisy environment, which plays an important role in signal processing and has been studied extensively.
Yutao Zhang, Kaixing Wu, Mengfan Zhao
doaj +1 more source

APPLICATION OF PARTIAL LEAST SQUARES REGRESSION FOR AUDIO-VISUAL SPEECH PROCESSING AND MODELING [PDF]

Scientific and Technical Journal of Information Technologies, Mechanics and Optics, 2015
Subject of Research. The paper deals with the problem of reconstructing lip region images from a speech signal by means of Partial Least Squares regression. Such problems arise in connection with the development of audio-visual speech processing methods.
A. L. Oleinik
doaj +1 more source

Automatic annotation of tennis games: An integration of audio, vision, and learning [PDF]

1999
Fully automatic annotation of tennis games using broadcast video is a task with great potential but enormous challenges. In this paper we describe our approach to this task, which integrates computer vision, machine listening, and machine learning.
Fei Yan +27 more
core +1 more source

Limitations and Performance Analysis of Spherical Sector Harmonics for Sound Field Processing

Applied Sciences
Developing spherical sector harmonics (SSHs) benefits sound field decomposition and analysis over spherical sector regions. Although SSHs demonstrate potential in the field of spatial audio, a comprehensive investigation into their properties and ...
Hanwen Bi +4 more
doaj +1 more source

Speech enhancement methods based on binaural cue coding

EURASIP Journal on Audio, Speech, and Music Processing, 2019
Following the encoding and decoding mechanism of binaural cue coding (BCC), this paper treats the speech and the noise as the left-channel and right-channel signals of the BCC framework, respectively.
Xianyun Wang, Changchun Bao
doaj +1 more source

Asynchronous spiking neurons, the natural key to exploit temporal sparsity [PDF]

2019
Inference of Deep Neural Networks for stream signal (video/audio) processing on edge devices is still challenging. Unlike most state-of-the-art inference engines, which are efficient for static signals, our brain is optimized for real-time dynamic ...
Cavalcante Holanda, Priscila +10 more
core +1 more source

Room Impulse Response Dataset of a Recording Studio with Variable Wall Paneling Measured Using a 32-Channel Spherical Microphone Array and a B-Format Microphone Array

Applied Sciences
This paper introduces RSoANU, a dataset of real multichannel room impulse responses (RIRs) obtained in a recording studio. Compared to the current publicly available datasets, RSoANU distinguishes itself by featuring RIRs captured using both a 32-channel
Grace Chesworth, Amy Bastine, Thushara Abhayapala +2 more
doaj +1 more source
