Deep convolutional neural networks for double compressed AMR audio detection
Detection of double compressed (DC) adaptive multi‐rate (AMR) audio recordings is a challenging audio forensic problem and has received great attention in recent years. Here, the authors propose to use convolutional neural networks (CNN) for DC AMR audio
Aykut Büker, Cemal Hanilçi
doaj +1 more source
Audio-visual speech recognition with background music using single-channel source separation [PDF]
In this paper, we consider audio-visual speech recognition with background music. The proposed algorithm is an integration of audio-visual speech recognition and single channel source separation (SCSS). We apply the proposed algorithm to recognize spoken
Erdogan, Hakan +4 more
core +1 more source
In this paper we propose a novel framework to process Doppler-radar signals for hand gesture recognition. Doppler-radar sensors provide many advantages over other emerging sensing modalities, including low development costs and high sensitivity to ...
Abel Diaz Berenguer +5 more
doaj +1 more source
An Audio-Visual Separation Model Integrating Dual-Channel Attention Mechanism
Sound source separation is the separation of targeted sounds from a noisy environment, which plays an important role in signal processing and has been studied extensively.
Yutao Zhang, Kaixing Wu, Mengfan Zhao
doaj +1 more source
Application of Partial Least Squares Regression for Audio-Visual Speech Processing and Modeling [PDF]
Subject of Research. The paper deals with the problem of reconstructing lip region images from a speech signal by means of Partial Least Squares regression. Such problems arise in connection with the development of audio-visual speech processing methods.
A. L. Oleinik
doaj +1 more source
Automatic annotation of tennis games: An integration of audio, vision, and learning [PDF]
Fully automatic annotation of tennis games using broadcast video is a task with great potential but enormous challenges. In this paper we describe our approach to this task, which integrates computer vision, machine listening, and machine learning.
Fei Yan +27 more
core +1 more source
Limitations and Performance Analysis of Spherical Sector Harmonics for Sound Field Processing
Developing spherical sector harmonics (SSHs) benefits sound field decomposition and analysis over spherical sector regions. Although SSHs demonstrate potential in the field of spatial audio, a comprehensive investigation into their properties and ...
Hanwen Bi +4 more
doaj +1 more source
Speech enhancement methods based on binaural cue coding
Following the encoding and decoding mechanism of binaural cue coding (BCC), in this paper the speech and noise are treated as the left-channel and right-channel signals of the BCC framework, respectively.
Xianyun Wang, Changchun Bao
doaj +1 more source
Asynchronous spiking neurons, the natural key to exploit temporal sparsity [PDF]
Inference of Deep Neural Networks for stream signal (Video/Audio) processing in edge devices is still challenging. Unlike most state-of-the-art inference engines, which are efficient for static signals, our brain is optimized for real-time dynamic ...
Cavalcante Holanda, Priscila +10 more
core +1 more source
This paper introduces RSoANU, a dataset of real multichannel room impulse responses (RIRs) obtained in a recording studio. Compared to the current publicly available datasets, RSoANU distinguishes itself by featuring RIRs captured using both a 32-channel
Grace Chesworth +2 more
doaj +1 more source

