Exploring Multi-Stage GAN with Self-Attention for Speech Enhancement
Multi-stage or multi-generator generative adversarial networks (GANs) have recently been demonstrated to be effective for speech enhancement. The existing multi-generator GANs for speech enhancement only use convolutional layers for synthesising clean ...
Bismark Kweku Asiedu Asante et al.
How Bad Are Artifacts?: Analyzing the Impact of Speech Enhancement Errors on ASR [PDF]
It is challenging to improve automatic speech recognition (ASR) performance in noisy conditions with single-channel speech enhancement (SE). In this paper, we investigate the causes of ASR performance degradation by decomposing the SE errors using ...
Kazuma Iwamoto et al.
Enhancing Speech Privacy with Slicing
Privacy preservation calls for speech anonymization methods which hide the speaker's identity while minimizing the impact on downstream tasks such as automatic speech recognition (ASR) training or decoding. In the recent VoicePrivacy 2020 Challenge, several anonymization methods have been proposed to transform speech utterances in a way that preserves ...
Mohamed Maouche et al.
Insights Into Deep Non-Linear Filters for Improved Multi-Channel Speech Enhancement [PDF]
The key advantage of using multiple microphones for speech enhancement is that spatial filtering can be used to complement the tempo-spectral processing.
Kristina Tesch, Timo Gerkmann
DeepFilterNet2: Towards Real-Time Speech Enhancement on Embedded Devices for Full-Band Audio [PDF]
Deep learning-based speech enhancement has seen huge improvements and has recently expanded to full-band audio (48 kHz). However, many approaches have rather high computational complexity and require large temporal buffers for real-time usage, e.g. ...
Hendrik Schröter et al.
End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation [PDF]
This work presents our end-to-end (E2E) automatic speech recognition (ASR) model targeting robust speech recognition, called Integrated speech Recognition with enhanced speech Input for Self-supervised learning representation (IRIS).
Xuankai Chang et al.
Cued Speech Enhances Speech-in-Noise Perception [PDF]
Speech perception in noise remains challenging for Deaf/Hard of Hearing (D/HH) people, even when fitted with hearing aids or cochlear implants. The perception of sentences in noise by 20 implanted or aided D/HH subjects mastering Cued Speech (CS), a system of hand gestures complementing lip movements, was compared with the perception of 15 typically hearing ...
Clémence Bayard et al.
Speaker Re-identification with Speaker Dependent Speech Enhancement [PDF]
While the use of deep neural networks has significantly boosted speaker recognition performance, it is still challenging to separate speakers in poor acoustic environments.
Thomas Hain, Qiang Huang, Yanpei Shi
HIFI++: A Unified Framework for Bandwidth Extension and Speech Enhancement [PDF]
Generative adversarial networks have recently demonstrated outstanding performance in neural vocoding, outperforming the best autoregressive and flow-based models.
Pavel Andreev et al.
STFT-Domain Neural Speech Enhancement With Very Low Algorithmic Latency [PDF]
Deep learning based speech enhancement in the short-time Fourier transform (STFT) domain typically uses a large window length such as 32 ms. A larger window can lead to higher frequency resolution and potentially better enhancement.
Zhong-Qiu Wang et al.
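The window-length trade-off this abstract mentions can be made concrete with a quick back-of-the-envelope calculation (a sketch only; the 16 kHz sample rate and window lengths below are assumed for illustration, not taken from the paper):

```python
# Illustrative STFT window-length trade-off: a longer analysis window gives
# finer frequency resolution (smaller spacing between DFT bins) but adds
# algorithmic latency. Sample rate and window lengths are assumed values.
fs = 16_000  # sample rate in Hz (assumed)

for win_ms in (32, 8, 4):
    n_fft = int(fs * win_ms / 1000)  # window length in samples
    bin_hz = fs / n_fft              # DFT bin spacing = frequency resolution
    print(f"{win_ms} ms window -> {n_fft} samples, {bin_hz:.2f} Hz per bin")
# 32 ms window -> 512 samples, 31.25 Hz per bin
# 8 ms window -> 128 samples, 125.00 Hz per bin
# 4 ms window -> 64 samples, 250.00 Hz per bin
```

A 32 ms window thus resolves frequencies roughly 8x more finely than a 4 ms window, which is why low-latency STFT-domain enhancement must recover performance lost to coarser spectral resolution.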