Exploring Multi-Stage GAN with Self-Attention for Speech Enhancement
Multi-stage or multi-generator generative adversarial networks (GANs) have recently been demonstrated to be effective for speech enhancement. The existing multi-generator GANs for speech enhancement only use convolutional layers for synthesising clean ...
Bismark Kweku Asiedu Asante et al.
How Bad Are Artifacts?: Analyzing the Impact of Speech Enhancement Errors on ASR [PDF]
It is challenging to improve automatic speech recognition (ASR) performance in noisy conditions with single-channel speech enhancement (SE). In this paper, we investigate the causes of ASR performance degradation by decomposing the SE errors using ...
Kazuma Iwamoto et al.
Enhancing Speech Privacy with Slicing
Privacy preservation calls for speech anonymization methods which hide the speaker's identity while minimizing the impact on downstream tasks such as automatic speech recognition (ASR) training or decoding. In the recent VoicePrivacy 2020 Challenge, several anonymization methods have been proposed to transform speech utterances in a way that preserves ...
Mohamed Maouche et al.
Insights Into Deep Non-Linear Filters for Improved Multi-Channel Speech Enhancement [PDF]
The key advantage of using multiple microphones for speech enhancement is that spatial filtering can be used to complement the tempo-spectral processing.
Kristina Tesch, Timo Gerkmann
DeepFilterNet2: Towards Real-Time Speech Enhancement on Embedded Devices for Full-Band Audio [PDF]
Deep learning-based speech enhancement has seen huge improvements and has recently expanded to full-band audio (48 kHz). However, many approaches have rather high computational complexity and require large temporal buffers for real-time usage, e.g. ...
Hendrik Schröter et al.
End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation [PDF]
This work presents our end-to-end (E2E) automatic speech recognition (ASR) model targeting robust speech recognition, called Integrated speech Recognition with enhanced speech Input for Self-supervised learning representation (IRIS).
Xuankai Chang et al.
Cued Speech Enhances Speech-in-Noise Perception [PDF]
Speech perception in noise remains challenging for Deaf/Hard of Hearing (D/HH) people, even when fitted with hearing aids or cochlear implants. The perception of sentences in noise by 20 implanted or aided D/HH subjects mastering Cued Speech (CS), a system of hand gestures complementing lip movements, was compared with the perception of 15 typically hearing ...
Clémence Bayard et al.
Speaker Re-identification with Speaker Dependent Speech Enhancement [PDF]
While the use of deep neural networks has significantly boosted speaker recognition performance, it is still challenging to separate speakers in poor acoustic environments.
Thomas Hain, Qiang Huang, Yanpei Shi
HIFI++: A Unified Framework for Bandwidth Extension and Speech Enhancement [PDF]
Generative adversarial networks have recently demonstrated outstanding performance in neural vocoding, outperforming the best autoregressive and flow-based models.
Pavel Andreev et al.
STFT-Domain Neural Speech Enhancement With Very Low Algorithmic Latency [PDF]
Deep learning based speech enhancement in the short-time Fourier transform (STFT) domain typically uses a large window length such as 32 ms. A larger window can lead to higher frequency resolution and potentially better enhancement.
Zhong-Qiu Wang et al.
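The window-length trade-off this abstract mentions can be made concrete with a quick back-of-the-envelope calculation (a sketch only; the 16 kHz sample rate and window lengths below are assumed for illustration, not taken from the paper):

```python
# Illustrative STFT window-length trade-off: a longer analysis window gives
# finer frequency resolution (smaller spacing between DFT bins) but adds
# algorithmic latency. Sample rate and window lengths are assumed values.
fs = 16_000  # sample rate in Hz (assumed)

for win_ms in (32, 8, 4):
    n_fft = int(fs * win_ms / 1000)  # window length in samples
    bin_hz = fs / n_fft              # DFT bin spacing = frequency resolution
    print(f"{win_ms} ms window -> {n_fft} samples, {bin_hz:.2f} Hz per bin")
# 32 ms window -> 512 samples, 31.25 Hz per bin
# 8 ms window -> 128 samples, 125.00 Hz per bin
# 4 ms window -> 64 samples, 250.00 Hz per bin
```

A 32 ms window thus resolves frequencies roughly 8x more finely than a 4 ms window, which is why low-latency STFT-domain enhancement must recover performance lost to coarser spectral resolution.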