
WeNet: Production Oriented Streaming and Non-Streaming End-to-End Speech Recognition Toolkit [PDF]

Interspeech, 2021 (open access: yes)
In this paper, we propose an open source, production first, and production ready speech recognition toolkit called WeNet in which a new two-pass approach is implemented to unify streaming and non-streaming end-to-end (E2E) speech recognition in a single ...
Zhuoyuan Yao   +9 more

Recent Advances in End-to-End Automatic Speech Recognition [PDF]

APSIPA Transactions on Signal and Information Processing, 2021 (open access: yes)
Recently, the speech community is seeing a significant trend of moving from deep neural network based hybrid modeling to end-to-end (E2E) modeling for automatic speech recognition (ASR).
Jinyu Li

Emotion Recognition from Speech Using Wav2vec 2.0 Embeddings [PDF]

Interspeech, 2021 (open access: yes)
Emotion recognition datasets are relatively small, making the use of the more sophisticated deep learning approaches challenging. In this work, we propose a transfer learning method for speech emotion recognition where features extracted from pre-trained ...
Leonardo Pepino   +2 more

Recent Progress in the CUHK Dysarthric Speech Recognition System [PDF]

IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022 (open access: yes)
Despite the rapid progress of automatic speech recognition (ASR) technologies in the past few decades, recognition of disordered speech remains a highly challenging task to date.
Shansong Liu   +7 more

Unsupervised Cross-lingual Representation Learning for Speech Recognition [PDF]

Interspeech, 2020 (open access: yes)
This paper presents XLSR which learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages.
Alexis Conneau   +4 more

WenetSpeech: A 10000+ Hours Multi-Domain Mandarin Corpus for Speech Recognition [PDF]

IEEE International Conference on Acoustics, Speech, and Signal Processing, 2021 (open access: yes)
In this paper, we present WenetSpeech, a multi-domain Mandarin corpus consisting of 10000+ hours high-quality labeled speech, 2400+ hours weakly labeled speech, and about 10000 hours unlabeled speech, with 22400+ hours in total.
Binbin Zhang   +11 more

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition [PDF]

Interspeech, 2019 (open access: yes)
We present SpecAugment, a simple data augmentation method for speech recognition. SpecAugment is applied directly to the feature inputs of a neural network (i.e., filter bank coefficients).
Daniel S. Park   +6 more

Speech recognition with deep recurrent neural networks [PDF]

IEEE International Conference on Acoustics, Speech, and Signal Processing, 2013 (open access: yes)
Recurrent neural networks (RNNs) are a powerful model for sequential data. End-to-end training methods such as Connectionist Temporal Classification make it possible to train RNNs for sequence labelling problems where the input-output alignment is ...
Alex Graves   +2 more

Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss [PDF]

IEEE International Conference on Acoustics, Speech, and Signal Processing, 2020 (open access: yes)
In this paper we present an end-to-end speech recognition model with Transformer encoders that can be used in a streaming speech recognition system. Transformer computation blocks based on self-attention are used to encode both audio and label sequences ...
Qian Zhang   +6 more

Intermediate Loss Regularization for CTC-Based Speech Recognition [PDF]

IEEE International Conference on Acoustics, Speech, and Signal Processing, 2021 (open access: yes)
We present a simple and efficient auxiliary loss function for automatic speech recognition (ASR) based on the connectionist temporal classification (CTC) objective.
Jaesong Lee, Shinji Watanabe
