HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units [PDF]
Self-supervised approaches for speech representation learning are challenged by three unique problems: (1) there are multiple sound units in each input utterance, (2) there is no lexicon of input sound units during the pre-training phase, and (3) sound ...
Wei-Ning Hsu +5 more
semanticscholar +1 more source
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing [PDF]
Self-supervised learning (SSL) achieves great success in speech recognition, while limited exploration has been attempted for other speech processing tasks.
Sanyuan Chen +16 more
semanticscholar +1 more source
Conformer: Convolution-augmented Transformer for Speech Recognition [PDF]
Recently Transformer and Convolution neural network (CNN) based models have shown promising results in Automatic Speech Recognition (ASR), outperforming Recurrent neural networks (RNNs).
Anmol Gulati +10 more
semanticscholar +1 more source
Scaling Speech Technology to 1,000+ Languages [PDF]
Expanding the language coverage of speech technology has the potential to improve access to information for many more people. However, current speech technology is restricted to about one hundred languages which is a small fraction of the over 7,000 ...
Vineel Pratap +15 more
semanticscholar +1 more source
SUPERB: Speech processing Universal PERformance Benchmark [PDF]
Self-supervised learning (SSL) has proven vital for advancing research in natural language processing (NLP) and computer vision (CV). The paradigm pretrains a shared model on large volumes of unlabeled data and achieves state-of-the-art (SOTA) for ...
Shu-Wen Yang +19 more
semanticscholar +1 more source
Communication Challenges and Implementation of Telepractice for Children with Hearing Impairment during Lockdown- A Parental Perspective [PDF]
Introduction: The global Coronavirus Disease 2019 (COVID-19) outbreak has resulted in numerous difficulties and drawbacks in our daily life. Despite causing mortality, it has halted the therapeutic facilities because of the in-person interaction ...
MN Anusha +4 more
doaj +1 more source
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition [PDF]
We present SpecAugment, a simple data augmentation method for speech recognition. SpecAugment is applied directly to the feature inputs of a neural network (i.e., filter bank coefficients).
Daniel S. Park +6 more
semanticscholar +1 more source
XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale [PDF]
This paper presents XLS-R, a large-scale model for cross-lingual speech representation learning based on wav2vec 2.0. We train models with up to 2B parameters on nearly half a million hours of publicly available speech audio in 128 languages, an order of ...
Arun Babu +12 more
semanticscholar +1 more source
FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech [PDF]
We introduce FLEURS, the Few-shot Learning Evaluation of Universal Representations of Speech benchmark. FLEURS is an n-way parallel speech dataset in 102 languages built on top of the machine translation FLoRes-101 benchmark, with approximately 12 hours ...
Alexis Conneau +8 more
semanticscholar +1 more source
Speech recognition with deep recurrent neural networks [PDF]
Recurrent neural networks (RNNs) are a powerful model for sequential data. End-to-end training methods such as Connectionist Temporal Classification make it possible to train RNNs for sequence labelling problems where the input-output alignment is ...
Alex Graves +2 more
semanticscholar +1 more source

