HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units

open access: yes
IEEE/ACM Transactions on Audio Speech and Language Processing, 2021
Self-supervised approaches for speech representation learning are challenged by three unique problems: (1) there are multiple sound units in each input utterance, (2) there is no lexicon of input sound units during the pre-training phase, and (3) sound ...
Wei-Ning Hsu   +5 more
semanticscholar   +1 more source

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

open access: yes
IEEE Journal on Selected Topics in Signal Processing, 2021
Self-supervised learning (SSL) achieves great success in speech recognition, while limited exploration has been attempted for other speech processing tasks.
Sanyuan Chen   +16 more
semanticscholar   +1 more source

Conformer: Convolution-augmented Transformer for Speech Recognition

open access: yes
Interspeech, 2020
Recently Transformer and Convolution neural network (CNN) based models have shown promising results in Automatic Speech Recognition (ASR), outperforming Recurrent neural networks (RNNs).
Anmol Gulati   +10 more
semanticscholar   +1 more source

Scaling Speech Technology to 1,000+ Languages

open access: yes
Journal of Machine Learning Research, 2023
Expanding the language coverage of speech technology has the potential to improve access to information for many more people. However, current speech technology is restricted to about one hundred languages which is a small fraction of the over 7,000 ...
Vineel Pratap   +15 more
semanticscholar   +1 more source

SUPERB: Speech processing Universal PERformance Benchmark

open access: yes
Interspeech, 2021
Self-supervised learning (SSL) has proven vital for advancing research in natural language processing (NLP) and computer vision (CV). The paradigm pretrains a shared model on large volumes of unlabeled data and achieves state-of-the-art (SOTA) for ...
Shu-Wen Yang   +19 more
semanticscholar   +1 more source

Communication Challenges and Implementation of Telepractice for Children with Hearing Impairment during Lockdown- A Parental Perspective

open access: yes
Journal of Clinical and Diagnostic Research, 2021
Introduction: The global Coronavirus Disease 2019 (COVID-19) outbreak has resulted in numerous difficulties and drawbacks in our daily life. Despite causing mortality, it has halted the therapeutic facilities because of the in-person interaction ...
MN Anusha   +4 more
doaj   +1 more source

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

open access: yes
Interspeech, 2019
We present SpecAugment, a simple data augmentation method for speech recognition. SpecAugment is applied directly to the feature inputs of a neural network (i.e., filter bank coefficients).
Daniel S. Park   +6 more
semanticscholar   +1 more source

XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale

open access: yes
Interspeech, 2021
This paper presents XLS-R, a large-scale model for cross-lingual speech representation learning based on wav2vec 2.0. We train models with up to 2B parameters on nearly half a million hours of publicly available speech audio in 128 languages, an order of
Arun Babu   +12 more
semanticscholar   +1 more source

FLEURS: Few-shot Learning Evaluation of Universal Representations of Speech

open access: yes
Spoken Language Technology Workshop, 2022
We introduce FLEURS, the Few-shot Learning Evaluation of Universal Representations of Speech benchmark. FLEURS is an n-way parallel speech dataset in 102 languages built on top of the machine translation FLoRes-101 benchmark, with approximately 12 hours ...
Alexis Conneau   +8 more
semanticscholar   +1 more source

Speech recognition with deep recurrent neural networks

open access: yes
IEEE International Conference on Acoustics, Speech, and Signal Processing, 2013
Recurrent neural networks (RNNs) are a powerful model for sequential data. End-to-end training methods such as Connectionist Temporal Classification make it possible to train RNNs for sequence labelling problems where the input-output alignment is ...
Alex Graves   +2 more
semanticscholar   +1 more source