WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing [PDF]
Self-supervised learning (SSL) achieves great success in speech recognition, while limited exploration has been attempted for other speech processing tasks.
Sanyuan Chen +16 more
semanticscholar +1 more source
SUPERB: Speech processing Universal PERformance Benchmark [PDF]
Self-supervised learning (SSL) has proven vital for advancing research in natural language processing (NLP) and computer vision (CV). The paradigm pretrains a shared model on large volumes of unlabeled data and achieves state-of-the-art (SOTA) for ...
Shu-Wen Yang +19 more
semanticscholar +1 more source
SpeechFormer++: A Hierarchical Efficient Framework for Paralinguistic Speech Processing [PDF]
Paralinguistic speech processing is important in addressing many issues, such as sentiment and neurocognitive disorder analyses. Recently, the Transformer has achieved remarkable success in the natural language processing field and has demonstrated its ...
Weidong Chen +4 more
semanticscholar +1 more source
Transformers in Speech Processing: A Survey [PDF]
The remarkable success of transformers in the field of natural language processing has sparked the interest of the speech-processing community, leading to an exploration of their potential for modeling long-range dependencies within speech sequences ...
S. Latif +5 more
semanticscholar +1 more source
Toward a realistic model of speech processing in the brain with self-supervised learning [PDF]
Several deep neural networks have recently been shown to generate activations similar to those of the brain in response to the same input. These algorithms, however, remain largely implausible: they require (1) extraordinarily large amounts of data, (2 ...
Juliette Millet +7 more
semanticscholar +1 more source
Torchaudio: Building Blocks for Audio and Speech Processing [PDF]
This document describes version 0.10 of TorchAudio: building blocks for machine learning applications in the audio and speech processing domain. The objective of TorchAudio is to accelerate the development and deployment of machine learning applications ...
Yao-Yuan Yang +22 more
semanticscholar +1 more source
Improving Distortion Robustness of Self-supervised Speech Processing Tasks with Domain Adaptation [PDF]
Speech distortions are a long-standing problem that degrades the performance of speech processing models trained with supervision. It is high time that we enhance the robustness of speech processing models to obtain good performance when encountering speech ...
Kuan-Po Huang +3 more
semanticscholar +1 more source
Continuous speech processing. [PDF]
Brodbeck C, Simon JZ.
europepmc +2 more sources
In this paper we discuss the rationale of the Multi-modal Information based Speech Processing (MISP) Challenge, and provide a detailed description of the data recorded, the two evaluation tasks and the corresponding baselines, followed by a summary of ...
Hang Chen +12 more
semanticscholar +1 more source
ESPnet: End-to-End Speech Processing Toolkit [PDF]
This paper introduces a new open source platform for end-to-end speech processing named ESPnet. ESPnet mainly focuses on end-to-end automatic speech recognition (ASR), and adopts widely-used dynamic neural network toolkits, Chainer and PyTorch, as a main ...
Shinji Watanabe +11 more
semanticscholar +1 more source

