
AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline [PDF]

open access: yes · Oriental COCOSDA International Conference on Speech Database and Assessments, 2017
An open-source Mandarin speech corpus called AISHELL-1 is released. It is by far the largest corpus suitable for conducting speech recognition research and building speech recognition systems for Mandarin.
Bu, Hui   +4 more
core   +2 more sources

Prompting Large Language Models with Speech Recognition Abilities [PDF]

open access: yes · IEEE International Conference on Acoustics, Speech, and Signal Processing, 2023
Large language models (LLMs) have proven themselves highly flexible, able to solve a wide range of generative tasks, such as abstractive summarization and open-ended question answering.
Yassir Fathullah   +11 more
semanticscholar   +1 more source

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages [PDF]

open access: yes · arXiv.org, 2023
We introduce the Universal Speech Model (USM), a single large model that performs automatic speech recognition (ASR) across 100+ languages. This is achieved by pre-training the encoder of the model on a large unlabeled multilingual dataset of 12 million ...
Yu Zhang   +26 more
semanticscholar   +1 more source

End-to-End Speech Recognition: A Survey [PDF]

open access: yes · IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023
In the last decade of automatic speech recognition (ASR) research, the introduction of deep learning has brought relative reductions in word error rate of more than 50% compared to modeling without deep learning.
Rohit Prabhavalkar   +4 more
semanticscholar   +1 more source

Fast Conformer With Linearly Scalable Attention For Efficient Speech Recognition [PDF]

open access: yes · Automatic Speech Recognition & Understanding, 2023
Conformer-based models have become the dominant end-to-end architecture for speech processing tasks. With the objective of enhancing the conformer architecture for efficient training and inference, we carefully redesigned Conformer with a novel ...
D. Rekesh   +7 more
semanticscholar   +1 more source

SALM: Speech-Augmented Language Model with in-Context Learning for Speech Recognition and Translation [PDF]

open access: yes · IEEE International Conference on Acoustics, Speech, and Signal Processing, 2023
We present a novel Speech Augmented Language Model (SALM) with multitask and in-context learning capabilities. SALM comprises a frozen text LLM, an audio encoder, a modality adapter module, and LoRA layers to accommodate speech input and associated task ...
Zhehuai Chen   +8 more
semanticscholar   +1 more source

VoxtLM: Unified Decoder-Only Models for Consolidating Speech Recognition, Synthesis and Speech, Text Continuation Tasks [PDF]

open access: yes · IEEE International Conference on Acoustics, Speech, and Signal Processing, 2023
We propose a decoder-only language model, VoxtLM, that can perform four tasks: speech recognition, speech synthesis, text generation, and speech continuation.
Soumi Maiti   +5 more
semanticscholar   +1 more source

Conformer: Convolution-augmented Transformer for Speech Recognition [PDF]

open access: yes · Interspeech, 2020
Recently, Transformer and convolutional neural network (CNN) based models have shown promising results in automatic speech recognition (ASR), outperforming recurrent neural networks (RNNs).
Anmol Gulati   +10 more
semanticscholar   +1 more source

End-To-End Audio-Visual Speech Recognition with Conformers [PDF]

open access: yes · IEEE International Conference on Acoustics, Speech, and Signal Processing, 2021
In this work, we present a hybrid CTC/Attention model based on a ResNet-18 and a Convolution-augmented Transformer (Conformer) that can be trained in an end-to-end manner.
Pingchuan Ma   +2 more
semanticscholar   +1 more source

Deep Learning Enabled Semantic Communications With Speech Recognition and Synthesis [PDF]

open access: yes · IEEE Transactions on Wireless Communications, 2022
In this paper, we develop a deep learning based semantic communication system for speech transmission, named DeepSC-ST. We take speech recognition and speech synthesis as the transmission tasks of the communication system.
Zhenzi Weng   +5 more
semanticscholar   +1 more source
