Multifractal Hopscotch in "Hopscotch" by Julio Cortázar
Punctuation is the main factor introducing correlations in natural language written texts and it crucially impacts their overall effectiveness, expressiveness, and readability.
Jakub Dec +4 more
doaj +2 more sources
Libriheavy: A 50,000 Hours ASR Corpus with Punctuation Casing and Context
In this paper, we introduce Libriheavy, a large-scale ASR corpus consisting of 50,000 hours of read English speech derived from LibriVox. To the best of our knowledge, Libriheavy is the largest freely-available corpus of speech with supervisions ...
Wei Kang +7 more
semanticscholar +1 more source
LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of End-to-End ASR Models
Traditional automatic speech recognition (ASR) models output lower-cased words without punctuation marks, which reduces readability and necessitates a subsequent text processing model to convert ASR transcripts into a proper format.
Aleksandr Meister +5 more
semanticscholar +1 more source
Where’s the Point? Self-Supervised Multilingual Punctuation-Agnostic Sentence Segmentation
Many NLP pipelines split text into sentences as one of the crucial preprocessing steps. Prior sentence segmentation tools either rely on punctuation or require a considerable amount of sentence-segmented training data: both central assumptions might fail ...
Benjamin Minixhofer +2 more
semanticscholar +1 more source
Alzheimer Disease Classification through ASR-based Transcriptions: Exploring the Impact of Punctuation and Pauses
Alzheimer's Disease (AD) is the world's leading neurodegenerative disease, which often results in communication difficulties. Analysing speech can serve as a diagnostic tool for identifying the condition.
Lucía Gómez-Zaragozá +5 more
semanticscholar +1 more source
Evaluating OpenAI's Whisper ASR for Punctuation Prediction and Topic Modeling of life histories of the Museum of the Person
Automatic speech recognition (ASR) systems play a key role in applications involving human-machine interactions. Despite their importance, ASR models for the Portuguese language proposed in the last decade have limitations in relation to the correct ...
L. Gris +5 more
semanticscholar +1 more source
FullStop: Punctuation and Segmentation Prediction for Dutch with Transformers
When applying automated speech recognition (ASR) for Belgian Dutch (Van Dyck et al. 2021), the output consists of an unsegmented stream of words, without any punctuation.
Vincent Vandeghinste, Oliver Guhr
semanticscholar +1 more source
Unified Multimodal Punctuation Restoration Framework for Mixed-Modality Corpus
The punctuation restoration task aims to correctly punctuate the output transcriptions of automatic speech recognition systems. Previous punctuation models, either using text only or demanding the corresponding audio, tend to be constrained by real ...
Yaoming Zhu +3 more
semanticscholar +1 more source
Streaming Punctuation for Long-form Dictation with Transformers
While speech recognition Word Error Rate (WER) has reached human parity for English, longform dictation scenarios still suffer from segmentation and punctuation problems resulting from irregular pausing patterns or slow speakers.
Piyush Behre +3 more
semanticscholar +1 more source
Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional Context for Continuous Speech Recognition
While speech recognition Word Error Rate (WER) has reached human parity for English, continuous speech recognition scenarios such as voice typing and meeting transcriptions still suffer from segmentation and punctuation problems, resulting from irregular ...
Piyush Behre +3 more
semanticscholar +1 more source

