Brain-inspired speech segmentation for automatic speech recognition using the speech envelope as a temporal reference.
Speech segmentation is a crucial step in automatic speech recognition because additional speech analyses are performed for each framed speech segment.
Lee B, Cho KH.
europepmc +2 more sources
Automatic Speech Segmentation Based on HMM
This contribution deals with the problem of automatic phoneme segmentation using HMMs. Automating speech segmentation is important for applications that must process large amounts of data, so manual segmentation is out of the ...
M. Kroul
doaj +2 more sources
Lexical knowledge boosts statistically-driven speech segmentation.
The hypothesis that known words can serve as anchors for discovering new words in connected speech has computational and empirical support. However, evidence for how the bootstrapping effect of known words interacts with other mechanisms of lexical acquisition, such as statistical learning, is incomplete.
Palmer SD, Hutson J, White L, Mattys SL.
europepmc +6 more sources
Prosodic cues enhance rule learning by changing speech segmentation mechanisms.
Prosody has been claimed to have a critical role in the acquisition of grammatical information from speech. The exact mechanisms by which prosodic cues enhance learning remain unknown.
de Diego-Balaguer R +2 more
europepmc +2 more sources
Segmentation of Speech and Humming in Vocal Input
Non-verbal vocal interaction (NVVI) is an interaction method that uses human-produced sounds other than speech, such as humming. NVVI complements traditional speech recognition systems with continuous control.
A. J. Sporka, O. Polacek, J. Havlik
doaj +2 more sources
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
Self-supervised approaches for speech representation learning are challenged by three unique problems: (1) there are multiple sound units in each input utterance, (2) there is no lexicon of input sound units during the pre-training phase, and (3) sound ...
Wei-Ning Hsu +5 more
semanticscholar +1 more source
Selection of acoustic modeling unit for Tibetan speech recognition based on deep learning
The selection of the speech recognition modeling unit is the primary problem of acoustic modeling in speech recognition, and different acoustic modeling units will directly affect the overall performance of speech recognition.
Gong Baojia +4 more
doaj +1 more source
End-to-End Simultaneous Speech Translation with Differentiable Segmentation
End-to-end simultaneous speech translation (SimulST) outputs translation while receiving the streaming speech inputs (a.k.a. streaming speech translation), and hence needs to segment the speech inputs and then translate based on the current received ...
Shaolei Zhang, Yang Feng
semanticscholar +1 more source
SHAS: Approaching optimal Segmentation for End-to-End Speech Translation
Speech translation models are unable to directly process long audios, like TED talks, which have to be split into shorter segments. Speech translation datasets provide manual segmentations of the audios, which are not available in real-world scenarios ...
Yiannis (Ioannis) Tsiamas +3 more
semanticscholar +1 more source
Phonemic segmentation of narrative speech in human cerebral cortex
Speech processing requires extracting meaning from acoustic patterns using a set of intermediate representations based on a dynamic segmentation of the speech stream.
Xue L Gong +5 more
semanticscholar +1 more source

