Developing an Orthography for Onya Darat (Western Borneo): Practical and Theoretical Considerations [PDF]
Onya Darat is a language spoken, with great dialectal variation, in the interior of western Borneo. It is the southernmost member of Land Dayak, a branch of the Austronesian language family.
Tadmor, U. (Uri)
core +4 more sources
Introduction: Innovation in spoken corpus linguistics
Over the decades, technological advancements have substantially improved the efficiency and scope of spoken corpus compilation, but there remain many challenges, both practical and theoretical, that constrain 1) the quality of spoken corpus data, 2 ...
Robbie Love
semanticscholar +1 more source
De l’oral à l’écrit dans les contes de tradition orale : quelques considérations à partir d’un exemple acadien [PDF]
As part of a broader approach to the issue of transcription and editing of oral tales, this article compares the transcription of an Acadian tale from the oral tradition to its edited version, highlighting the changes that appear in this passage from ...
Cristina PETRAȘ
doaj
The field of text-to-emotion analysis is investigated in this study, which uses an interactive methodology to reveal subtle emotional insights in textual data.
Fatima M Inamdar +7 more
semanticscholar +1 more source
Geoffrey Leech, Paul Rayson and Andrew Wilson. Word Frequencies in Written and Spoken English
Geoffrey Leech, Emeritus Professor in the Department of Linguistics and Modern English Language at Lancaster University, has been the co-editor and co-author of much research on English grammar, and computational and corpus linguistics.
Michaël Abecassis
doaj +1 more source
TEICORPO: A Conversion Tool for Spoken Language Transcription with a Pivot File in TEI
CORLI is a consortium of Huma-Num, the French national infrastructure dedicated to the technical support and promotion of digital humanities. The goal of CORLI is to promote and provide tools and information for good and efficient research practices in ...
C. Parisse, C. Etienne, Loïc Liégeois
semanticscholar +1 more source
The Shakespeare’s World Crowdsourced Transcription Project Datasets
The Shakespeare’s World Datasets derive from a crowdsourced transcription project hosted on the Zooniverse platform between 2015 and 2019 (Van Hyning et al., 2015–2019).
Victoria Van Hyning, ZhiCheng Wang
doaj +1 more source
Robust Beam Search for Encoder-Decoder Attention Based Speech Recognition without Length Bias
As one popular modeling approach for end-to-end speech recognition, attention-based encoder-decoder models are known to suffer the length bias and corresponding beam problem.
Ney, Hermann, Schlüter, Ralf, Zhou, Wei
core +1 more source
The Spoken Language Corpus at the Linguistics Department, Göteborg University
This paper summarizes work on spoken language at the Department of Linguistics, Göteborg University. In addition to describing the recordings contained in the Spoken Language Corpus of Swedish at Göteborg University, we discuss the standard of ...
Jens Allwood +4 more
doaj
Fine-tuning on Clean Data for End-to-End Speech Translation: FBK @ IWSLT 2018 [PDF]
This paper describes FBK's submission to the end-to-end English-German speech translation task at IWSLT 2018. Our system relies on a state-of-the-art model based on LSTMs and CNNs, where the CNNs are used to reduce the temporal dimension of the audio ...
Cattoni, Roldano +4 more
core +1 more source