Perception of Phonological Assimilation by Neural Speech Recognition Models
Human listeners effortlessly compensate for phonological changes during speech perception, often unconsciously inferring the intended sounds. For example, listeners infer the underlying /n/ when hearing an utterance such as "clea[m] pan", where [m] arises from place assimilation to the following labial [p].
Charlotte Pouw +3 more
arxiv +5 more sources
Inherent Biases of Recurrent Neural Networks for Phonological Assimilation and Dissimilation
A recurrent neural network model of phonological pattern learning is proposed. The model is a relatively simple neural network with one recurrent layer, and displays biases in learning that mimic observed biases in human learning. Single-feature patterns are learned faster than two-feature patterns, and vowel or consonant-only patterns are learned ...
Amanda Doucette
arxiv +5 more sources
Speech vocoding for laboratory phonology
Using phonological speech vocoding, we propose a platform for exploring relations between phonology and speech processing and, in broader terms, between the abstract and physical structures of a speech signal. Our goal is to take a step towards bridging phonology and speech processing and to contribute to the program of ...
arxiv +1 more source
For the Purpose of Curry: A UD Treebank for Ashokan Prakrit
We present the first linguistically annotated treebank of Ashokan Prakrit, an early Middle Indo-Aryan dialect continuum attested through Emperor Ashoka Maurya's 3rd century BCE rock and pillar edicts. For annotation, we used the multilingual Universal Dependencies (UD) formalism, following recent UD work on Sanskrit and other Indo-Aryan languages.
arxiv
On Structured Sparsity of Phonological Posteriors for Linguistic Parsing
The speech signal conveys information on different time scales, from the short (segmental) time scale, associated with phonological and phonetic information, to the long (suprasegmental) time scale, associated with syllabic and prosodic information. Linguistic and neurocognitive studies recognize the phonological classes at the segmental level as the essential and ...
arxiv +1 more source
Differentiable Generative Phonology
The goal of generative phonology, as formulated by Chomsky and Halle (1968), is to specify a formal system that explains the set of attested phonological strings in a language. Traditionally, a collection of rules (or constraints, in the case of optimality theory) and underlying forms (UF) are posited to work in tandem to generate phonological strings.
arxiv
Multilingual and crosslingual speech recognition using phonological-vector based phone embeddings
The use of phonological features (PFs) potentially allows language-specific phones to remain linked in training, which is highly desirable for information sharing for multilingual and crosslingual speech recognition methods for low-resourced languages.
arxiv
A Phylogenetic Model of the Evolution of Discrete Matrices for the Joint Inference of Lexical and Phonological Language Histories
We propose a model of the evolution of a matrix along a phylogenetic tree, in which transformations affect either entire rows or columns of the matrix. This represents the change of both lexical and phonological aspects of linguistic data, by allowing for new words to appear and for systematic phonological changes to affect the entire vocabulary.
arxiv
Applying Syntax–Prosody Mapping Hypothesis and Prosodic Well-Formedness Constraints to Neural Sequence-to-Sequence Speech Synthesis
End-to-end text-to-speech synthesis (TTS), which generates speech sounds directly from strings of texts or phonemes, has improved the quality of speech synthesis over the conventional TTS. However, most previous studies have been evaluated based on subjective naturalness and have not objectively examined whether they can reproduce pitch patterns of ...
arxiv
Improve Bilingual TTS Using Dynamic Language and Phonology Embedding
In most cases, bilingual TTS needs to handle three types of input scripts: first language only, second language only, and second language embedded in the first language. In the latter two situations, the pronunciation and intonation of the second language are usually quite different due to the influence of the first language.
arxiv