
Domestic dogs (Canis familiaris) recognise meaningful content in monotonous streams of read speech. [PDF]

open access: yes | Anim Cogn
Domestic dogs (Canis familiaris) can recognize basic phonemic information from human speech and respond to commands. Commands are typically presented in isolation with exaggerated prosody known as dog-directed speech (DDS) register.
Root-Gutteridge H   +3 more
europepmc   +2 more sources

On the Utility of Self-Supervised Models for Prosody-Related Tasks [PDF]

open access: yes | Spoken Language Technology Workshop, 2022
Self-Supervised Learning (SSL) from speech data has produced models that have achieved remarkable performance in many tasks, and that are known to implicitly represent many aspects of information latently present in speech signals.
Guan-Ting Lin   +7 more
semanticscholar   +1 more source

Text-Free Prosody-Aware Generative Spoken Language Modeling [PDF]

open access: yes | Annual Meeting of the Association for Computational Linguistics, 2021
Speech pre-training has primarily demonstrated efficacy on classification tasks, while its capability of generating novel speech, similar to how GPT-2 can generate coherent paragraphs, has barely been explored. Generative Spoken Language Modeling (GSLM) (
E. Kharitonov   +10 more
semanticscholar   +1 more source

Prosospeech: Enhancing Prosody with Quantized Vector Pre-Training in Text-To-Speech [PDF]

open access: yes | IEEE International Conference on Acoustics, Speech, and Signal Processing, 2022
Expressive text-to-speech (TTS) has become a hot research topic recently, mainly focusing on modeling prosody in speech. Prosody modeling has several challenges: 1) the extracted pitch used in previous prosody modeling works have inevitable errors, which
Yi Ren   +6 more
semanticscholar   +1 more source

A New Approach to the Persian Prosodies based on Music Tetrachords [PDF]

open access: yes | Literary Arts, 2021
There is no doubt that the quantitative meter is the essence of music and prosodies, providing the link between poetry and music. In addition to the time priority that music has over prosodies, the similarity between these elements makes it apparent that
Mehran Mhboobi moqadam, Ali Heydari
doaj   +1 more source

Prosody Is Not Identity: A Speaker Anonymization Approach Using Prosody Cloning

open access: yes | IEEE International Conference on Acoustics, Speech, and Signal Processing, 2023
Prosody is closely linked to the identity of a speaker, leading to individual pitch and intonation patterns. Therefore, it is challenging in speaker anonymization to generate speech utterances that both keep the original audio’s main prosodic structure ...
Sarina Meyer   +5 more
semanticscholar   +1 more source

Fully-Hierarchical Fine-Grained Prosody Modeling For Interpretable Speech Synthesis [PDF]

open access: yes | IEEE International Conference on Acoustics, Speech, and Signal Processing, 2020
This paper proposes a hierarchical, fine-grained and interpretable latent variable model for prosody based on the Tacotron 2 text-to-speech model. It achieves multi-resolution modeling of prosody by conditioning finer level representations on coarser ...
Guangzhi Sun   +5 more
semanticscholar   +1 more source

Generating Diverse and Natural Text-to-Speech Samples Using a Quantized Fine-Grained VAE and Autoregressive Prosody Prior [PDF]

open access: yes | IEEE International Conference on Acoustics, Speech, and Signal Processing, 2020
Recent neural text-to-speech (TTS) models with fine-grained latent features enable precise control of the prosody of synthesized speech. Such models typically incorporate a fine-grained variational autoencoder (VAE) structure, extracting latent features ...
Guangzhi Sun   +7 more
semanticscholar   +1 more source

CopyCat: Many-to-Many Fine-Grained Prosody Transfer for Neural Text-to-Speech [PDF]

open access: yes | Interspeech, 2020
Prosody Transfer (PT) is a technique that aims to use the prosody from a source audio as a reference while synthesising speech. Fine-grained PT aims at capturing prosodic aspects like rhythm, emphasis, melody, duration, and loudness, from a source audio ...
S. Karlapati   +5 more
semanticscholar   +1 more source

Mark my words: tone of voice changes affective word representations in memory. [PDF]

open access: yes | PLoS ONE, 2010
The present study explored the effect of speaker prosody on the representation of words in memory. To this end, participants were presented with a series of words and asked to remember the words for a subsequent recognition test. During study, words were
Annett Schirmer
doaj   +1 more source
