Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers

IEEE Transactions on Audio, Speech, and Language Processing, 2023
We introduce a language modeling approach for text to speech synthesis (TTS). Specifically, we train a neural codec language model (called VALL-E) using discrete codes derived from an off-the-shelf neural audio codec model, and regard TTS as a ...
Chengyi Wang   +12 more

SPEECH ETIQUETTE AND SPEECH ACTIVITI

2023
Speech etiquette refers to the system of speech behavior, the rules of live conversation and correspondence, how to use the language and its tools in a specific situation and environment. English speech etiquette is a set of special words and expressions that give a polite form to English speech, as well as the rules according to which these words and ...
Niyatova Maftuna Norbek qizi   +1 more

Hate Speech, Sex Speech, Free Speech

Choice Reviews Online, 1997
A powerful indictment of contemporary attacks on free speech, this book argues for a vigorous First Amendment jurisprudence protecting even offensive types of speech. In recent years, political activists, academics, and legal specialists have attacked traditional notions of free speech protection as they concern hate speech, obscenity, and pornography.

Interpretation of speech rhythm: Speech error, speech rhythm, and speech proficiency

The Journal of the Acoustical Society of America, 2023
It has been well-known that second language learners are affected by their first language when producing their L2. For speech rhythm, it has been suggested that L2 speakers are affected by L1 speech rhythm (e.g., Korean learners of English produce English without reducing the duration of unstressed vowels), and the effect is greater when speakers are ...

Enhancements in Immediate Speech Emotion Detection: Harnessing Prosodic and Spectral Characteristics

International Journal of Innovative Science and Research Technology
Speech is essential to human communication for expressing and understanding feelings. Emotional speech processing has challenges with expert data sampling, dataset organization, and computational complexity in large-scale analysis.
Zewar Shah, Shan Zhiyong, Adnan

Moshi: a speech-text foundation model for real-time dialogue

arXiv.org
We introduce Moshi, a speech-text foundation model and full-duplex spoken dialogue framework. Current systems for spoken dialogue rely on pipelines of independent components, namely voice activity detection, speech recognition, textual dialogue and text ...
Alexandre Défossez   +7 more

CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens

arXiv.org
Recent years have witnessed a trend that large language model (LLM) based text-to-speech (TTS) emerges into the mainstream due to their high naturalness and zero-shot capacity.
Zhihao Du   +11 more

Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

arXiv.org
We introduce Seed-TTS, a family of large-scale autoregressive text-to-speech (TTS) models capable of generating speech that is virtually indistinguishable from human speech. Seed-TTS serves as a foundation model for speech generation and excels in speech ...
Philip Anastassiou   +45 more

Speech Development

Clinics in Plastic Surgery, 1990
Babies are born nonverbal, yet they spontaneously and seemingly effortlessly acquire the complex skills necessary for oral communication. Interestingly, many of these skills have "golden periods" of maximally efficient learning--failure to acquire the skill at that time may lead to speech delay.
R. Peterson, S. Velleman
