HierTTS: Expressive End-to-End Text-to-Waveform Using a Multi-Scale Hierarchical Variational Auto-Encoder

open access: yes. Applied Sciences, 2023
End-to-end text-to-speech (TTS) models that directly generate waveforms from text are gaining popularity. However, existing end-to-end models are still not natural enough in their prosodic expressiveness.
Zengqiang Shang   +4 more
doaj   +1 more source

Explore Long-Range Context Features for Speaker Verification

open access: yes. Applied Sciences, 2023
Multi-scale context information, especially long-range dependency, has been shown to be beneficial for speaker verification (SV) tasks. In this paper, we propose three methods to systematically explore long-range context SV feature extraction based on ResNet ...
Zhuo Li   +4 more
doaj   +1 more source

An individualization approach for head-related transfer function in arbitrary directions based on deep learning

open access: yes. JASA Express Letters, 2022
This paper provides an individualization approach for head-related transfer function (HRTF) in arbitrary directions based on deep learning by utilizing a dual-autoencoder architecture to establish the relationship between HRTF magnitude spectrum and ...
Dingding Yao   +6 more
doaj   +1 more source

Improving Transformer Based End-to-End Code-Switching Speech Recognition Using Language Identification

open access: yes. Applied Sciences, 2021
A recurrent neural network (RNN) based attention model has been used in code-switching speech recognition (CSSR). However, due to the sequential computation constraint of RNN, there are stronger short-range dependencies and weaker long-range ...
Zheying Huang   +5 more
doaj   +1 more source

Confidence Learning for Semi-Supervised Acoustic Event Detection

open access: yes. Applied Sciences, 2021
In recent years, the involvement of synthetic strongly labeled data, weakly labeled data, and unlabeled data has drawn much research attention in semi-supervised acoustic event detection (SAED).
Yuzhuo Liu   +4 more
doaj   +1 more source

Query-by-Example with Acoustic Word Embeddings Using wav2vec Pretraining

open access: yes. Jisuanji kexue, 2022
Query-by-Example is a popular keyword detection method in the absence of speech resources. It can build a keyword query system with excellent performance when there are few labeled voice resources and a lack of pronunciation dictionaries. In recent years ...
LI Zhao-qi, LI Ta
doaj   +1 more source

A Feature Optimization Approach Based on Inter-Class and Intra-Class Distance for Ship Type Classification

open access: yes. Sensors, 2020
Deep learning based methods have achieved state-of-the-art results on the task of ship type classification. However, most existing ship type classification algorithms take time–frequency (TF) features as input, the underlying discriminative information ...
Chen Li   +4 more
doaj   +1 more source

Temporal Convolution Network Based Joint Optimization of Acoustic-to-Articulatory Inversion

open access: yes. Applied Sciences, 2021
Articulatory features have been proven to be efficient in speech recognition and speech synthesis. However, acquiring articulatory features has long been a difficult, actively researched problem.
Guolun Sun   +3 more
doaj   +1 more source

A Pronunciation Prior Assisted Vowel Reduction Detection Framework with Multi-Stream Attention Method

open access: yes. Applied Sciences, 2021
Vowel reduction is a common pronunciation phenomenon in stress-timed languages like English. Native speakers tend to weaken unstressed vowels into a schwa-like sound.
Zongming Liu   +3 more
doaj   +1 more source

Continuous speech recognition by convolutional neural networks

open access: yes. 工程科学学报, 2015
Convolutional neural networks (CNNs), which show success in achieving translation invariance for many image processing tasks, were investigated for continuous speech recognition.
ZHANG Qing-qing   +3 more
doaj   +1 more source
