Spoken term detection with Connectionist Temporal Classification: A novel hybrid CTC-DBN decoder
This paper proposes a novel system for robust keyword detection in continuous speech. Our decoder is composed of a bidirectional Long Short-Term Memory recurrent neural network using a Connectionist Temporal Classification (CTC) output layer, and a Dynamic Bayesian Network (DBN).
Wöllmer, Martin +3 more
openaire +4 more sources
End-to-End Non-Autoregressive Neural Machine Translation with Connectionist Temporal Classification
EMNLP ...
Libovický, Jindřich, Helcl, Jindřich
openaire +4 more sources
Towards end-to-end speech recognition with transfer learning
A transfer learning-based end-to-end speech recognition approach is presented in two levels in our framework. Firstly, a feature extraction approach combining multilingual deep neural network (DNN) training with a matrix factorization algorithm is ...
Chu-Xiong Qin, Dan Qu, Lian-Hai Zhang
doaj +1 more source
Bidirectional Representations for Low-Resource Spoken Language Understanding
Speech representation models lack the ability to efficiently store semantic information and require fine tuning to deliver decent performance. In this research, we introduce a transformer encoder–decoder framework with a multiobjective training strategy,
Quentin Meeus +2 more
doaj +1 more source
Digital display instrument identification is a crucial approach for automating the collection of digital display data. In this study, we propose a digital display area detection CTPNpro algorithm to address the problem of recognizing multiclass digital ...
Xuanzhang Wen +5 more
doaj +1 more source
Learning Hard Alignments with Variational Inference
There has recently been significant interest in hard attention models for tasks such as object recognition, visual captioning and speech recognition. Hard attention can offer benefits over soft attention such as decreased computational cost, but training
Chiu, Chung-Cheng +5 more
core +1 more source
Keyword spotting (KWS) and speaker verification (SV) have been studied independently although it is known that acoustic and speaker domains are complementary. In this paper, we propose a multi-task network that performs KWS and SV simultaneously to fully
Goo, Jahyun +3 more
core +1 more source
Attention-based CNN-ConvLSTM for Handwritten Arabic Word Extraction
Word extraction is one of the most critical steps in handwriting recognition systems. It is challenging for many reasons, such as the variability of handwriting styles, touching and overlapping characters, skewness problems, diacritics ...
Takwa Ben Aicha, Afef Kacem Echi
doaj +1 more source
Simultaneous Neural Machine Translation using Connectionist Temporal Classification
Simultaneous machine translation is a variant of machine translation that starts the translation process before the end of an input. This task faces a trade-off between translation accuracy and latency. We have to determine when we start the translation for observed inputs so far, to achieve good practical performance. In this work, we propose a neural
Chousa, Katsuki +2 more
openaire +2 more sources
Inter‐Model Feature Fusion for Robust Low‐Resource Speech Recognition
Our Self‐Supervised Feature Fusion (SSF‐FT) method enhances low‐resource speech recognition by adaptively combining features from self‐supervised models trained with Contrastive, Predictive, and Reconstruction objectives. This attention‐weighted ensemble delivers robust performance, particularly in acoustically challenging conditions, extending current
Ussen Kimanuka +2 more
wiley +1 more source

