Explainable Connectionist-Temporal-Classification-Based Scene Text Recognition
Connectionist temporal classification (CTC) is a favored decoder in scene text recognition (STR) for its simplicity and efficiency. However, most CTC-based methods utilize one-dimensional (1D) vector sequences, usually derived from a recurrent neural ...
Rina Buoy +3 more
Integrating international Chinese visualization teaching and vocational skills training: leveraging attention-connectionist temporal classification models
The teaching of Chinese as a second language has become increasingly crucial for promoting cross-cultural exchange and mutual learning worldwide. However, traditional approaches to international Chinese language teaching have limitations that hinder ...
Yuan Yao, Zhujun Dai, Muhammad Shahbaz
Self-Attention Networks for Connectionist Temporal Classification in Speech Recognition
The success of self-attention in NLP has led to recent applications in end-to-end encoder-decoder architectures for speech recognition. Separately, connectionist temporal classification (CTC) has matured as an alignment-free, non-autoregressive approach ...
Zhiheng Huang +2 more
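The "alignment-free" description above refers to CTC scoring a label sequence by summing over every frame-level alignment that collapses to it, rather than requiring one fixed alignment. Below is a minimal sketch of the standard CTC forward (alpha) recursion, written in plain probability space for readability (practical implementations work in log space for numerical stability). The function name, the convention that index 0 is the blank, and the toy probabilities are illustrative assumptions, not details taken from the cited paper.

```python
from typing import Sequence

BLANK = 0  # assumption: symbol index 0 is reserved for the CTC blank


def ctc_forward_prob(probs: Sequence[Sequence[float]],
                     target: Sequence[int],
                     blank: int = BLANK) -> float:
    """Return p(target | x), summed over every frame-level alignment.

    probs[t][k] is the per-frame (softmax) probability of symbol k at frame t.
    """
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank]
    ext = [blank]
    for c in target:
        ext += [c, blank]
    S = len(ext)

    # alpha[s]: total probability of all alignment prefixes that end
    # at position s of ext after the current frame
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]              # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]     # advance by one position
            # skip a blank, allowed only between two different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the last label or on the trailing blank
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)


# Toy example: 3 frames over the alphabet {blank=0, 1, 2}
probs = [[0.5, 0.3, 0.2],
         [0.4, 0.4, 0.2],
         [0.3, 0.2, 0.5]]
print(ctc_forward_prob(probs, [1, 2]))
```

Writing `_` for blank, the five length-3 paths that collapse to `[1, 2]` are `112`, `122`, `12_`, `_12` and `1_2`; their products sum to about 0.268, which is what the recursion returns without enumerating paths explicitly.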
A Study of All-Convolutional Encoders for Connectionist Temporal Classification
Connectionist temporal classification (CTC) is a popular sequence prediction approach for automatic speech recognition that is typically used with models based on recurrent neural networks (RNNs).
Kevin Gimpel +3 more
Order-Preserving Abstractive Summarization for Spoken Content Based on Connectionist Temporal Classification
Connectionist temporal classification (CTC) is a powerful approach for sequence-to-sequence learning, and has been popularly used in speech recognition. The central ideas of CTC include adding a label "blank" during training.
Yun-Nung Chen +4 more
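The snippet above names the extra "blank" label as a central idea of CTC. Its role is easy to see at decoding time: a frame-level path is mapped to a label sequence by first merging consecutive repeats and then deleting blanks, so a blank between two identical labels is what lets a genuinely doubled label survive. Below is a minimal sketch of greedy (best-path) decoding, assuming blank index 0 and made-up frame probabilities; the function name is hypothetical.

```python
from typing import List, Sequence

BLANK = 0  # assumption: index 0 is the blank label


def ctc_greedy_decode(probs: Sequence[Sequence[float]],
                      blank: int = BLANK) -> List[int]:
    """Best-path decoding: argmax per frame, merge repeats, then drop blanks."""
    # 1. Most likely symbol at every frame
    path = [max(range(len(frame)), key=frame.__getitem__) for frame in probs]
    # 2. Collapse: keep a symbol only if it differs from the previous
    #    frame's symbol, and never emit the blank itself
    out: List[int] = []
    prev = None
    for s in path:
        if s != prev and s != blank:
            out.append(s)
        prev = s
    return out


# Per-frame argmaxes are 1 1 0 1 2 2: the blank at frame 2 keeps
# the two 1s as separate labels, while the repeated 2s merge.
probs = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7],
    [0.2, 0.1, 0.7],
]
print(ctc_greedy_decode(probs))  # prints [1, 1, 2]
```

Best-path decoding is the simplest choice but only approximates the most probable label sequence, since it ignores the other alignments that collapse to the same output; beam-search variants address this at extra cost.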
SqueezeCall: nanopore basecalling using a Squeezeformer network
Nanopore sequencing, a third-generation sequencing technique, enables direct RNA sequencing, real-time analysis, and long-read length. Nanopore sequencers measure electrical current changes as nucleotides pass through nanopores; a basecaller identifies ...
Zhongxu Zhu
MPSA-Conformer-CTC/Attention: A High-Accuracy, Low-Complexity End-to-End Approach for Tibetan Speech Recognition
This study addresses the challenges of low accuracy and high computational demands in Tibetan speech recognition by investigating the application of end-to-end networks. We propose a decoding strategy that integrates Connectionist Temporal Classification
Changlin Wu +3 more
Multilingual end-to-end ASR for low-resource Turkic languages with common alphabets
To obtain a reliable and accurate automatic speech recognition (ASR) machine learning model, it is necessary to have sufficient audio data transcribed, for training. Many languages in the world, especially the agglutinative languages of the Turkic family,
Akbayan Bekarystankyzy +4 more
Context Conditioning via Surrounding Predictions for Non-Recurrent CTC Models
Connectionist Temporal Classification (CTC) loss has become widely used in sequence modeling tasks such as Automatic Speech Recognition (ASR) and Handwritten Text Recognition (HTR) due to its ease of use.
Burin Naowarat +2 more
A study of transformer-based end-to-end speech recognition system for Kazakh language
Today, the Transformer model, which allows parallelization and also has its own internal attention, has been widely used in the field of speech recognition.
Orken Mamyrbayev +4 more

