Deep Multimodal Semantic Embeddings for Speech and Images
In this paper, we present a model which takes as input a corpus of images with relevant spoken captions and finds a correspondence between the two modalities. We employ a pair of convolutional neural networks to model visual objects and speech signals at
Glass, James, Harwath, David
core +1 more source
Multilingual search for cultural heritage archives via combining multiple translation resources [PDF]
Material in Cultural Heritage (CH) archives may be in various languages, requiring a facility for effective multilingual search.
Debole, Franca +4 more
core
Speaker-Dependent Speech Enhancement Using Codebook-based Synthesis for Low SNR Applications
In this paper, speaker-dependent speech enhancement is performed using codebooks. For this purpose, two codebooks are designed from the STFT parameters, one for speech and one for noise.
Roghayeh Doost, Abolghasem Sayadian
doaj
Chinese Speech Recognition Using Conformer Fused with Max Pooling [PDF]
Speech recognition technology enables machines to understand human speech using advanced algorithms and signal processing technologies, thereby making communication between humans and machines more convenient.
HU Conggang, YANG Lipeng, SUN Yongqi, CHEN Hualong, HAN Keke
doaj +1 more source
GTH-UPM system for search on speech evaluation
This paper describes the GTH-UPM system for the Albayzin 2014 Search on Speech Evaluation. The evaluation task consists of searching for a list of terms/queries in audio files. The GTH-UPM system we present is based on an LVCSR (Large Vocabulary Continuous Speech Recognition) system.
Echeverry Correa, Julian David +2 more
openaire +1 more source
An Optimized Hyperparameter Tuning for Improved Hate Speech Detection with Multilayer Perceptron
Hate speech classification is a critical task in the domain of natural language processing, aiming to mitigate the negative impacts of harmful content on digital platforms.
Muhamad Ridwan, Ema Utami
doaj +1 more source
UTwente does Brave New Tasks for MediaEval 2012: Searching and Hyperlinking [PDF]
In this paper we report our experiments and results for the brave new searching and hyperlinking tasks for the MediaEval Benchmark Initiative 2012. The searching task involves finding target video segments based on a short natural language sentence query ...
Aly, R.B.N. +2 more
core +1 more source
This paper presents a new method for a quick similarity-based search through long unlabeled audio streams to detect and locate audio clips provided by users.
Kashino, Kunio +3 more
core +1 more source
A lexical database tool for quantitative phonological research
A lexical database tool tailored for phonological research is described. Database fields include transcriptions, glosses and hyperlinks to speech files.
Bird, Steven
core +3 more sources
Dublin City University video track experiments for TREC 2002 [PDF]
Dublin City University participated in the Feature Extraction task and the Search task of the TREC-2002 Video Track. In the Feature Extraction task, we submitted 3 features: Face, Speech, and Music. In the Search task, we developed an interactive video
Browne, Paul +10 more
core

