Segmental Audio Word2Vec: Representing Utterances as Sequences of Vectors with Applications in Spoken Term Detection [PDF]

open access: yes, IEEE International Conference on Acoustics, Speech, and Signal Processing, 2018
While Word2Vec represents words (in text) as vectors carrying semantic information, audio Word2Vec was shown to be able to represent signal segments of spoken words as vectors carrying phonetic structure information.
Yu-Hsuan Wang, Hung-yi Lee, Lin-Shan Lee
semanticscholar   +1 more source

Overview of the NTCIR-12 SpokenQuery&Doc-2 task [PDF]

open access: yes, 2016
This paper presents an overview of the Spoken Query and Spoken Document retrieval (SpokenQuery&Doc-2) task at the NTCIR-12 Workshop. This task included spoken query driven spoken content retrieval (SQ-SCR) and a spoken query driven spoken term ...
Akiba, Tomoyosi   +3 more
core  

Transformer-Based Encoder-Encoder Architecture for Spoken Term Detection

open access: yes, 2023
The paper presents a method for spoken term detection based on the Transformer architecture. We propose the encoder-encoder architecture employing two BERT-like encoders with additional modifications, including convolutional and upsampling layers, attention masking, and shared parameters. The encoders project a recognized hypothesis and a searched term
Jan Švec, Luboš Šmídl, Jan Lehečka
openaire   +2 more sources

Exploring convolutional, recurrent, and hybrid deep neural networks for speech and music detection in a large audio dataset

open access: yes, EURASIP Journal on Audio, Speech, and Music Processing, 2019
Audio signals represent a wide diversity of acoustic events, from background environmental noise to spoken communication. Machine learning models such as neural networks have already been proposed for audio signal modeling, where recurrent structures can
Diego de Benito-Gorron   +3 more
doaj   +1 more source

Fast and Accurate OOV Decoder on High-Level Features

open access: yes, 2017
This work proposes a novel approach to the out-of-vocabulary (OOV) keyword search (KWS) task. The proposed approach is based on using high-level features from an automatic speech recognition (ASR) system, so-called phoneme posterior based (PPB) features, for
Khokhlov, Yuri   +3 more
core   +1 more source

Zero-resource audio-only spoken term detection based on a combination of template matching techniques [PDF]

open access: yes, 2011
Keywords: spoken term detection, template matching, unsupervised learning, posterior features. Spoken term detection is a well-known information retrieval task that seeks to extract contentful information from audio by locating occurrences of ...
Bimbot, Frédéric   +2 more
core   +2 more sources

Spoken Language Intent Detection using Confusion2Vec

open access: yes, 2019
Decoding a speaker's intent is a crucial part of spoken language understanding (SLU). The presence of noise or errors in the text transcriptions in real-life scenarios makes the task more challenging.
Georgiou, Panayiotis   +2 more
core   +1 more source

Application of out-of-language detection to spoken term detection [PDF]

open access: yes, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, 2010
This paper investigates the detection of English spoken terms in a conversational multi-language scenario. The speech is processed using a large vocabulary continuous speech recognition system. The recognition output is represented in the form of word recognition lattices, which are then used to search for the required terms.
Motlicek P., Valente F.
openaire   +1 more source

Experimental studies on effect of speaking mode on spoken term detection [PDF]

open access: yes, 2015
The objective of this paper is to study the effect of speaking mode on a spoken term detection (STD) system. The experiments are conducted with query words recorded in an isolated manner and words cut out from continuous speech.
Kodukula, Sri Rama Murty   +2 more
core  

Towards a Knowledge Graph based Speech Interface

open access: yes, 2017
Applications which use human speech as an input require a speech interface with high recognition accuracy. The words or phrases in the recognised text are annotated with a machine-understandable meaning and linked to knowledge graphs for further ...
Auer, Sören   +3 more
core   +1 more source
