Automated Caption Generation for Video Call with Language Translation [PDF]
In the modern era, virtual communication between individuals is common. Many people’s lives have been made simpler in a number of circumstances by providing subtitles, generating automated captions for social media videos, and language translation from a
Polepaka Sanjeeva +4 more
doaj +2 more sources
Automated Assessment of Word- and Sentence-Level Speech Intelligibility in Developmental Motor Speech Disorders: A Cross-Linguistic Investigation [PDF]
Background/Objectives: Accurate assessment of speech intelligibility is necessary for individuals with motor speech disorders. Transcription or scaled rating methods by naïve listeners are the most reliable tasks for these purposes; however, they are ...
Micalle Carl, Michal Icht
doaj +2 more sources
Enhancing supermarket robot interaction: an equitable multi-level LLM conversational interface for handling diverse customer intents [PDF]
This paper presents the design and evaluation of a comprehensive system to develop voice-based interfaces to support users in supermarkets. These interfaces enable shoppers to convey their needs through both generic and specific queries.
Chandran Nandkumar, Luka Peternel
doaj +2 more sources
From voice to ink (Vink): development and assessment of an automated, free-of-charge transcription tool [PDF]
Background Verbatim transcription of qualitative audio data is a cornerstone of analytic quality and rigor, yet the time and energy required for such transcription can drain resources, delay analysis, and hinder the timely dissemination of qualitative ...
Hannah Tolle +6 more
doaj +2 more sources
Spoken Language Analysis in Aging Research: The Validity of AI-Generated Speech to Text Using OpenAI's Whisper. [PDF]
Introduction: Studying what older adults say can provide important insights into cognitive, affective, and social aspects of aging. Available language analysis tools generally require audio-recorded speech to be transcribed into verbatim text, a task that has historically been performed by humans.
Naffah A, Pfeifer VA, Mehl MR.
europepmc +3 more sources
This paper details the experimental results of adapting OpenAI's Whisper model for code-switched Mandarin-English automatic speech recognition (ASR) on the SEAME and ASRU2019 corpora. We conducted two experiments: (a) using adaptation data from 1 to 100/200 hours to demonstrate the effectiveness of adaptation, and (b) examining different language ID setups on Whisper ...
Yizhou Peng, Eng Siong Chng
exaly +3 more sources
Аналiза і клясифікация русиньской бесіды языковым модельом штучной інтеліґенциі OpenAI Whisper
Analysis and Classification of the Rusyn Language Using the OpenAI Whisper ASR Model. The paper presents a linguistic analysis of the Rusyn language, focusing on its complex and dynamic aspects, such as pronunciation and individual, regional, and historical variations.
Pawel Malecki
exaly +2 more sources
Video Transcripts Summarization using OpenAI Whisper and GPT Model
Abstract: In today’s digital age, a vast amount of video content is generated and shared on the internet every minute. However, extracting relevant information from these videos can be time-consuming and challenging. This is where video transcript summarization comes in, providing a concise summary of video content without the need to watch the entire ...
exaly +2 more sources
We fine-tune Whisper-small (244M parameters) on 8.5 hours of in-domain medical audio and evaluate with word error rate (WER). Compared to an unadapted Whisper-small baseline, our fine-tuned model reduces WER from ∼63% to ∼32%. While the relative gain is substantial, this accuracy is not suitable for unsupervised clinical use; we position the system as a ...
Alaeddine Moussa, Noursene Drine
exaly +2 more sources
Instant Transcription and Translation Tool using OpenAI's Whisper ASR Model
Akarsh Ghale, Janaki K, Devaraj Verma C
exaly +2 more sources

