
Automated Caption Generation for Video Call with Language Translation [PDF]

open access: yes · E3S Web of Conferences, 2023
In the modern era, virtual communication between individuals is common. Many people's lives have been made simpler in a number of circumstances by providing subtitles, generating automated captions for social media videos, and language translation from a ...
Polepaka Sanjeeva et al.

Automated Assessment of Word- and Sentence-Level Speech Intelligibility in Developmental Motor Speech Disorders: A Cross-Linguistic Investigation [PDF]

open access: yes · Diagnostics
Background/Objectives: Accurate assessment of speech intelligibility is necessary for individuals with motor speech disorders. Transcription or scaled rating methods by naïve listeners are the most reliable tasks for these purposes; however, they are ...
Micalle Carl, Michal Icht

Enhancing supermarket robot interaction: an equitable multi-level LLM conversational interface for handling diverse customer intents [PDF]

open access: yes · Frontiers in Robotics and AI
This paper presents the design and evaluation of a comprehensive system to develop voice-based interfaces to support users in supermarkets. These interfaces enable shoppers to convey their needs through both generic and specific queries.
Chandran Nandkumar, Luka Peternel

From voice to ink (Vink): development and assessment of an automated, free-of-charge transcription tool [PDF]

open access: yes · BMC Research Notes
Background Verbatim transcription of qualitative audio data is a cornerstone of analytic quality and rigor, yet the time and energy required for such transcription can drain resources, delay analysis, and hinder the timely dissemination of qualitative ...
Hannah Tolle et al.

Spoken Language Analysis in Aging Research: The Validity of AI-Generated Speech to Text Using OpenAI's Whisper [PDF]

open access: yes · Gerontology
Introduction: Studying what older adults say can provide important insights into cognitive, affective, and social aspects of aging. Available language analysis tools generally require audio-recorded speech to be transcribed into verbatim text, a task that has historically been performed by humans.
Naffah A, Pfeifer VA, Mehl MR.

Adapting OpenAI’s Whisper for Speech Recognition on Code-Switch Mandarin-English SEAME and ASRU2019 Datasets

open access: yes · 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
This paper details the experimental results of adapting OpenAI's Whisper model for code-switched Mandarin-English automatic speech recognition (ASR) on the SEAME and ASRU2019 corpora. We conducted two experiments: a) using 1 to 100/200 hours of adaptation data to demonstrate the effectiveness of adaptation, and b) examining different language ID setups on Whisper ...
Yizhou Peng, Eng Siong Chng

Analysis and Classification of the Rusyn Language Using the OpenAI Whisper ASR Model

open access: yes · Rìčnik Ruskoj Bursy
The paper presents a linguistic analysis of the Rusyn language, focusing on its complex and dynamic aspects, such as pronunciation and individual, regional, and historical variations.
Pawel Malecki

Video Transcripts Summarization using OpenAI Whisper and GPT Model

open access: yes · International Journal for Research in Applied Science and Engineering Technology
In today's digital age, a vast amount of video content is generated and shared on the internet every minute. However, extracting relevant information from these videos can be time-consuming and challenging. This is where video transcript summarization comes in, providing a concise summary of video content without the need to watch the entire ...

Fine-Tuning OpenAI Whisper-Small for Domain-Specific Medical Speech Recognition within a Microservice Architecture

open access: yes · Informatica (Slovenia)
We fine-tune Whisper-small (244M parameters) on 8.5 hours of in-domain medical audio and evaluate with word error rate (WER). Compared to an unadapted Whisper-small baseline, our fine-tuned model reduces WER from ∼63% to ∼32%. While the relative gain is substantial, this accuracy is not suitable for unsupervised clinical use; we position the system as a ...
Alaeddine Moussa, Noursene Drine

Instant Transcription and Translation Tool using OpenAI's Whisper ASR Model

open access: yes · International Journal of Science and Research (Raipur, India), 2022
Akarsh Ghale, Janaki K, Devaraj Verma C
