The Cardamom workbench for historical and under-resourced languages
This paper describes the creation of a workbench tool designed to make technologies developed throughout the lifespan of the Cardamom project easily accessible to researchers who could most benefit from them, but who may not have the technical expertise to apply bleeding-edge technologies to their own datasets.
Doyle, Adrian +5 more
openaire +4 more sources
Author identification for Under-Resourced language (KadazanDusun)
This paper presents the task of author identification for the KadazanDusun language, using tweets as the data source for author identification on short texts. KadazanDusun is considered one of the under-resourced languages in Malaysia.
Nursyahirah Tarmizi +2 more
openaire +2 more sources
Creating language resources for under-resourced languages: methodologies, and experiments with Arabic [PDF]
Language resources are important for those working on computational methods to analyse and study languages. These resources are needed to help advance research in fields such as natural language processing, machine learning, information retrieval and text analysis in general.
Mahmoud El-Haj +2 more
openaire +3 more sources
Building Speech Recognition Systems for Language Documentation: The CoEDL Endangered Language Pipeline and Inference System (ELPIS) [PDF]
Machine learning has revolutionised speech technologies for major world languages, but these technologies have generally not been available for the roughly 4,000 languages with populations of fewer than 10,000 speakers.
František Kratochvíl +48 more
core +1 more source
Statistical speech and language processing techniques, requiring large amounts of training data, are currently state-of-the-art in automatic speech recognition.
CUCU, H. +3 more
doaj +1 more source
Automatic Speech Recognition Using Limited Vocabulary: A Survey
Automatic Speech Recognition (ASR) is an active field of research due to its large number of applications and the proliferation of interfaces or computing devices that can support speech processing.
Jean Louis K. E Fendji +3 more
doaj +1 more source
Code-Switching in Automatic Speech Recognition: The Issues and Future Directions
Code-switching (CS) in spoken language occurs when speech contains two or more languages within an utterance. It is an unsolved issue in automatic speech recognition (ASR) research, as ASR needs to recognise speech in bilingual and multilingual settings ...
Mumtaz Begum Mustafa +6 more
doaj +1 more source
Extractive summarization of Malayalam documents using latent Dirichlet allocation: An experience
Automatic text summarization (ATS) extracts information from a source text and presents it to the user in a condensed form while preserving its primary content.
Kondath Manju +2 more
doaj +1 more source
Offensive Language Detection in Under-Resourced Algerian Dialectal Arabic Language
This paper addresses the problem of detecting offensive and abusive content in Facebook comments, focusing on Algerian dialectal Arabic, which is one of the under-resourced languages. The latter has a variety of dialects mixed with different languages (i.e. Berber, French and English). In addition, we deal with texts written in both Arabic and
Oussama Boucherit, Kheireddine Abainia
openaire +2 more sources
Domain Generalization for Language-Independent Automatic Speech Recognition
A language-independent automatic speech recognizer (ASR) is one that can be used for phonetic transcription in languages other than the languages in which it was trained.
Heting Gao +6 more
doaj +1 more source

