
The Impact of Translating Resource-Rich Datasets to Low-Resource Languages Through Multi-Lingual Text Processing

open access: yes (IEEE Access, 2021)
Urdu is still considered a low-resource language despite being ranked as the world's 10th most spoken language, with nearly 230 million speakers.
Abdul Ghafoor   +6 more
doaj   +3 more sources

AgglutiFiT: Efficient Low-Resource Agglutinative Language Model Fine-Tuning

open access: yes (IEEE Access, 2020)
Text classification tends to be difficult when manually labeled text corpora are inadequate. For low-resource agglutinative languages such as Uyghur, Kazakh, and Kyrgyz (UKK languages), in which words are manufactured via …
Zhe Li   +3 more
doaj   +3 more sources

Voice Activation for Low-Resource Languages [PDF]

open access: yes (Applied Sciences, 2021)
Voice activation systems are used to find a pre-defined word or phrase in the audio stream. Industry solutions, such as “OK, Google” for Android devices, are trained with millions of samples. In this work, we propose and investigate several ways to train a voice activation system when the in-domain data set is small.
Aliaksei Kolesau, Dmitrij Šešok
openaire   +2 more sources

Endangered Languages are not Low-Resourced [PDF]

open access: yes, 2021
The term low-resourced has been tossed around in the field of natural language processing to a degree that almost any language that is not English can be called "low-resourced"; sometimes even just for the sake of making a mundane or mediocre paper appear more interesting and insightful.
openaire   +3 more sources

Low-Resource Machine Translation Training Curriculum Fit for Low-Resource Languages

open access: yes, 2023
We conduct an empirical study of neural machine translation (NMT) for truly low-resource languages, and propose a training curriculum fit for cases when both parallel training data and compute resources are lacking, reflecting the reality of most of the world's languages and of the researchers working on them. Previously, unsupervised NMT, which …
Garry Kuwanto   +5 more
openaire   +2 more sources

Transformers for Low-Resource Languages: Is Féidir Linn! [PDF]

open access: yes, 2021
The Transformer model is the state of the art in machine translation. However, neural translation models often underperform on language pairs with insufficient training data. As a consequence, relatively few experiments using this architecture have been carried out on low-resource language pairs.
Séamus Lankford   +2 more
openaire   +2 more sources

Corpulyzer: A Novel Framework for Building Low Resource Language Corpora

open access: yes (IEEE Access, 2021)
The rapid proliferation of artificial intelligence has led to the development of sophisticated cutting-edge systems in natural language processing and computational linguistics domains.
Bilal Tahir, Muhammad Amir Mehmood
doaj   +1 more source

Pre-trained transformer-based language models for Sundanese

open access: yes (Journal of Big Data, 2022)
The Sundanese language has over 32 million speakers worldwide, yet it has reaped little to no benefit from recent advances in natural language understanding.
Wilson Wongso   +2 more
doaj   +1 more source

Towards Language Service Creation and Customization for Low-Resource Languages

open access: yes (Information, 2020)
The most challenging issue with low-resource languages is the difficulty of obtaining enough language resources. In this paper, we propose a language service framework for low-resource languages that enables the automatic creation and customization of ...
Donghui Lin, Yohei Murakami, Toru Ishida
doaj   +1 more source

Data Augmentation for Low-Resource Neural Machine Translation [PDF]

open access: yes, 2017
The quality of a neural machine translation system depends substantially on the availability of sizable parallel corpora. For low-resource language pairs this is not the case, resulting in poor translation quality. Inspired by work in computer vision, we …
Arianna Bisazza   +2 more
core   +2 more sources
