PIMA: Parameter-Shared Intelligent Media Analytics Framework for Low Resource Languages
Media analysis (MA) is an evolving research area in text mining and an important topic for intelligent media analytics. The fundamental purpose of MA is to obtain valuable insights that help to improve many different areas of ...
Dimitrios Zaikis +2 more
Low-Resource Machine Translation Training Curriculum Fit for Low-Resource Languages
We conduct an empirical study of neural machine translation (NMT) for truly low-resource languages, and propose a training curriculum fit for cases when both parallel training data and compute resource are lacking, reflecting the reality of most of the world's languages and the researchers working on these languages. Previously, unsupervised NMT, which ...
Garry Kuwanto +5 more
Multilingual Neural Machine Translation for Low-Resource Languages
In recent years, Neural Machine Translation (NMT) has been shown to be more effective than phrase-based statistical methods, thus quickly becoming the state of the art in machine translation (MT).
Surafel M. Lakew +3 more
Clustering of Monolingual Embedding Spaces
Suboptimal performance of cross-lingual word embeddings for distant and low-resource languages calls into question the isomorphic assumption integral to the mapping-based methods of obtaining such embeddings.
Kowshik Bhowmik, Anca Ralescu
adaptMLLM: Fine-Tuning Multilingual Language Models on Low-Resource Languages with Integrated LLM Playgrounds
The advent of Multilingual Language Models (MLLMs) and Large Language Models (LLMs) has spawned innovation in many areas of natural language processing.
Séamus Lankford, Haithem Afli, Andy Way
Improving Cross-lingual Information Retrieval on Low-Resource Languages via Optimal Transport Distillation
Benefiting from transformer-based pre-trained language models, neural ranking models have made significant progress. More recently, the advent of multilingual pre-trained language models provides great support for designing neural cross-lingual retrieval ...
Zhiqi Huang
A Manual for Web Corpus Crawling of Low Resource Languages
Since the seminal publication of "Web as Corpus" [1], the potential of creating corpora from the web has been realized for good for the creation of both online and offline corpora: noisy vs. clean, balanced vs. convenient, annotated vs. ...
Armin Hoenen +2 more
Progress of the PRINCIPLE project: promoting MT for Croatian, Icelandic, Irish and Norwegian
This paper updates the progress made on the PRINCIPLE project, a 2-year action funded by the European Commission under the Connecting Europe Facility (CEF) programme.
Bago, Petra +10 more
Insights into Low-Resource Language Modelling: Improving Model Performances for South African Languages
To address the gap in natural language processing for Southern African languages, our paper presents an in-depth analysis of language model development under resource-constrained conditions.
Ruan Visser +2 more
Misinformation Detection: A Review for High and Low-Resource Languages
The rapid spread of misinformation on platforms like Twitter and Facebook, and in news headlines, highlights the urgent need for effective ways to detect it.
Seani Rananga +3 more