ChatGPT Perpetuates Gender Bias in Machine Translation and Ignores Non-Gendered Pronouns: Findings across Bengali and Five other Low-Resource Languages [PDF]
In this multicultural age, language translation is one of the most performed tasks, and it is becoming increasingly AI-moderated and automated. As a novel AI system, ChatGPT claims to be proficient in machine translation tasks, and in this paper we put ...
Sourojit Ghosh, Aylin Caliskan
semanticscholar +1 more source
Extending Multilingual BERT to Low-Resource Languages [PDF]
Multilingual BERT (M-BERT) has been a huge success in both supervised and zero-shot cross-lingual transfer learning. However, this success has focused only on the top 104 languages in Wikipedia that it was trained on. In this paper, we propose a simple but effective approach to extend M-BERT (E-BERT) so that it can benefit any new language, and show ...
Zihan Wang +3 more
openaire +2 more sources
ChatGPT MT: Competitive for High- (but Not Low-) Resource Languages [PDF]
Large language models (LLMs) implicitly learn to perform a range of language tasks, including machine translation (MT). Previous studies explore aspects of LLMs’ MT capabilities.
N. R. Robinson +3 more
semanticscholar +1 more source
Towards a Sentiment Analyser for Low-resource Languages [PDF]
Twitter is one of the most influential social media platforms, with millions of active users. It is commonly used for microblogging, which allows users to share messages, ideas, thoughts, and more. Thus, millions of interactions, such as short messages or tweets, flow among Twitter users discussing various topics that have been happening ...
Dian Indriani +3 more
openaire +2 more sources
Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts [PDF]
Large language models (LLMs) are known to effectively perform tasks by simply observing few exemplars. However, in low-resource languages, obtaining such hand-picked exemplars can still be challenging, where unsupervised techniques may be necessary ...
Xuan-Phi Nguyen +3 more
semanticscholar +1 more source
Low-Resource Language Modelling of South African Languages
Language models are the foundation of current neural network-based models for natural language understanding and generation. However, research on the intrinsic performance of language models on African languages has been extremely limited, which is made more challenging by the lack of large or standardised training and evaluation sets that exist for ...
Stuart Mesham +3 more
openaire +2 more sources
Machine Translation into Low-resource Language Varieties [PDF]
State-of-the-art machine translation (MT) systems are typically trained to generate the "standard" target language; however, many languages have multiple varieties (regional varieties, dialects, sociolects, non-native varieties) that are different from the standard language.
Sachin Kumar +3 more
openaire +2 more sources
Bitext Mining Using Distilled Sentence Representations for Low-Resource Languages [PDF]
Scaling multilingual representation learning beyond the hundred most frequent languages is challenging, in particular to cover the long tail of low-resource languages.
Kevin Heffernan +2 more
semanticscholar +1 more source
Urdu is still considered a low-resource language despite being ranked as the world’s 10th most spoken language, with nearly 230 million speakers.
Abdul Ghafoor +6 more
doaj +1 more source
Bayesian Models for Unit Discovery on a Very Low Resource Language [PDF]
Developing speech technologies for low-resource languages has become a very active research field over the last decade. Among others, Bayesian models have shown some promising results on artificial examples but still lack in situ experiments. Our work ...
Besacier, Laurent +9 more
core +3 more sources

