NADI 2023: The Fourth Nuanced Arabic Dialect Identification Shared Task
ARABICNLP, 2023
We describe the findings of the fourth Nuanced Arabic Dialect Identification Shared Task (NADI 2023). The objective of NADI is to help advance state-of-the-art Arabic NLP by creating opportunities for teams of researchers to collaboratively compete under
M. Abdul-Mageed +5 more
Arabic Dialect Identification under Scrutiny: Limitations of Single-label Classification
ARABICNLP, 2023
Automatic Arabic Dialect Identification (ADI) of text has gained great popularity since it was introduced in the early 2010s. Multiple datasets were developed, and yearly shared tasks have been running since 2018.
Amr Keleg, Walid Magdy
Lisan: Yemeni, Iraqi, Libyan, and Sudanese Arabic Dialect Corpora with Morphological Annotations
ACS/IEEE International Conference on Computer Systems and Applications, 2022
This article presents morphologically-annotated Yemeni, Sudanese, Iraqi, and Libyan Arabic dialects (Lisan) corpora. Lisan features around 1.2 million tokens.
Mustafa Jarrar +4 more
Arabic Dialect Identification with a Few Labeled Examples Using Generative Adversarial Networks
AACL, 2022
Given the challenges and complexities introduced while dealing with Dialect Arabic (DA) variations, Transformer based models, e.g., BERT, outperformed other models in dealing with the DA identification task.
Mahmoud Yusuf +2 more
The functions of the verb ‘to say’ in the Jordanian Arabic dialect of Irbid
Poznan Studies in Contemporary Linguistics, 2021
This research investigates the functions of the verb ‘to say’ in the Jordanian Arabic dialect of Irbid (JADI). Relying on a 250,000-word corpus, we propose that the speech verb ‘to say’ in JADI has one main lexical function (i.e.
Ekab Al-shawashreh +2 more
SudaBERT: A Pre-trained Encoder Representation For Sudanese Arabic Dialect
2020 International Conference on Computer, Control, Electrical, and Electronics Engineering (ICCCEEE), 2021
Bidirectional Encoder Representations from Transformers (BERT) has proven to be very efficient at Natural Language Understanding (NLU), as it allows to achieve state-of-the-art results in most NLU tasks.
Mukhtar Elgezouli +2 more
NADI 2024: The Fifth Nuanced Arabic Dialect Identification Shared Task
ARABICNLP
We describe the findings of the fifth Nuanced Arabic Dialect Identification Shared Task (NADI 2024). NADI’s objective is to help advance SoTA Arabic NLP by providing guidance, datasets, modeling opportunities, and standardized evaluation conditions that ...
M. Abdul-Mageed +7 more
ADI17: A Fine-Grained Arabic Dialect Identification Dataset
IEEE International Conference on Acoustics, Speech, and Signal Processing, 2020
In this paper, we describe a method to collect dialectal speech from YouTube videos to create a large-scale Dialect Identification (DID) dataset. Using this method, we collected dialectal Arabic from known YouTube channels from 17 Arabic speaking ...
Suwon Shon +4 more
Atlas-Chat: Adapting Large Language Models for Low-Resource Moroccan Arabic Dialect
COLING Workshops
We introduce Atlas-Chat, the first-ever collection of LLMs specifically developed for dialectal Arabic. Focusing on Moroccan Arabic, also known as Darija, we construct our instruction dataset by consolidating existing Darija language resources, creating ...
Guokan Shang +11 more
Spoken Arabic dialect recognition using X-vectors
Natural Language Engineering, 2020
This paper describes our automatic dialect identification system for recognizing four major Arabic dialects, as well as Modern Standard Arabic. We adapted the X-vector framework, which was originally developed for speaker recognition, to the task of ...
Abualsoud Hanani, Rabee Naser

