Improving Similar Language Translation With Transfer Learning [PDF]
We investigate transfer learning based on pre-trained neural machine translation models to translate between (low-resource) similar languages. This work is part of our contribution to the WMT 2021 Similar Languages Translation Shared Task where we submitted models for different language pairs, including French-Bambara, Spanish-Catalan, and Spanish ...
arxiv
Language Variety Identification with True Labels [PDF]
Language identification is an important first step in many IR and NLP applications. Most publicly available language identification datasets, however, are compiled under the assumption that the gold label of each instance is determined by where texts are retrieved from.
arxiv
Translation of Pronominal Anaphora between English and Spanish: Discrepancies and Evaluation [PDF]
This paper evaluates the different tasks carried out in the translation of pronominal anaphora in a machine translation (MT) system. The MT interlingua approach named AGIR (Anaphora Generation with an Interlingua Representation) improves upon other proposals presented to date because it is able to translate intersentential anaphors, detect co-reference
arxiv
Lexical Simplification Benchmarks for English, Portuguese, and Spanish [PDF]
Even in highly-developed countries, as many as 15-30% of the population can only understand texts written using a basic vocabulary. Their understanding of everyday texts is limited, which prevents them from taking an active role in society and making informed decisions regarding healthcare, legal representation, or democratic choice.
arxiv
EventGraph at CASE 2021 Task 1: A General Graph-based Approach to Protest Event Extraction [PDF]
This paper presents our submission to the 2022 edition of the CASE 2021 shared task 1, subtask 4. The EventGraph system adapts an end-to-end, graph-based semantic parser to the task of Protest Event Extraction and more specifically subtask 4 on event trigger and argument extraction.
arxiv
Systematic review of development literature from Latin America between 2010-2021 [PDF]
The purpose of this systematic review is to identify and describe the state of development literature published in Latin America, in Spanish and English, since 2010. For this, we carried out a topographic review of 44 articles available in the most important bibliographic indexes of Latin America, published in journals of diverse disciplines.
arxiv
Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge [PDF]
This paper proposes a novel lip reading framework, especially for low-resource languages, which has not been well addressed in the previous literature. Since low-resource languages do not have enough video-text paired data to train the model to have sufficient power to model lip movements and language, it is regarded as challenging to develop lip ...
arxiv
Multilingual Email Zoning [PDF]
The segmentation of emails into functional zones (also dubbed email zoning) is a relevant preprocessing step for most NLP tasks that deal with emails. However, despite the multilingual character of emails and their applications, previous literature regarding email zoning corpora and systems was developed essentially for English.
arxiv
Extended Multilingual Protest News Detection -- Shared Task 1, CASE 2021 and 2022 [PDF]
We report results of the CASE 2022 Shared Task 1 on Multilingual Protest Event Detection. This task is a continuation of CASE 2021 that consists of four subtasks that are i) document classification, ii) sentence classification, iii) event sentence coreference identification, and iv) event extraction.
arxiv
The TALP-UPC System for the WMT Similar Language Task: Statistical vs Neural Machine Translation [PDF]
Although the problem of similar language translation has been an area of research interest for many years, it is still far from being solved. In this paper, we study the performance of two popular approaches: statistical and neural. We conclude that both methods yield similar results; however, the performance varies depending on the language pair ...
arxiv