Findings of the TSAR-2022 Shared Task on Multilingual Lexical Simplification [PDF]
We report findings of the TSAR-2022 shared task on multilingual lexical simplification, organized as part of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022), held in conjunction with EMNLP 2022. The task called on the Natural Language Processing research community to contribute methods to advance the state of the art in ...
arxiv
Automatic Discourse Segmentation: Review and Perspectives [PDF]
Multilingual discourse parsing is a very prominent research topic. The first stage of discourse parsing is discourse segmentation. The study reported in this article reviews two discourse segmenters available online (for English and Portuguese).
arxiv
Unsupervised Lexical Simplification with Context Augmentation [PDF]
We propose a new unsupervised lexical simplification method that uses only monolingual data and pre-trained language models. Given a target word and its context, our method generates substitutes based on the target context and also additional contexts sampled from monolingual data.
arxiv
SemEval 2023 Task 9: Multilingual Tweet Intimacy Analysis [PDF]
We propose MINT, a new Multilingual INTimacy analysis dataset covering 13,372 tweets in 10 languages including English, French, Spanish, Italian, Portuguese, Korean, Dutch, Chinese, Hindi, and Arabic. We benchmarked a list of popular multilingual pre-trained language models. The dataset is released along with the SemEval 2023 Task 9: Multilingual Tweet
arxiv
European Longitude Prizes. I. Longitude Determination in the Spanish Empire [PDF]
Following Columbus' voyages to the Americas, Castilian (Spanish) and Portuguese rulers engaged in heated geopolitical competition, which was eventually reconciled through a number of treaties that divided the world into two unequal hemispheres. However, the early-sixteenth-century papal demarcation line was poorly defined.
arxiv
On measuring linguistic intelligence [PDF]
This work addresses the problem of measuring how many languages a person "effectively" speaks, given that some of the languages are close to each other; in other words, of assigning a meaningful number to her language portfolio. Intuition says that someone who speaks Spanish and Portuguese fluently is linguistically less proficient than someone who ...
arxiv
A Portuguese Native Language Identification Dataset [PDF]
In this paper we present NLI-PT, the first Portuguese dataset compiled for Native Language Identification (NLI), the task of identifying an author's first language based on their second language writing. The dataset includes 1,868 student essays written by learners of European Portuguese, native speakers of the following L1s: Chinese, English, Spanish,
arxiv
BVS Corpus: A Multilingual Parallel Corpus of Biomedical Scientific Texts [PDF]
The BVS database (Health Virtual Library) is a centralized source of biomedical information for Latin America and the Caribbean, created in 1998 and coordinated by BIREME (Biblioteca Regional de Medicina) in agreement with the Pan American Health Organization (OPAS).
arxiv
Scientific literature cited in patents: A Technology Transfer indicator in Portuguese universities [PDF]
The study aims to identify the process of transfer from science to technology that occurs in the main Portuguese public universities. The methodology was based on the analysis of the scientific literature cited in patents. Data was obtained from the Lens patent database. 10,514 scientific articles cited in patents were retrieved. A descriptive analysis
arxiv
UDS-DFKI Submission to the WMT2019 Similar Language Translation Shared Task [PDF]
In this paper we present the UDS-DFKI system submitted to the Similar Language Translation shared task at WMT 2019. The first edition of this shared task featured data from three pairs of similar languages: Czech and Polish, Hindi and Nepali, and Portuguese and Spanish.
arxiv