Multilingual Central Repository: a Cross-lingual Framework for Developing Wordnets [PDF]
Language resources are necessary for language processing, but building them is costly, involves many researchers from different areas, and needs constant updating. In this paper, we describe the cross-lingual framework used for developing the Multilingual Central Repository (MCR), a multilingual knowledge base that includes wordnets of Basque, Catalan ...
arxiv
Ditos e refráns recollidos na comarca do Ortegal / Expressions and proverbs collected in the Ortegal area [PDF]
Compilation of 185 Galician idioms, expressions and proverbs from the Ortegal area (NW Galicia) that are remembered and currently used by the collector, his family and neighbours.
Narciso Luaces Pardo
doaj
milIE: Modular & Iterative Multilingual Open Information Extraction [PDF]
Open Information Extraction (OpenIE) is the task of extracting (subject, predicate, object) triples from natural language sentences. Current OpenIE systems extract all triple slots independently. In contrast, we explore the hypothesis that it may be beneficial to extract triple slots iteratively: first extract easy slots, followed by the difficult ones
arxiv
SemEval-2022 Task 2: Multilingual Idiomaticity Detection and Sentence Embedding [PDF]
This paper presents the shared task on Multilingual Idiomaticity Detection and Sentence Embedding, which consists of two subtasks: (a) a binary classification task aimed at identifying whether a sentence contains an idiomatic expression, and (b) a task based on semantic text similarity which requires the model to adequately represent potentially ...
arxiv
Sample Efficient Approaches for Idiomaticity Detection [PDF]
Deep neural models, in particular Transformer-based pre-trained language models, require a significant amount of data to train. This need for data tends to lead to problems when dealing with idiomatic multiword expressions (MWEs), which are inherently less frequent in natural text.
arxiv
Reducing Confusion in Active Learning for Part-Of-Speech Tagging [PDF]
Active learning (AL) uses a data selection algorithm to select useful training samples to minimize annotation cost. This is now an essential tool for building low-resource syntactic analyzers such as part-of-speech (POS) taggers. Existing AL heuristics are generally designed on the principle of selecting uncertain yet representative training instances,
arxiv
Towards Syntactic Iberian Polarity Classification [PDF]
Lexicon-based methods using syntactic rules for polarity classification rely on parsers that are dependent on the language and on treebank guidelines. Thus, rules are also dependent and require adaptation, especially in multilingual scenarios. We tackle this challenge in the context of the Iberian Peninsula, releasing the first symbolic syntax-based ...
arxiv
An Empirical Approach for Modeling Fuzzy Geographical Descriptors [PDF]
We present a novel heuristic approach that defines fuzzy geographical descriptors using data gathered from a survey with human subjects. The participants were asked to provide graphical interpretations of the descriptors 'north' and 'south' for the Galician region (Spain).
arxiv
A systematic comparison of methods for low-resource dependency parsing on genuinely low-resource languages [PDF]
Parsers are available for only a handful of the world's languages, since they require lots of training data. How far can we get with just a small amount of training data? We systematically compare a set of simple strategies for improving low-resource parsers: data augmentation, which has not been tested before; cross-lingual training; and ...
arxiv
Surfing the modeling of PoS taggers in low-resource scenarios [PDF]
The recent trend towards the application of deep structured techniques has revealed the limits of huge models in natural language processing. This has reawakened interest in traditional machine learning algorithms, which have proved to be still competitive in certain contexts, in particular low-resource settings.
arxiv