
Multilingual Central Repository: a Cross-lingual Framework for Developing Wordnets [PDF]

open access: yes; arXiv, 2021
Language resources are necessary for language processing, but building them is costly, involves many researchers from different areas and needs constant updating. In this paper, we describe the crosslingual framework used for developing the Multilingual Central Repository (MCR), a multilingual knowledge base that includes wordnets of Basque, Catalan ...
arxiv  

Ditos e refráns recollidos na comarca do Ortegal / Expressions and proverbs collected in the Ortegal area [PDF]

open access: yes; Cadernos de Fraseoloxía Galega, 2015
Compilation of 185 Galician idioms, expressions and proverbs from the Ortegal area (NW Galicia) that are remembered and currently used by the collector, the collector's family, and neighbours.
Narciso Luaces Pardo
doaj  

milIE: Modular & Iterative Multilingual Open Information Extraction [PDF]

open access: yes; arXiv, 2021
Open Information Extraction (OpenIE) is the task of extracting (subject, predicate, object) triples from natural language sentences. Current OpenIE systems extract all triple slots independently. In contrast, we explore the hypothesis that it may be beneficial to extract triple slots iteratively: first extract easy slots, followed by the difficult ones
arxiv  

SemEval-2022 Task 2: Multilingual Idiomaticity Detection and Sentence Embedding [PDF]

open access: yes; arXiv, 2022
This paper presents the shared task on Multilingual Idiomaticity Detection and Sentence Embedding, which consists of two subtasks: (a) a binary classification task aimed at identifying whether a sentence contains an idiomatic expression, and (b) a task based on semantic text similarity which requires the model to adequately represent potentially ...
arxiv  

Sample Efficient Approaches for Idiomaticity Detection [PDF]

open access: yes; arXiv, 2022
Deep neural models, in particular Transformer-based pre-trained language models, require a significant amount of data to train. This need for data tends to lead to problems when dealing with idiomatic multiword expressions (MWEs), which are inherently less frequent in natural text.
arxiv  

Reducing Confusion in Active Learning for Part-Of-Speech Tagging [PDF]

open access: yes; arXiv, 2020
Active learning (AL) uses a data selection algorithm to select useful training samples to minimize annotation cost. This is now an essential tool for building low-resource syntactic analyzers such as part-of-speech (POS) taggers. Existing AL heuristics are generally designed on the principle of selecting uncertain yet representative training instances,
arxiv  

Towards Syntactic Iberian Polarity Classification [PDF]

open access: yes; arXiv, 2017
Lexicon-based methods using syntactic rules for polarity classification rely on parsers that are dependent on the language and on treebank guidelines. Thus, rules are also dependent and require adaptation, especially in multilingual scenarios. We tackle this challenge in the context of the Iberian Peninsula, releasing the first symbolic syntax-based ...
arxiv  

An Empirical Approach for Modeling Fuzzy Geographical Descriptors [PDF]

open access: yes; arXiv, 2017
We present a novel heuristic approach that defines fuzzy geographical descriptors using data gathered from a survey with human subjects. The participants were asked to provide graphical interpretations of the descriptors 'north' and 'south' for the Galician region (Spain).
arxiv  

A systematic comparison of methods for low-resource dependency parsing on genuinely low-resource languages [PDF]

open access: yes; arXiv, 2019
Parsers are available for only a handful of the world's languages, since they require lots of training data. How far can we get with just a small amount of training data? We systematically compare a set of simple strategies for improving low-resource parsers: data augmentation, which has not been tested before; cross-lingual training; and ...
arxiv  

Surfing the modeling of PoS taggers in low-resource scenarios [PDF]

open access: yes; Mathematics 2022, 10(19), 3526
The recent trend towards the application of deep structured techniques has revealed the limits of huge models in natural language processing. This has reawakened interest in traditional machine learning algorithms, which have proved to remain competitive in certain contexts, in particular low-resource settings.
arxiv  
