Preregistering NLP research

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021 (open access)
Accepted at NAACL 2021; pre-final draft, comments ...
Emiel van Miltenburg and 2 others

Grounding ‘Grounding’ in NLP

Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, 2021 (open access)
The NLP community has seen substantial recent interest in grounding to facilitate interaction between language technologies and the world. However, as a community, we use the term broadly to reference any linking of text to data or non-textual modality. In contrast, Cognitive Science more formally defines "grounding" as the process of establishing what ...
Khyathi Raghavi Chandu and 2 others

MasakhaNER: Named Entity Recognition for African Languages

Transactions of the Association for Computational Linguistics, 2021 (open access)
We take a step towards addressing the under-representation of the African continent in NLP research by bringing together different stakeholders to create the first large, publicly available, high-quality dataset for named entity recognition (NER) in ten ...
David Ifeoluwa Adelani and 60 others

Sub-Character Tokenization for Chinese Pretrained Language Models

Transactions of the Association for Computational Linguistics, 2023 (open access)
Tokenization is fundamental to pretrained language models (PLMs). Existing tokenization methods for Chinese PLMs typically treat each character as an indivisible token.
Chenglei Si and 8 others
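
The character-level baseline the abstract describes can be illustrated with a minimal Python sketch. It assumes a hypothetical toy vocabulary (`VOCAB`) and hypothetical helper names (`char_tokenize`, `encode`); it shows each Chinese character becoming one indivisible token and a character missing from the vocabulary collapsing to `[UNK]`. It does not implement the paper's sub-character method.

```python
# Hypothetical toy vocabulary; real Chinese PLM vocabularies hold thousands of characters.
VOCAB = {"[UNK]": 0, "自": 1, "然": 2, "语": 3, "言": 4, "处": 5, "理": 6}


def char_tokenize(text: str) -> list[str]:
    """Treat every non-whitespace character as one indivisible token."""
    return [ch for ch in text if not ch.isspace()]


def encode(tokens: list[str], vocab: dict[str, int]) -> list[int]:
    """Map tokens to ids; characters absent from the vocabulary become [UNK]."""
    return [vocab.get(tok, vocab["[UNK]"]) for tok in tokens]


if __name__ == "__main__":
    toks = char_tokenize("自然语言处理")
    print(toks)
    print(encode(toks, VOCAB))
    # "料" is not in the toy vocabulary, so it maps to the [UNK] id
    print(encode(char_tokenize("语料"), VOCAB))
```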

Large scale text mining for deriving useful insights: A case study focused on microbiome

Frontiers in Physiology, 2022 (open access)
Text mining has been shown to be an auxiliary but key driver for modeling, data harmonization, and interpretation in bio-medicine. Scientific literature holds a wealth of information and embodies cumulative knowledge and remains the core basis on which ...
Syed Ashif Jardary Al Ahmed and 8 others

Crowdsourcing for NLP

Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorial Abstracts, 2015 (open access)
Crowdsourced applications to scientific problems are a hot research area, with over 10,000 publications in the past five years. Platforms such as Amazon's Mechanical Turk and CrowdFlower provide researchers with easy access to large numbers of workers.
Chris Callison-Burch and 2 others

Extrapolation in NLP

Proceedings of the Workshop on Generalization in the Age of Deep Learning, 2018 (open access)
We argue that extrapolation to examples outside the training space will often be easier for models that capture global structures, rather than just maximise their local fit to the training data. We show that this is true for two popular models: the Decomposable Attention Model and word2vec.
Jeff Mitchell and 3 others

Lexical and Morphological Statistics of an Arabic POS-Tagged Corpus

The Egyptian Journal of Language Engineering, 2014 (open access)
Part-Of-Speech (POS) tagging is a basic component necessary for many Natural Language Processing (NLP) applications. Building a manually tagged corpus helps in studying key statistics of a given language which form the basis for POS tagging systems.
Hamdy Mubarak and 2 others
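
As a rough illustration of the kind of key statistics a manually tagged corpus makes available, the Python sketch below counts tokens, word types, tag frequencies, and words seen with more than one tag. The `word/TAG` input format, the English toy sentences, the tag names, and the function `corpus_stats` are all hypothetical; they are not the corpus, tag set, or statistics reported in the paper.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus in "word/TAG" format; English words and an
# illustrative tag set are used for readability, not the paper's data.
CORPUS = [
    "the/DET book/NOUN is/VERB on/PREP the/DET table/NOUN",
    "they/PRON book/VERB a/DET table/NOUN",
]


def corpus_stats(lines):
    """Count tokens, word types, tag frequencies, and tag-ambiguous words."""
    tag_counts = Counter()
    word_tags = defaultdict(Counter)  # word -> Counter of tags it received
    for line in lines:
        for item in line.split():
            # rsplit on the last "/" so a word containing "/" stays intact
            word, tag = item.rsplit("/", 1)
            tag_counts[tag] += 1
            word_tags[word.lower()][tag] += 1
    # A word type is ambiguous if it received more than one distinct tag
    ambiguous = {w: dict(tags) for w, tags in word_tags.items() if len(tags) > 1}
    return {
        "tokens": sum(tag_counts.values()),
        "types": len(word_tags),
        "tag_counts": dict(tag_counts),
        "ambiguous": ambiguous,
    }


if __name__ == "__main__":
    for key, value in corpus_stats(CORPUS).items():
        print(key, value)
```

Tag ambiguity of this kind (one word type carrying several tags) is what a POS tagger has to resolve from context, which is one reason such counts are useful groundwork for tagging systems.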

The Relation Dimension in the Identification and Classification of Lexically Restricted Word Co-Occurrences in Text Corpora

Mathematics, 2022 (open access)
The speech of native speakers is full of idiosyncrasies. Especially prominent are lexically restricted binary word co-occurrences of the type high esteem, strong tea, run [an] experiment, war break(s) out, etc.
Alexander Shvets, Leo Wanner
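
One common way to surface candidate restricted co-occurrences such as strong tea is to rank adjacent word pairs by pointwise mutual information (PMI), PMI(w1, w2) = log2(P(w1, w2) / (P(w1) P(w2))). The Python sketch below is a generic PMI ranking offered as an assumed baseline, not the identification and classification method of the paper; the function name `bigram_pmi`, the `min_count` parameter, and the toy sentence are hypothetical.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=1):
    """Score adjacent word pairs by PMI = log2(P(w1, w2) / (P(w1) * P(w2))).

    Unigram probabilities are estimated over all tokens, bigram
    probabilities over all adjacent pairs; pairs seen fewer than
    `min_count` times are skipped.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = len(tokens) - 1
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


if __name__ == "__main__":
    text = "we drank strong tea and they drank strong tea while he drank weak coffee"
    ranked = sorted(bigram_pmi(text.split(), min_count=2).items(), key=lambda kv: -kv[1])
    for pair, score in ranked:
        print(pair, round(score, 2))
```

PMI is known to overrate rare pairs, which is why a frequency threshold such as `min_count` is usually applied before ranking.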

Word Sense Induction with Attentive Context Clustering

Journal of Data Mining and Digital Humanities, 2022 (open access)
This paper presents ACCWSI (Attentive Context Clustering WSI), a method for Word Sense Induction, suitable for languages with limited resources. Pretrained on a small corpus and given an ambiguous word (a query word) and a set of excerpts that contain it, ...
Moshe Stekel, Amos Azaria, Shai Gordin
