MasakhaNER: Named Entity Recognition for African Languages

open access: yes, Transactions of the Association for Computational Linguistics, 2021
We take a step towards addressing the under-representation of the African continent in NLP research by bringing together different stakeholders to create the first large, publicly available, high-quality dataset for named entity recognition (NER) in ten
David Ifeoluwa Adelani   +60 more
doaj   +1 more source

Sub-Character Tokenization for Chinese Pretrained Language Models

open access: yes, Transactions of the Association for Computational Linguistics, 2023
Tokenization is fundamental to pretrained language models (PLMs). Existing tokenization methods for Chinese PLMs typically treat each character as an indivisible token.
Chenglei Si   +8 more
doaj   +1 more source

Large scale text mining for deriving useful insights: A case study focused on microbiome

open access: yes, Frontiers in Physiology, 2022
Text mining has been shown to be an auxiliary but key driver for modeling, data harmonization, and interpretation in bio-medicine. Scientific literature holds a wealth of information and embodies cumulative knowledge and remains the core basis on which ...
Syed Ashif Jardary Al Ahmed   +8 more
doaj   +1 more source

Lexical and Morphological Statistics of an Arabic POS-Tagged Corpus [PDF]

open access: yes, The Egyptian Journal of Language Engineering, 2014
Part-Of-Speech (POS) tagging is a basic component necessary for many Natural Language Processing (NLP) applications. Building a manually tagged corpus helps in studying key statistics of a given language which form the basis for POS tagging systems.
Hamdy Mubarak   +2 more
doaj   +1 more source

The Relation Dimension in the Identification and Classification of Lexically Restricted Word Co-Occurrences in Text Corpora

open access: yes, Mathematics, 2022
The speech of native speakers is full of idiosyncrasies. Especially prominent are lexically restricted binary word co-occurrences of the type high esteem, strong tea, run [an] experiment, war break(s) out, etc.
Alexander Shvets, Leo Wanner
doaj   +1 more source

HESML: a real-time semantic measures library for the biomedical domain with a reproducible survey

open access: yes, BMC Bioinformatics, 2022
Background: Ontology-based semantic similarity measures based on SNOMED-CT, MeSH, and Gene Ontology are being extensively used in many applications in biomedical text mining and genomics respectively, which has encouraged the development of semantic ...
Juan J. Lastra-Díaz   +2 more
doaj   +1 more source

UPRec: User-aware Pre-training for sequential Recommendation

open access: yes, AI Open, 2023
Recent years witness the success of pre-trained models to alleviate the data sparsity problem in recommender systems. However, existing pre-trained models for recommendation mainly focus on leveraging universal sequence patterns from user behavior ...
Chaojun Xiao   +6 more
doaj   +1 more source

Word Sense Induction with Attentive Context Clustering [PDF]

open access: yes, Journal of Data Mining and Digital Humanities, 2022
This paper presents ACCWSI (Attentive Context Clustering WSI), a method for Word Sense Induction, suitable for languages with limited resources. Pretrained on a small corpus and given an ambiguous word (a query word) and a set of excerpts that contain it,
Moshe Stekel, Amos Azaria, Shai Gordin
doaj   +1 more source

Automatic Classification of Sexism in Social Networks: An Empirical Study on Twitter Data

open access: yes, IEEE Access, 2020
During the last decade, hateful and sexist content towards women is being increasingly spread on social networks. The exposure to sexist speech has serious consequences to women's life and limits their freedom of speech.
Francisco Rodriguez-Sanchez   +2 more
doaj   +1 more source

Kumparan NLP Library

open access: yes, 2021
Kumparan's NLP ...
Aldiyansyah, Bayu   +4 more
core   +1 more source