MasakhaNER: Named Entity Recognition for African Languages

open access: yes | Transactions of the Association for Computational Linguistics, 2021
We take a step towards addressing the under-representation of the African continent in NLP research by bringing together different stakeholders to create the first large, publicly available, high-quality dataset for named entity recognition (NER) in ten
David Ifeoluwa Adelani   +60 more
doaj   +1 more source

Sub-Character Tokenization for Chinese Pretrained Language Models

open access: yes | Transactions of the Association for Computational Linguistics, 2023
Tokenization is fundamental to pretrained language models (PLMs). Existing tokenization methods for Chinese PLMs typically treat each character as an indivisible token.
Chenglei Si   +8 more
doaj   +1 more source

Large scale text mining for deriving useful insights: A case study focused on microbiome

open access: yes | Frontiers in Physiology, 2022
Text mining has been shown to be an auxiliary but key driver for modeling, data harmonization, and interpretation in bio-medicine. Scientific literature holds a wealth of information and embodies cumulative knowledge and remains the core basis on which ...
Syed Ashif Jardary Al Ahmed   +8 more
doaj   +1 more source

Word Sense Induction with Attentive Context Clustering

open access: yes | Journal of Data Mining and Digital Humanities, 2022
This paper presents ACCWSI (Attentive Context Clustering WSI), a method for Word Sense Induction, suitable for languages with limited resources. Pretrained on a small corpus and given an ambiguous word (a query word) and a set of excerpts that contain it,
Moshe Stekel, Amos Azaria, Shai Gordin
doaj   +1 more source

The Relation Dimension in the Identification and Classification of Lexically Restricted Word Co-Occurrences in Text Corpora

open access: yes | Mathematics, 2022
The speech of native speakers is full of idiosyncrasies. Especially prominent are lexically restricted binary word co-occurrences of the type high esteem, strong tea, run [an] experiment, war break(s) out, etc.
Alexander Shvets, Leo Wanner
doaj   +1 more source

HESML: a real-time semantic measures library for the biomedical domain with a reproducible survey

open access: yes | BMC Bioinformatics, 2022
Background: Ontology-based semantic similarity measures based on SNOMED-CT, MeSH, and Gene Ontology are being extensively used in many applications in biomedical text mining and genomics respectively, which has encouraged the development of semantic ...
Juan J. Lastra-Díaz   +2 more
doaj   +1 more source

UPRec: User-aware Pre-training for sequential Recommendation

open access: yes | AI Open, 2023
Recent years have witnessed the success of pre-trained models in alleviating the data sparsity problem in recommender systems. However, existing pre-trained models for recommendation mainly focus on leveraging universal sequence patterns from user behavior ...
Chaojun Xiao   +6 more
doaj   +1 more source

Automatic Classification of Sexism in Social Networks: An Empirical Study on Twitter Data

open access: yes | IEEE Access, 2020
During the last decade, hateful and sexist content towards women has been increasingly spread on social networks. Exposure to sexist speech has serious consequences for women's lives and limits their freedom of speech.
Francisco Rodriguez-Sanchez   +2 more
doaj   +1 more source

Classification of Nail Abnormalities using Convolutional Neural Network

open access: yes | วารสารวิทยาการสารสนเทศและเทคโนโลยีประยุกต์, 2023
Nails are an organ whose appearance can indicate a person's health condition. To create a model that can be applied as a tool for self-classifying nail abnormalities, this article presents the study and analysis on seven ...
Nuttida Lapthanachai   +3 more
doaj   +1 more source

Natural Language Processing and Language Technologies for the Basque Language

open access: yes | Cuadernos Europeos de Deusto, 2022
The presence of a language in the digital domain is crucial for its survival, as online communication and digital language resources have become the standard in the last decades and will gain more importance in the coming years.
Itziar Gonzalez-Dios, Begoña Altuna
doaj   +1 more source
