MasakhaNER: Named Entity Recognition for African Languages
We take a step towards addressing the underrepresentation of the African continent in NLP research by bringing together different stakeholders to create the first large, publicly available, high-quality dataset for named entity recognition (NER) in ten ...
David Ifeoluwa Adelani +60 more
Sub-Character Tokenization for Chinese Pretrained Language Models
Tokenization is fundamental to pretrained language models (PLMs). Existing tokenization methods for Chinese PLMs typically treat each character as an indivisible token.
Chenglei Si +8 more
Large scale text mining for deriving useful insights: A case study focused on microbiome
Text mining has been shown to be an auxiliary but key driver for modeling, data harmonization, and interpretation in biomedicine. Scientific literature holds a wealth of information, embodies cumulative knowledge, and remains the core basis on which ...
Syed Ashif Jardary Al Ahmed +8 more
Word Sense Induction with Attentive Context Clustering
This paper presents ACCWSI (Attentive Context Clustering WSI), a method for Word Sense Induction suitable for languages with limited resources. Pretrained on a small corpus and given an ambiguous word (a query word) and a set of excerpts that contain it, ...
Moshe Stekel, Amos Azaria, Shai Gordin
The speech of native speakers is full of idiosyncrasies. Especially prominent are lexically restricted binary word co-occurrences of the type high esteem, strong tea, run [an] experiment, war break(s) out, etc.
Alexander Shvets, Leo Wanner
HESML: a real-time semantic measures library for the biomedical domain with a reproducible survey
Background: Ontology-based semantic similarity measures based on SNOMED-CT, MeSH, and Gene Ontology are extensively used in many applications in biomedical text mining and genomics, respectively, which has encouraged the development of semantic ...
Juan J. Lastra-Díaz +2 more
UPRec: User-aware Pre-training for sequential Recommendation
Recent years have witnessed the success of pre-trained models in alleviating the data sparsity problem in recommender systems. However, existing pre-trained models for recommendation mainly focus on leveraging universal sequence patterns from user behavior ...
Chaojun Xiao +6 more
Automatic Classification of Sexism in Social Networks: An Empirical Study on Twitter Data
During the last decade, hateful and sexist content towards women has spread increasingly on social networks. Exposure to sexist speech has serious consequences for women's lives and limits their freedom of speech.
Francisco Rodriguez-Sanchez +2 more
Classification of Nail Abnormalities using Convolutional Neural Network
Nails are an organ whose appearance can indicate a person's state of health. To create a model that can be used as a tool for self-classifying nail abnormalities, this article presents a study and analysis of seven ...
Nuttida Lapthanachai +3 more
Natural Language Processing and Language Technologies for the Basque Language
The presence of a language in the digital domain is crucial for its survival, as online communication and digital language resources have become the norm over the last decades and will gain even more importance in the coming years.
Itziar Gonzalez-Dios, Begoña Altuna