In this paper, a contrastive learning approach for morphological disambiguation (MD) using large language models (LLMs) is presented. A contrastive loss function is introduced for training the approach, which reduces the distance between the correct ...
Gulmira Tolegen +2 more
doaj +4 more sources
Data sets for author name disambiguation: an empirical analysis and a new resource [PDF]
Data sets of publication meta data with manually disambiguated author names play an important role in current author name disambiguation (AND) research. We review the most important data sets used so far, and compare their respective advantages and shortcomings. From the results of this review, we derive a set of general requirements to future AND data
Mark-Christoph Müller +2 more
exaly +6 more sources
Web Resource Sense Disambiguation in Web of Data
JUCS - Journal of Universal Computer Science
Matinfar, Farzam +2 more
openaire +3 more sources
Hybrid artificial intelligence architectures for automatic text correction in the Kazakh language [PDF]
The Kazakh language, as an agglutinative and morphologically rich language, presents significant challenges for the development of natural language processing (NLP) tools.
Laura Baitenova +4 more
doaj +2 more sources
Leveraging large language models for rare disease named entity recognition [PDF]
Named Entity Recognition (NER) in the rare disease domain poses unique challenges due to limited labeled data, semantic ambiguity between entity types, and long-tail distributions.
Nan Miles Xi, Yu Deng, Lin Wang
doaj +2 more sources
A comprehensive dataset for Arabic word sense disambiguation [PDF]
This data paper introduces a comprehensive dataset tailored for word sense disambiguation tasks, explicitly focusing on a hundred polysemous words frequently employed in Modern Standard Arabic.
Sanaa Kaddoura, Reem Nassar
doaj +2 more sources
Toponym Extraction and Disambiguation from Text: A Survey
A toponym is an essential element of geospatial information. Traditionally, toponyms are collected in a gazetteer through field surveys that require significant resources, including labor, time, and money.
Rizka Windiastuti +2 more
doaj +3 more sources
AI-Driven Medical Device Risk Management: A New Paradigm Integrating Large Language Models and Prompt Engineering for Standard-Risk Knowledge Graph Construction and Application [PDF]
Wanting Zhu¹, Peiming Zhang¹, Wenke Xia¹, Ziming Gao², Weiqi Li¹, Ruixue Tian³, Li Wang⁴; ¹School of Health Science and Engineering, University of Shanghai for Science and Technology, Educational Institution, Shanghai, People’s Republic of China ...
Zhu W +6 more
doaj +2 more sources
Building a Multilingual Lexical Resource for Named Entity Disambiguation, Translation and Transliteration [PDF]
In this paper, we present HeiNER, the multilingual Heidelberg Named Entity Resource. HeiNER contains 1,547,586 disambiguated English Named Entities together with translations and transliterations to 15 languages. Our work builds on the approach described in (Bunescu and Pasca, 2006), yet extends it to a multilingual dimension.
Wentland, Wolodja +3 more
core +6 more sources
Sentiment analysis techniques, challenges, and opportunities: Urdu language-based analytical study [PDF]
Sentiment analysis in research involves the processing and analysis of sentiments from textual data. The sentiment analysis for high-resource languages such as English and French has been carried out effectively in the past. However, its applications are
Muhammad Irzam Liaqat +4 more
doaj +2 more sources

