Kurdish Kurmanji Lemmatization and Spell-checker with Spell-correction
Many studies address lemmatization and spell-checking with spell-correction for English, Arabic, and Persian, but only a few address low-resource languages such as Kurdish, and more specifically ...
Hanar Hoshyar Mustafa, Rebwar M. Nabi
doaj +3 more sources
Hybrid lemmatization in HuSpaCy [PDF]
Lemmatization is still not a trivial task for morphologically rich languages. Previous studies showed that hybrid architectures usually work better for these languages and can yield great results. This paper presents a hybrid lemmatizer that combines a neural model, dictionaries, and hand-crafted rules.
Péter Berkecz +4 more
core +4 more sources
Towards an Optimal Solution to Lemmatization in Arabic
Lemmatization, computing the canonical forms of words in running text, is an important component in any NLP system and a key preprocessing step for most applications that rely on natural language understanding. In the case of Arabic, lemmatization is a complex task because of the rich morphology, agglutinative aspects, and lexical ambiguity ...
Abed Alhakim Freihat +2 more
exaly +4 more sources
The Lemmatization of Copulatives in Northern Sotho *
For learners of Northern Sotho as a second or even foreign language, the copulative system is probably the most complicated grammatical system to master. The encoding needs of such learners, i.e.
D.J. Prinsloo
doaj +4 more sources
Korpus DIA1900: jeho koncepce a vytváření [PDF]
The objective of the paper is to describe the principles for building the one-million-word DIA1900 Corpus consisting of Czech texts published between 1851 and 1900, designed to be both balanced and representative.
Lucie Benešová +4 more
doaj +1 more source
Universal Lemmatizer: A sequence-to-sequence model for lemmatizing Universal Dependencies treebanks [PDF]
In this paper, we present a novel lemmatization method based on a sequence-to-sequence neural network architecture and morphosyntactic context representation. In the proposed method, our context-sensitive lemmatizer generates the lemma one character at a time based on the surface form characters and its morphosyntactic features obtained from a ...
Jenna Kanerva +2 more
openaire +2 more sources
Advances in the automatic lemmatization of Old English: class V strong verbs (L-Y)
The grammatical description of Old English lacks complete and systematic lemmatization, which hinders Natural Language Processing studies in this language, as they strongly rely on the existence of large, annotated corpora.
Roberto Torre Alonso
doaj +1 more source
BaNeL: an encoder-decoder based Bangla neural lemmatizer
This study presents an efficient framework for deriving the lemma of an inflected Bangla word, using its part of speech as context. Bangla is a morphologically rich Indo-Aryan language where around 70% of words are inflected, and some words have around ...
Md. Ashraful Islam +4 more
doaj +1 more source
An alternative proposal for eliciting key words [PDF]
The article reports research on the concept of key words as statistically significant items in a text or corpus. It reviews approaches to eliciting key words used in various software products for language analysis and the rationale for adopting them ...
Elena Tarasheva
doaj +1 more source
Enhancing Accuracy of Semantic Relatedness Measurement by Word Single-Meaning Embeddings
We propose a lightweight algorithm for learning word single-meaning embeddings (WSME) by exploring WordNet synsets and Doc2vec document embeddings, to enhance the accuracy of semantic relatedness measurement.
Xiaotao Li, Shujuan You, Wai Chen
doaj +1 more source

