Lemmatization - Open Access .click


New resources and lemmatization experiments for Occitan

2023
This paper presents recent contributions to the creation of NLP tools and resources for Occitan. Several existing resources were modified or adapted, in particular a rule-based tokenizer, a morphosyntactic lexicon and a treebank.
Miletic Haddad, Aleksandra
core

POS Tagging and Lemmatization of Historical Varieties of Languages. The Challenge of Old Italian

IJCoL
The paper discusses the challenges of POS tagging and lemmatization of historical varieties of Italian, and reports for both tasks the results of experiments carried out in a classical supervised domain adaptation scenario using the diachronic and ...
Manuel Favaro, Marco Biffi, Simonetta Montemagni +2 more
doaj +1 more source

Learning to Lemmatize in the Word Representation Space.

2022
Peer ...
Lagus, Jarkko, Klami, Arto
openaire +4 more sources

Specyfika staropolszczyzny a anotacja gramatyczna. O lematyzacji tekstu staropolskiego

LingVaria
Specific Features of the Old Polish Language and Grammatical Annotation: Lemmatization of Old Polish Texts. The article addresses the grammatical annotation of Old Polish texts and discusses the problem of lemmatizing a medieval text.
Magdalena Wismont +3 more
doaj +1 more source

BabyLemmatizer: A Lemmatizer and POS-tagger for Akkadian

2022
We present a hybrid lemmatizer and POS-tagger for Akkadian, the language of the ancient Assyrians and Babylonians, documented from 2350 BCE to 100 CE. In our approach, the text is first POS-tagged and lemmatized with TurkuNLP trained on human-verified labels, and then post-corrected with dictionary-based methods to improve the lemmatization quality ...
Sahala Aleksi +3 more
openaire +2 more sources

Current Approaches in Latin Lemmatization

2020
By providing an overview of the various lemmatization processes and criteria applied in a number of linguistic resources and NLP tools for Latin, this special issue seeks to highlight their differences and commonalities, and points to interoperability as
Passarotti, Marco Carlo
core

Italian Text Categorization with Lemmatization and Support Vector Machines

2019
The paper describes an Italian-language text categorizer based on lemmatization and support vector machines. The categorizer is composed of six modules. The first module performs tokenization and removes punctuation marks; the second and third ones carry
Camastra F. +3 more
core +1 more source

Analyzing Multilingual French and Russian Text using NLTK, spaCy, and Stanza

The Programming Historian
This lesson covers tokenization, part-of-speech tagging, and lemmatization, as well as automatic language detection, for non-English and multilingual text.
Ian Goodale
doaj +1 more source

Enhancing Sumerian Lemmatization by Unsupervised Named-Entity Recognition

2020
Lemmatization of Sumerian is much more challenging than lemmatization of modern languages: the language has long been dead, highly skilled language experts are extremely scarce, and more and more Sumerian texts are becoming available.
Yudong Liu +3 more
core

Automatic Word Lemmatization

2002
This record contains a full paper presented at the 3rd Conference on Language Technologies (JT-2002), held in Ljubljana, Slovenia, in October 2002.
openaire +2 more sources
