New resources and lemmatization experiments for Occitan
This paper presents recent contributions to the creation of NLP tools and resources for Occitan. Several existing resources were modified or adapted, in particular a rule-based tokenizer, a morphosyntactic lexicon and a treebank.
Miletic Haddad, Aleksandra
POS Tagging and Lemmatization of Historical Varieties of Languages. The Challenge of Old Italian
The paper discusses the challenges of POS tagging and lemmatization of historical varieties of Italian, and reports for both tasks the results of experiments carried out in a classical supervised domain adaptation scenario using the diachronic and ...
Manuel Favaro et al.
Learning to Lemmatize in the Word Representation Space
Lagus, Jarkko; Klami, Arto
Specyfika staropolszczyzny a anotacja gramatyczna. O lematyzacji tekstu staropolskiego (Specific Features of the Old Polish Language and Grammatical Annotation: On the Lemmatization of Old Polish Texts)
The article is dedicated to the grammatical annotation of Old Polish texts. It discusses the problems that arise when lemmatizing a medieval text.
Magdalena Wismont et al.
BabyLemmatizer: A Lemmatizer and POS-tagger for Akkadian
We present a hybrid lemmatizer and POS-tagger for Akkadian, the language of the ancient Assyrians and Babylonians, documented from 2350 BCE to 100 CE. In our approach, the text is first POS-tagged and lemmatized with TurkuNLP trained on human-verified labels and then post-corrected with dictionary-based methods to improve lemmatization quality ...
Aleksi Sahala et al.
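The BabyLemmatizer abstract describes a two-stage design: a trained model first proposes POS tags and lemmata, and dictionary-based methods then post-correct the lemmata. The minimal Python sketch below illustrates one generic form of such post-correction under a stated assumption: a hypothetical lexicon keyed by (surface form, POS tag) whose attested lemma overrides the model's guess. It is not the paper's actual correction procedure; `post_correct`, `toy_lexicon`, and the placeholder forms are illustrative names only.

```python
from typing import Dict, List, Tuple

# Assumption: a hypothetical lexicon mapping (surface form, POS tag) to an
# attested lemma. A real system would build this from a curated dictionary.
Lexicon = Dict[Tuple[str, str], str]
Token = Tuple[str, str, str]  # (form, predicted POS, predicted lemma)


def post_correct(tokens: List[Token], lexicon: Lexicon) -> List[Token]:
    """Replace a model's predicted lemma with the lexicon lemma when one is attested."""
    corrected: List[Token] = []
    for form, pos, lemma in tokens:
        # Look up the exact (form, POS) pair; keep the model's guess when the
        # lexicon has no entry, e.g. for unseen or damaged forms.
        dict_lemma = lexicon.get((form, pos))
        corrected.append((form, pos, dict_lemma if dict_lemma is not None else lemma))
    return corrected


# Toy, made-up data: the first token is overridden, the second is kept.
toy_lexicon: Lexicon = {("form1", "N"): "lemma1"}
print(post_correct([("form1", "N", "wrong"), ("form2", "V", "guess2")], toy_lexicon))
# prints [('form1', 'N', 'lemma1'), ('form2', 'V', 'guess2')]
```

Keying the lookup on the POS tag as well as the form lets the lexicon separate homographs, which is plausibly one reason a design like this tags before it corrects.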
Current Approaches in Latin Lemmatization
By providing an overview of the various lemmatization processes and criteria applied in a number of linguistic resources and NLP tools for Latin, this special issue seeks to highlight their differences and commonalities, and points to interoperability as ...
Passarotti, Marco Carlo
Italian Text Categorization with Lemmatization and Support Vector Machines
The paper describes an Italian text categorizer based on lemmatization and support vector machines. The categorizer is composed of six modules. The first module performs tokenization, removing punctuation marks; the second and third modules carry ...
Camastra F. et al.
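The Italian categorizer abstract outlines a pipeline of tokenization (dropping punctuation), lemmatization, and a support vector machine; the description of the later modules is cut off. The plain-Python sketch below wires up such a pipeline under loudly stated assumptions. The lemma table `LEMMAS` is a hypothetical stand-in for a real Italian lemmatizer. Features are bag-of-lemmas counts. The linear SVM (no bias term) is trained with the Pegasos sub-gradient method, which is a substitution for illustration and not necessarily the paper's training procedure. The training sentences are made up.

```python
import random
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Module 1 (as in the abstract): tokenize and drop punctuation.
    # \w+ is Unicode-aware, so accented letters survive; punctuation is discarded.
    return re.findall(r"\w+", text.lower())


# Assumption: a tiny hypothetical lemma table standing in for a real
# Italian lemmatizer; unknown forms are left unchanged.
LEMMAS: Dict[str, str] = {
    "partite": "partita",
    "squadre": "squadra",
    "mercati": "mercato",
    "azioni": "azione",
}


def lemmatize(tokens: List[str]) -> List[str]:
    return [LEMMAS.get(t, t) for t in tokens]


def features(text: str) -> Counter:
    # Bag-of-lemmas: count of each lemma in the document.
    return Counter(lemmatize(tokenize(text)))


def score(w: Dict[str, float], x: Counter) -> float:
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_pegasos(docs: List[Counter], labels: List[int],
                  lam: float = 0.01, epochs: int = 50, seed: int = 0) -> Dict[str, float]:
    """Linear SVM without bias, trained with Pegasos; labels are +1 / -1."""
    rng = random.Random(seed)
    w: Dict[str, float] = {}
    order = list(range(len(docs)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            x, y = docs[i], labels[i]
            margin = y * score(w, x)  # computed before this step's update
            for f in w:  # shrink step from the L2 regularizer
                w[f] *= 1.0 - eta * lam
            if margin < 1:  # hinge loss is active: move weights toward y * x
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def predict(w: Dict[str, float], text: str) -> int:
    return 1 if score(w, features(text)) >= 0 else -1


# Toy, made-up training data: +1 = sport, -1 = finance.
docs = ["Le partite delle squadre.", "La squadra ha vinto la partita!",
        "I mercati e le azioni.", "Il mercato delle azioni crolla."]
labels = [1, 1, -1, -1]
weights = train_pegasos([features(d) for d in docs], labels)
print(predict(weights, "Una partita tra due squadre"))  # prints 1
```

Lemmatization earns its place in this design because inflected forms such as *partite* and *squadre* collapse onto *partita* and *squadra*, so a document written in the plural shares features with one written in the singular.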
Analyzing Multilingual French and Russian Text using NLTK, spaCy, and Stanza
This lesson covers tokenization, part-of-speech tagging, and lemmatization, as well as automatic language detection, for non-English and multilingual text.
Ian Goodale
Enhancing Sumerian Lemmatization by Unsupervised Named-Entity Recognition
Compared with lemmatization for modern languages, lemmatization for Sumerian is much more challenging: Sumerian is a long-dead language, highly skilled experts in it are extremely scarce, and ever more Sumerian texts are coming to light.
Yudong Liu et al.
This record contains a full paper presented at the 3rd Conference on Language Technologies (JT-2002), held in Ljubljana, Slovenia, in October 2002.

