New resources and lemmatization experiments for Occitan

open access: yes, 2023
This paper presents recent contributions to the creation of NLP tools and resources for Occitan. Several existing resources were modified or adapted, in particular a rule-based tokenizer, a morphosyntactic lexicon and a treebank.
Miletic Haddad, Aleksandra
core  

POS Tagging and Lemmatization of Historical Varieties of Languages. The Challenge of Old Italian

open access: yes, IJCoL
The paper discusses the challenges of POS tagging and lemmatization of historical varieties of Italian, and reports for both tasks the results of experiments carried out in a classical supervised domain adaptation scenario using the diachronic and ...
Manuel Favaro   +2 more
doaj   +1 more source

Specyfika staropolszczyzny a anotacja gramatyczna. O lematyzacji tekstu staropolskiego

open access: yes, LingVaria
Specific Features of Old Polish Language and Grammatical Annotation: Lemmatization of Old Polish Texts. The article is dedicated to the grammatical annotation of Old Polish texts. It discusses the problem of lemmatizing a medieval text.
Magdalena Wismont   +3 more
doaj   +1 more source

BabyLemmatizer: A Lemmatizer and POS-tagger for Akkadian

open access: yes, 2022
We present a hybrid lemmatizer and POS-tagger for Akkadian, the language of the ancient Assyrians and Babylonians, documented from 2350 BCE to 100 CE. In our approach the text is first POS-tagged and lemmatized with TurkuNLP trained with human-verified labels, and then post-corrected with dictionary-based methods to improve the lemmatization quality ...
Sahala Aleksi   +3 more
openaire   +2 more sources

Current Approaches in Latin Lemmatization

open access: yes, 2020
By providing an overview of the various lemmatization processes and criteria applied in a number of linguistic resources and NLP tools for Latin, this special issue seeks to highlight their differences and commonalities, and points to interoperability as
Passarotti, Marco Carlo
core  

Italian Text Categorization with Lemmatization and Support Vector Machines

open access: yes, 2019
The paper describes an Italian-language text categorizer based on lemmatization and support vector machines. The categorizer is composed of six modules. The first module performs tokenization, removing punctuation marks; the second and third ones carry
Camastra F.   +3 more
core   +1 more source

Analyzing Multilingual French and Russian Text using NLTK, spaCy, and Stanza

open access: yes, The Programming Historian
This lesson covers tokenization, part-of-speech tagging, and lemmatization, as well as automatic language detection, for non-English and multilingual text.
Ian Goodale
doaj   +1 more source

Enhancing Sumerian Lemmatization by Unsupervised Named-Entity Recognition

open access: yes, 2020
Lemmatization for Sumerian is much more challenging than for modern languages: it is a long-dead language, highly skilled language experts are extremely scarce, and more and more Sumerian texts are coming out.
Yudong Liu   +3 more
core  

Automatic Word Lemmatization

open access: yes, 2002
This record contains a full paper presented at the 3rd Conference on Language Technologies (JT-2002), held in Ljubljana, Slovenia, in October 2002.
openaire   +2 more sources
