The LiLa Lemma Bank: A Knowledge Base of Latin Canonical Forms
The dataset contains a list of 215,102 Latin dictionary forms (known as canonical forms or lemmas). The dataset is a set of 1,699,687 Resource Description Framework (RDF) triples that describe, using a series of Web Ontology Language (OWL) ontologies for
Francesco Mambrini +1 more
doaj +1 more source
The regularization of Old English weak verbs
This article deals with the regularization of non-standard spellings of the verbal forms extracted from a corpus. It addresses the question of what the limits of regularization are when lemmatizing Old English weak verbs.
Marta Tío Sáenz
doaj +1 more source
Annif Analyzer Shootout: Comparing text lemmatization methods for automated subject indexing
Automated text classification is an important function for many AI systems relevant to libraries, including automated subject indexing and classification.
Osma Suominen, Ilkka Koskenniemi
doaj
Lemmatization and parsing with TACT preprocessing programs
R G Siemens
doaj +1 more source
BanglaLem: A Transformer-based Bangla Lemmatizer with an Enhanced Dataset
Lemmatization plays a crucial role in various natural language processing (NLP) tasks, such as information retrieval, sentiment analysis, text summarization, and text classification.
Md Fuadul Islam +4 more
doaj +1 more source
Text Stemming and Lemmatization of Regional Languages in Indonesia: A Systematic Literature Review
Background: Stemming is essential in natural language processing (NLP) because it reduces word variants to their base forms. This procedure facilitates the analysis of textual data and enhances the precision of classification
Zaenal Abidin, Akmal Junaidi, Wamiliana
doaj +1 more source
Editing Middle English Medical Manuscripts: The Case of Glasgow University Library MS Hunter 509
It has been pointed out that the editing of a scientific treatise should be “an extended and challenging exercise in judgment, requiring an earnest commitment to scholarship” (Keiser 1998: 110).
María Laura Esteban-Segura
doaj +1 more source
Transformer-based part-of-speech tagging and lemmatization for Latin
The paper presents a submission to the EvaLatin 2022 shared task. Our system places first for lemmatization, part-of-speech and morphological tagging in both closed and open modalities.
Krzysztof Wróbel, Krzysztof Nowak
core
The lemmatization of Old English Verbs from the second weak class on a lexical database
This article compiles a list of lemmas of the second class weak verbs of Old English by using the latest version of the lexical database Nerthus, which incorporates the texts of the Dictionary of Old English Corpus.
Marta Tío Sáenz
doaj +1 more source
Processing Tools for Greek and Other Languages of the Christian Middle East
This paper presents some computer tools and linguistic resources of the GREgORI project. These developments allow automated processing of texts written in the main languages of the Christian Middle East, such as Greek, Arabic, Syriac, Armenian and ...
Bastien Kindt
doaj +1 more source

