Development of a Hindi Lemmatizer [PDF]
We live in a translingual society; to communicate with people from different parts of the world, we need expertise in their respective languages.
Joshi, Nisheeth +2 more
core +3 more sources
Simple data-driven context-sensitive lemmatization [PDF]
Lemmatization for languages with rich inflectional morphology is one of the basic, indispensable steps in a language processing pipeline. In this paper we present a simple data-driven context-sensitive approach to lemmatizing word forms in running text.
Chrupała, Grzegorz
core +4 more sources
Universal Lemmatizer: A sequence-to-sequence model for lemmatizing Universal Dependencies treebanks [PDF]
In this paper, we present a novel lemmatization method based on a sequence-to-sequence neural network architecture and morphosyntactic context representation. In the proposed method, our context-sensitive lemmatizer generates the lemma one character at a time based on the surface form characters and its morphosyntactic features obtained from a ...
Jenna Kanerva +2 more
openaire +2 more sources
Lemmatization of Polish person names [PDF]
The paper presents two techniques for lemmatization of Polish person names. First, we apply a rule-based approach which relies on linguistic information and heuristics. Then, we investigate an alternative knowledge-poor method which employs string distance measures. We provide an evaluation of the adopted techniques using a set of newspaper texts.
Piskorski, Jakub +2 more
openaire +2 more sources
Translating Speech to Indian Sign Language Using Natural Language Processing
Language plays a vital role in the communication of ideas, thoughts, and information to others. Hearing-impaired people also understand our thoughts using a language known as sign language.
Purushottam Sharma +4 more
doaj +1 more source
Hybrid lemmatization in HuSpaCy
Lemmatization is still not a trivial task for morphologically rich languages. Previous studies showed that hybrid architectures usually work better for these languages and can yield great results. This paper presents a hybrid lemmatizer utilizing a neural model, dictionaries, and hand-crafted rules.
Berkecz, Péter +4 more
openaire +2 more sources
Morphological Tagging and Lemmatization in the Albanian Language
An important element of Natural Language Processing is part-of-speech tagging. With fine-grained word-class annotations, the word forms in a text can be enhanced and can also be used in downstream processes, such as dependency parsing.
Mati Diellza Nagavci +2 more
doaj +1 more source
Developing Core Technologies for Resource-Scarce Nguni Languages
The creation of linguistic resources is crucial to the continued growth of research and development efforts in the field of natural language processing, especially for resource-scarce languages.
Jakobus S. du Toit, Martin J. Puttkammer
doaj +1 more source
Revolutionizing Bantu lexicography: a Zulu case study [PDF]
Zulu uses a conjunctive writing system, that is, a system whereby relatively short linguistic words are joined together to form long orthographic words with complex morphological structures.
de Schryver, Gilles-Maurice
core +4 more sources
Automatic Lemmatizer Construction with Focus on OOV Words Lemmatization [PDF]
This paper deals with the automatic construction of a lemmatizer from a Full Form – Lemma (FFL) training dictionary and with the lemmatization of new words not seen in the FFL dictionary, i.e. out-of-vocabulary (OOV) words. Three methods for lemmatizing three kinds of OOV words (missing full forms, unknown words, and compound words) are introduced.
Jakub Kanis, Luděk Müller
openaire +1 more source

