KenTrans: A Parallel Corpora for Swahili and local Kenyan Languages

open access: yes, 2022
This project produced a parallel corpus between Swahili and two other Kenyan languages: Dholuo and Luhya. The Luhya language has several dialects, and the project initially chose three of them: Lumarachi, Logooli and Lubukusi.
Muchemi, Lawrence   +5 more
core   +1 more source

Methodological Pitfalls of Parallel Corpora and How to Avoid Them (Pièges méthodologiques des corpus parallèles et comment les éviter)

open access: yes, Corela, 2017
In this article, we present methodological problems pertaining to the exploitation of parallel corpora, i.e. corpora composed of translations and their respective originals, and we propose principles and rules to help avoid these problems ...
Olga Nádvorníková
doaj   +1 more source

Principled Paraphrase Generation with Parallel Corpora

open access: yes, Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022
Round-trip Machine Translation (MT) is a popular choice for paraphrase generation, which leverages readily available parallel corpora for supervision. In this paper, we formalize the implicit similarity function induced by this approach, and show that it is susceptible to non-paraphrase pairs sharing a single ambiguous translation.
Aitor Ormazabal   +4 more
openaire   +3 more sources

Semantics, contrastive linguistics and parallel corpora

open access: yes, Cognitive Studies | Études cognitives, 2014
In view of the ambiguity of the term "semantics", the author shows the differences between traditional lexical semantics and contemporary semantics in the light of various semantic schools.
Violetta Koseska
doaj   +1 more source

UPC: An Open Word-Sense Annotated Parallel Corpora for Machine Translation Study

open access: yes, Applied Sciences, 2020
Machine translation (MT) has recently attracted extensive research into advanced techniques (statistical and deep learning-based) and has achieved strong results for widely used languages.
Van-Hai Vu   +3 more
doaj   +1 more source

The use of English, Czech and French punctuation marks in reference, parallel and comparable web corpora: a question of methodology

open access: yes, Linguistica Pragensia, 2020
This paper analyses the frequency of six punctuation marks (the comma, period, colon, semicolon, question mark and exclamation mark) in three languages (English, French and Czech) across three types of corpora: comparable web corpora, large ...
Olga Nádvorníková
doaj  

Semi-automatic ontological alignment of digitized books parallel corpora

open access: yes, Mokslas: Lietuvos Ateitis, 2021
In this paper, we present a method for integrating general ontology management with the alignment of a paraphrase corpus of digitized books, compiled from a bilingual parallel corpus.
Algirdas Laukaitis, Neda Laukaitytė
doaj   +1 more source

XML schemas for parallel corpora

open access: yes, 2011
Parallel corpora are resources used in Natural Language Processing and Computational Linguistics. They are defined as a set of texts in different languages that are translations of each other. These translations need not cover the full document, as some sentences may be translated into only some of the languages. When dealing with the ...
Simões, Alberto, Fernandes, Sara
openaire   +2 more sources

On Semantic Annotation in Clarin-PL Parallel Corpora

open access: yes, Cognitive Studies | Études cognitives, 2015
In this article, the authors present a proposal for semantic annotation in the Clarin-PL parallel corpora: Polish-Bulgarian-Russian and Polish-Lithuanian.
Violetta Koseska-Toszewa, Roman Roszko
doaj   +1 more source

Multilingual corpora in contrastive research on the vocative in Russian, Polish and Lithuanian

open access: yes, Slavistična Revija, 2021
The aim of this paper is to conduct a contrastive analysis of vocative forms in Russian, Polish and Lithuanian, prefaced by a short introduction discussing the benefits of using non-commercial multilingual corpora in such ...
Maksim Duszkin   +2 more
doaj  

Home - About - Disclaimer - Privacy