ECCParaCorp: a cross-lingual parallel corpus towards cancer education, dissemination and application [PDF]
Background: The increasing global cancer incidence has a serious health impact in countries worldwide. Knowledge-powered health systems in different languages would enhance clinicians’ healthcare practice, patients’ health management and public ...
Hetong Ma +9 more
doaj +2 more sources
Dutch parallel corpus: a balanced parallel corpus for Dutch-English and Dutch-French [PDF]
FJ Och +14 more
core +6 more sources
Consumer Eroski parallel corpus
This paper introduces the Consumer Eroski Parallel Corpus, a collection of articles originally written in Spanish and later translated to three languages also spoken in Spain: Basque, Catalan and Galician. The articles have been correlated in the four
Asier Alcázar
doaj +3 more sources
Data augmentation English-Indonesia-Madurese parallel corpus dataset using neural machine translation (Mendeley Data) [PDF]
INMAD is a dataset containing a corpus of English-Indonesian-Madurese translated sentences. This corpus stores a list of 23086 lines of sentences, as well as their translations in Indonesian and English.
Fairuz Iqbal Maulana +3 more
doaj +2 more sources
The Bulgarian-Polish-Russian parallel corpus
The Semantics Laboratory Team of the Institute of Slavic Studies of the Polish Academy of Sciences is planning to begin work on the creation of a Bulgarian-Polish-Russian parallel corpus.
Maksim Duškin +1 more
doaj +3 more sources
Dutch parallel corpus : a multilingual annotated corpus [PDF]
Desmet, Piet +5 more
core +2 more sources
Dutch parallel corpus: a multifunctional and multilingual corpus [PDF]
Desmet, Piet +4 more
core +2 more sources
Parallel corpora are vital components in several applications of Natural Language Processing (NLP), particularly in machine translation. In this paper, we present a novel method for automatically creating parallel sentences from comparable corpora.
Maha Jarallah Althobaiti
doaj +1 more source
Dutch Parallel Corpus: A Balanced Copyright-Cleared Parallel Corpus [PDF]
This paper presents the Dutch Parallel Corpus, a high-quality parallel corpus for Dutch, French and English consisting of more than ten million words. The corpus contains five different text types and is balanced with respect to text type and translation direction. All texts included in the corpus have been cleared from copyright.
Macken, Lieve +2 more
openaire +1 more source
OpusFilter: A Configurable Parallel Corpus Filtering Toolbox [PDF]
This paper introduces OpusFilter, a flexible and modular toolbox for filtering parallel corpora. It implements a number of components based on heuristic filters, language identification libraries, character-based language models, and word alignment tools, and it can easily be extended with custom filters.
Aulamo, Mikko +2 more
openaire +2 more sources

