Parallel corpus - Open Access .click


Computational Linguistics, 2021
Parallel corpora have become an essential resource for work in multilingual natural language processing. In this article, we report on our work using the STRAND system for mining parallel text on the World Wide Web, first reviewing the original algorithm and results and then presenting a set of significant enhancements.
Philip Resnik, Noah A. Smith

ParaMed: a parallel corpus for English-Chinese translation in the biomedical domain

BMC Med Inform Decis Mak, 2021
Background: Biomedical language translation requires multilingual fluency as well as relevant domain knowledge. Such requirements make it challenging to train qualified translators and costly to generate high-quality translations.
Liu B, Huang L.

Parallel Corpus Research and Target Language Representativeness: The Contrastive, Typological, and Translation Mining Traditions

Languages, 2022
This paper surveys the strategies that the Contrastive, Typological, and Translation Mining parallel corpus traditions rely on to deal with the issue of target language representativeness of translations.
Bert Le Bruyn +6 more

ECCParaCorp: a cross-lingual parallel corpus towards cancer education, dissemination and application

BMC Medical Informatics and Decision Making, 2020
Background: The increasing global cancer incidence has a serious health impact on countries worldwide. A knowledge-powered health system available in different languages would enhance clinicians’ healthcare practice, patients’ health management and public ...
Hetong Ma +9 more

A Crowdsourcing-Based Framework for the Development and Validation of Machine Readable Parallel Corpus for Sign Languages

IEEE Access, 2021
Sign languages are used by the deaf and mute community of the world. They are gesture-based languages in which signers use their hands and facial expressions to perform different gestures.
Uzma Farooq +4 more

Data augmentation English-Indonesia-Madurese parallel corpus dataset using neural machine translation (Mendeley Data)

Data in Brief
INMAD is a dataset containing a corpus of English-Indonesian-Madurese translated sentences. The corpus stores 23,086 sentences, together with their translations in Indonesian and English.
Fairuz Iqbal Maulana +3 more

A Monolingual Parallel Corpus of Arabic

Procedia Computer Science, 2018
We present the first monolingual parallel corpus of Arabic. This is the first parallel monolingual corpus of full sentences in Arabic, automatically generated by translating a parallel bilingual corpus. We provide different versions of the dataset of varying size.
Fatima Al-Raisi, Weijian Lin, Abdelwahab Bourai +2 more

The EuroPat Corpus: A Parallel Corpus of European Patent Data

International Conference on Language Resources and Evaluation, 2022
We present the EuroPat corpus of patent-specific parallel data for 6 official European languages paired with English: German, Spanish, French, Croatian, Norwegian, and Polish. The filtered parallel corpora range in size from 51 million sentences (Spanish-English) to 154k sentences (Croatian-English), with the unfiltered (raw) corpora being up to 2 ...
Kenneth Heafield +4 more

OpusFilter: A Configurable Parallel Corpus Filtering Toolbox

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2020
This paper introduces OpusFilter, a flexible and modular toolbox for filtering parallel corpora. It implements a number of components based on heuristic filters, language identification libraries, character-based language models, and word alignment tools, and it can easily be extended with custom filters.
Mikko Aulamo, Sami Virpioja, Jörg Tiedemann +2 more
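
The abstract above describes a modular toolbox in which heuristic filters are chained and custom filters can be added. The Python below is a minimal sketch of that general idea only; it does not use OpusFilter's own API. The filter names, the whitespace-based word counts, and the thresholds (max_ratio=3.0, max_words=100) are illustrative assumptions, not OpusFilter defaults.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]              # (source sentence, target sentence)
Filter = Callable[[str, str], bool]  # returns True to keep the pair

def length_ratio_filter(max_ratio: float = 3.0) -> Filter:
    """Hypothetical filter: reject pairs whose word counts differ by more than max_ratio."""
    def accept(src: str, tgt: str) -> bool:
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if n_src == 0 or n_tgt == 0:
            return False  # an empty side cannot be a translation pair
        return max(n_src, n_tgt) / min(n_src, n_tgt) <= max_ratio
    return accept

def max_length_filter(max_words: int = 100) -> Filter:
    """Hypothetical filter: reject pairs where either side exceeds max_words words."""
    def accept(src: str, tgt: str) -> bool:
        return len(src.split()) <= max_words and len(tgt.split()) <= max_words
    return accept

def not_identical_filter() -> Filter:
    """Hypothetical filter: reject pairs where the 'translation' is a verbatim copy."""
    def accept(src: str, tgt: str) -> bool:
        return src.strip() != tgt.strip()
    return accept

def run_filters(pairs: Iterable[Pair], filters: List[Filter]) -> List[Pair]:
    """Keep only the pairs accepted by every filter in the chain."""
    return [(s, t) for s, t in pairs if all(f(s, t) for f in filters)]

if __name__ == "__main__":
    corpus = [
        ("The cat sleeps.", "Le chat dort."),
        ("Hello", "Bonjour tout le monde, comment allez-vous aujourd'hui ?"),
        ("Click here", "Click here"),
        ("", "Texte orphelin."),
    ]
    chain = [length_ratio_filter(3.0), max_length_filter(100), not_identical_filter()]
    for src, tgt in run_filters(corpus, chain):
        print(src, "|||", tgt)
```

With this toy corpus only the first pair survives: the second fails the length ratio (1 word against 8), the third is an untranslated copy, and the fourth has an empty source side. Representing each filter as a plain predicate keeps the chain extensible, since a custom filter is just another function with the same signature.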

ParCorFull2.0: a Parallel Corpus Annotated with Full Coreference

International Conference on Language Resources and Evaluation, 2022
Christian Hardmeier and Elina Lartaud were supported by the Swedish Research Council under grant 2017-930, which also funded the annotation work of the French data. Pedro Augusto Ferreira was supported by FCT, Foundation for Science and Technology, Portugal, under grant SFRH/BD/146578/2019.
Ekaterina Lapshinova-Koltunski +3 more
