The Web as a Parallel Corpus
Parallel corpora have become an essential resource for work in multilingual natural language processing. In this article, we report on our work using the STRAND system for mining parallel text on the World Wide Web, first reviewing the original algorithm and results and then presenting a set of significant enhancements.
Philip Resnik, Noah A. Smith
ParaMed: a parallel corpus for English-Chinese translation in the biomedical domain.
Background: Biomedical language translation requires multilingual fluency as well as relevant domain knowledge. Such requirements make it challenging to train qualified translators and costly to generate high-quality translations.
Liu B, Huang L.
This paper surveys the strategies that the Contrastive, Typological, and Translation Mining parallel corpus traditions rely on to address the issue of the target-language representativeness of translations.
Bert Le Bruyn +6 more
ECCParaCorp: a cross-lingual parallel corpus towards cancer education, dissemination and application
Background: The increasing global incidence of cancer has a serious health impact on countries worldwide. A knowledge-powered health system available in different languages would enhance clinicians’ healthcare practice, patients’ health management and public ...
Hetong Ma +9 more
Sign languages are used by deaf and mute communities around the world. They are gesture-based languages in which signers use their hands and facial expressions to perform different gestures.
Uzma Farooq +4 more
Data augmentation English-Indonesia-Madurese parallel corpus dataset using neural machine translation (Mendeley Data)
INMAD is a dataset containing a corpus of English-Indonesian-Madurese translated sentences. The corpus contains 23,086 lines of sentences, together with their translations in Indonesian and English.
Fairuz Iqbal Maulana +3 more
A Monolingual Parallel Corpus of Arabic
Abstract: We present the first monolingual parallel corpus of Arabic, consisting of full sentences in Arabic and automatically generated by translating a parallel bilingual corpus. We provide different versions of the dataset of varying size.
Fatima Al-Raisi +2 more
The EuroPat Corpus: A Parallel Corpus of European Patent Data.
We present the EuroPat corpus of patent-specific parallel data for 6 official European languages paired with English: German, Spanish, French, Croatian, Norwegian, and Polish. The filtered parallel corpora range in size from 51 million sentences (Spanish-English) to 154k sentences (Croatian-English), with the unfiltered (raw) corpora being up to 2 ...
Heafield, Kenneth +4 more
OpusFilter: A Configurable Parallel Corpus Filtering Toolbox
This paper introduces OpusFilter, a flexible and modular toolbox for filtering parallel corpora. It implements a number of components based on heuristic filters, language identification libraries, character-based language models, and word alignment tools, and it can easily be extended with custom filters.
Mikko Aulamo +2 more
ParCorFull2.0: a Parallel Corpus Annotated with Full Coreference.
Christian Hardmeier and Elina Lartaud were supported by the Swedish Research Council under grant 2017-930, which also funded the annotation work on the French data. Pedro Augusto Ferreira was supported by FCT (Foundation for Science and Technology), Portugal, under grant SFRH/BD/146578/2019.
Lapshinova-Koltunski, Ekaterina +3 more

