Bootstrapping parallel corpora [PDF]
We present two methods for the automatic creation of parallel corpora. Whereas previous work on the automatic construction of parallel corpora has focused on harvesting them from the web, we examine the use of existing parallel corpora to bootstrap data for new language pairs.
Chris Callison-Burch, Miles Osborne
OpenAIRE
Building English – Punjabi Aligned Parallel Corpora of Nouns from Comparable Corpora
Comparable corpora are well suited to extracting parallel data because they are abundantly available. This makes them especially important where parallel data are scarce.
Kaur Dilshad, Singh Satwinder
DOAJ
Automatic Generation of Exercises for Second Language Learning from Parallel Corpus Data [PDF]
Creating language learning exercises is a time-consuming task and made-up sample sentences frequently lack authenticity. Authentic samples can be obtained from corpora, but it is necessary to identify material that is suitable for language learners ...
Arianna Zanetti +2 more
DOAJ
Automatic alignment in parallel corpora [PDF]
This paper addresses the alignment issue in the framework of exploiting large bi-/multilingual corpora for translation purposes. A generic alignment scheme is proposed that can meet the varying requirements of different applications. Depending on the level at which alignment is sought, appropriate surface linguistic information is invoked coupled with ...
Harris Papageorgiou +2 more
OpenAIRE
MulTed: a multilingual aligned and tagged parallel corpus [PDF]
Recently, more data-driven approaches are demanding multilingual parallel resources, primarily in cross-language studies. To meet these demands, building multilingual parallel corpora is becoming the focus of many Natural Language Processing (NLP ...
Imad Zeroual, Abdelhak Lakhouaja
DOAJ
Sense discrimination with parallel corpora [PDF]
This paper describes an experiment that uses translation equivalents derived from parallel corpora to determine sense distinctions that can be used for automatic sense-tagging and other disambiguation tasks. Our results show that sense distinctions derived from cross-lingual information are at least as reliable as those made by human annotators ...
Nancy Ide, Tomaz Erjavec, Dan Tufis
OpenAIRE
This paper surveys the strategies that the Contrastive, Typological, and Translation Mining parallel corpus traditions rely on to deal with the issue of target language representativeness of translations.
Bert Le Bruyn +6 more
DOAJ
Paraphrasing with bilingual parallel corpora [PDF]
Previous work has used monolingual parallel corpora to extract and generate paraphrases. We show that this task can be done using bilingual parallel corpora, a much more commonly available resource. Using alignment techniques from phrase-based statistical machine translation, we show how paraphrases in one language can be identified using a phrase in ...
Colin Bannard, Chris Callison-Burch
OpenAIRE
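The pivot idea behind bilingual paraphrasing can be illustrated with a toy computation: a candidate paraphrase e2 of a phrase e1 is scored by marginalizing over foreign phrases f that both translate to, p(e2|e1) = Σ_f p(f|e1) · p(e2|f). The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the phrase-translation tables are hypothetical hand-written dictionaries with made-up probabilities, standing in for tables that would normally be estimated from a word-aligned bilingual corpus.

```python
from collections import defaultdict

def pivot_paraphrases(e1, p_f_given_e, p_e_given_f):
    """Rank paraphrases of e1 by p(e2|e1) = sum over f of p(f|e1) * p(e2|f)."""
    scores = defaultdict(float)
    for f, p_f in p_f_given_e.get(e1, {}).items():
        for e2, p_e2 in p_e_given_f.get(f, {}).items():
            if e2 != e1:  # the input phrase is not its own paraphrase
                scores[e2] += p_f * p_e2
    # Highest-scoring paraphrase first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy tables (English <-> German pivot); numbers are illustrative only.
p_f_given_e = {"under control": {"unter kontrolle": 0.8, "im griff": 0.2}}
p_e_given_f = {
    "unter kontrolle": {"under control": 0.6, "in check": 0.3, "controlled": 0.1},
    "im griff": {"under control": 0.5, "in hand": 0.5},
}

for phrase, score in pivot_paraphrases("under control", p_f_given_e, p_e_given_f):
    print(f"{phrase}: {score:.2f}")
```

With these toy tables, "in check" ranks first (0.8 × 0.3), ahead of "in hand" (0.2 × 0.5) and "controlled" (0.8 × 0.1).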
Aligning sentences in parallel corpora [PDF]
In this paper we describe a statistical technique for aligning sentences with their translations in two parallel corpora. In addition to certain anchor points that are available in our data, the only information about the sentences that we use for calculating alignments is the number of tokens that they contain.
Peter F. Brown +2 more
OpenAIRE
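Length-based sentence alignment can be sketched as a dynamic program over "beads" (groups of source and target sentences aligned together), using only sentence token counts. This is a deliberately simplified stand-in, not the statistical model described in the abstract: it ignores anchor points, and the cost function (absolute difference in token counts plus fixed penalties) and the penalty values `SKIP_PENALTY` and `MERGE_PENALTY` are assumptions chosen for illustration, where the original work uses a probabilistic model of sentence lengths.

```python
from typing import List, Optional, Tuple

# Allowed beads: (number of source sentences, number of target sentences).
BEADS = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]
SKIP_PENALTY = 10   # assumed cost of leaving one sentence unaligned (1-0 / 0-1)
MERGE_PENALTY = 3   # assumed extra cost of a 2-1 or 1-2 bead

def bead_cost(src_len: int, tgt_len: int, bead: Tuple[int, int]) -> float:
    """Cost of one bead, based only on token counts (simplified heuristic)."""
    if bead in ((1, 0), (0, 1)):
        return SKIP_PENALTY
    penalty = MERGE_PENALTY if bead != (1, 1) else 0
    return abs(src_len - tgt_len) + penalty

def align(src: List[int], tgt: List[int]) -> List[Tuple[List[int], List[int]]]:
    """Align sentences given their token counts; returns (src_ids, tgt_ids) beads."""
    n, m = len(src), len(tgt)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees every predecessor cell is final before it is used.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == INF:
                continue
            for di, dj in BEADS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + bead_cost(sum(src[i:ni]), sum(tgt[j:nj]), (di, dj))
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Trace back from the end to recover the cheapest bead sequence.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return beads[::-1]

# Source sentences of 10, 12 and 20 tokens; target sentences of 21 and 19 tokens.
print(align([10, 12, 20], [21, 19]))
```

Here the first two source sentences (10 + 12 tokens) merge into one bead with the 21-token target sentence, and the remaining pair aligns 1-1, giving `[([0, 1], [0]), ([2], [1])]`.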
Feasibility of using corpora as a tool in translation practice
Professional translators commonly employ various tools to streamline and ensure the accuracy and consistency of their work. One such tool is corpora, which becomes particularly crucial when dealing with authentic texts like those from the United Nations
Noureldin Abdelaal
DOAJ

