Some of the following articles may not be open access.
2013
One of the major bottlenecks in the development of Statistical Machine Translation systems for most language pairs is the lack of bilingual parallel training data. Currently available parallel corpora span relatively few language pairs and very few domains; building new ones of sufficiently large size and high quality is time-consuming and expensive.
Dragos Stefan Munteanu, Daniel Marcu
How Comparable Can 'Comparable Corpora' Be?
Target. International Journal of Translation Studies, 1997
The development of a coherent methodology for corpus-based work in translation studies is essential for the evolution of this new field of research into a fully-fledged paradigm within the discipline. The design of a monolingual, multi-source-language comparable corpus of English as a resource for the systematic study of the nature of translated
Building and Using Comparable Corpora
2013
The 1990s saw a paradigm change in the use of corpus-driven methods in NLP. In the field of multilingual NLP (such as machine translation and terminology mining) this implied the use of parallel corpora. However, parallel resources are relatively scarce: many more texts are produced daily by native speakers of any given language than translated.
Repetition and language models and comparable corpora
Proceedings of the 2nd Workshop on Building and Using Comparable Corpora from Parallel to Non-parallel Corpora - BUCC '09, 2009
I will discuss a couple of non-standard features that I believe could be useful for working with comparable corpora. Dotplots have been used in biology to find interesting DNA sequences. Biology is interested in ordered matches, which show up as (possibly broken) diagonals in dotplots. Information Retrieval is more interested in unordered matches (e.g.
2015
One of the major bottlenecks in the development of Statistical Machine Translation systems for most language pairs is the lack of bilingual parallel training data.
Comparative evaluation of two Arabic speech corpora
Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE-2010), 2010
The aim of this paper is to conduct a constructive and comparative evaluation between two important Arabic corpora for two different Arabic dialects, namely, Saudi dialect corpus that was collected by King Abdulaziz City for Science and Technology (KACST), and a Levantine Arabic dialect corpus.
Yousef Ajami Alotaibi, Ali Hamid Meftah
A Collection of Comparable Corpora for Under-resourced Languages
2010
This paper presents work on collecting comparable corpora for 9 language pairs: Estonian-English, Latvian-English, Lithuanian-English, Greek-English, Greek-Romanian, Croatian-English, Romanian-English, Romanian-German and Slovenian-English. The objective of this work was to gather texts from the same domains and genres and with a similar level of ...
Inguna Skadina +6 more

