
Exploiting Comparable Corpora

2013
One of the major bottlenecks in the development of Statistical Machine Translation systems for most language pairs is the lack of bilingual parallel training data. Currently available parallel corpora span relatively few language pairs and very few domains; building new ones of sufficiently large size and high quality is time-consuming and expensive.
Dragos Stefan Munteanu, Daniel Marcu

How Comparable Can 'Comparable Corpora' Be?

Target. International Journal of Translation Studies, 1997
The development of a coherent methodology for corpus-based work in translation studies is essential for the evolution of this new field of research into a fully-fledged paradigm within the discipline. The design of a monolingual, multi-source-language comparable corpus of English as a resource for the systematic study of the nature of translated

Building and Using Comparable Corpora

2013
The 1990s saw a paradigm change in the use of corpus-driven methods in NLP. In the field of multilingual NLP (such as machine translation and terminology mining) this implied the use of parallel corpora. However, parallel resources are relatively scarce: many more texts are produced daily by native speakers of any given language than translated.

Repetition and language models and comparable corpora

Proceedings of the 2nd Workshop on Building and Using Comparable Corpora from Parallel to Non-parallel Corpora - BUCC '09, 2009
I will discuss a couple of non-standard features that I believe could be useful for working with comparable corpora. Dotplots have been used in biology to find interesting DNA sequences. Biology is interested in ordered matches, which show up as (possibly broken) diagonals in dot-plots. Information Retrieval is more interested in unordered matches (e.g.

Exploiting comparable corpora

2015
One of the major bottlenecks in the development of Statistical Machine Translation systems for most language pairs is the lack of bilingual parallel training data.

Comparative evaluation of two Arabic speech corpora

Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE-2010), 2010
The aim of this paper is to conduct a constructive and comparative evaluation of two important Arabic corpora for two different Arabic dialects, namely, a Saudi dialect corpus that was collected by King Abdulaziz City for Science and Technology (KACST), and a Levantine Arabic dialect corpus.
Yousef Ajami Alotaibi, Ali Hamid Meftah

A Collection of Comparable Corpora for Under-resourced Languages

2010
This paper presents work on collecting comparable corpora for 9 language pairs: Estonian-English, Latvian-English, Lithuanian-English, Greek-English, Greek-Romanian, Croatian-English, Romanian-English, Romanian-German and Slovenian-English. The objective of this work was to gather texts from the same domains and genres and with a similar level of ...
Inguna Skadina   +6 more

Building Comparable Corpora

2023
Serge Sharoff   +2 more

Comparing Learner Corpora

2020
Sandra C. Deshors, Stefan Th. Gries

Parallel and Comparable Corpora:

2007
Tony McEnery, Richard Xiao
