Comparable parallel corpora

Studies in Corpus Linguistics, 2019
Are papers presented in corpus-based translation studies truly scientific? These are normally done on only one language pair, often on purpose-made parallel corpora, and normally cannot be replicated. Therefore their value is limited in a strictly scientific sense.

Multiword expressions in comparable corpora

IVITRA Research in Linguistics and Literature, 2020
On the basis of Aranea Gigaword Web corpora, a family of comparable corpora intended for use in contrastive linguistic research, multilingual lexicography, language teaching and translation studies, we discuss the pros and cons of comparable corpora in ...

Comparing web-crawled and traditional corpora

Language Resources and Evaluation, 2020
Using a multi-dimensional (MD) analysis of register variability, the study compares two corpora of Czech: Koditex, a “traditional” corpus carefully designed using various sources with rich metadata, and Araneum Bohemicum Maximum, a web-crawled corpus with an opportunistic composition representative of the “searchable” web.
Václav Cvrcek   +6 more

Comparing Corpora

International Journal of Corpus Linguistics, 2001
Corpus linguistics lacks strategies for describing and comparing corpora. Currently most descriptions of corpora are textual, and questions such as ‘what sort of a corpus is this?’, or ‘how does this corpus compare to that?’ can only be answered impressionistically.

Twitter As a Multilingual Source of Comparable Corpora

Proceedings of the 12th International Conference on Advances in Mobile Computing and Multimedia, 2014
This article describes a new method to build comparable corpora from Twitter. Our strategy relies on the fact that Twitter is one of the most popular online social microblogs, allowing large audiences to express their thoughts and reactions about specific events or breaking news in various languages. Given two languages and a particular topic, we propose ...
Malek Hajjem   +2 more

Comparative study on corpora for speech translation

IEEE Transactions on Audio, Speech and Language Processing, 2006
This paper investigates issues in preparing corpora for developing speech-to-speech translation (S2ST). It is impractical to create a broad-coverage parallel corpus only from dialog speech. An alternative approach is to have bilingual experts write conversational-style texts in the target domain, with translations.
Gen-ichiro Kikui   +3 more

Semi-Automatic Parallel Corpora Extraction from Comparable News Corpora

Polibits, 2010
The parallel corpus is a necessary resource in many multilingual and cross-lingual natural language processing applications, including Machine Translation and Cross-Lingual Information Retrieval. Preparing a large-scale parallel corpus takes time and also demands linguistic skill.
Thoudam Doren Singh   +1 more

Collecting Comparable Corpora

2019
The availability of parallel corpora is limited, especially for under-resourced languages and narrow domains. On the other hand, the number of comparable documents in these areas that are freely available on the Web is continuously increasing. Algorithmic approaches to identify these documents from the Web are needed for the purpose of automatically ...
Monica Lestari Paramita   +11 more
