Comparable corpora - Open Access .click


Can comparable corpora be compared? [PDF]

Ibérica, 2020
We can affirm that, at present, there is no unanimous agreement on the criteria for compiling a comparable corpus or on how to evaluate the comparability of a corpus.
Belén López Arroyo
doaj +3 more sources

Building English – Punjabi Aligned Parallel Corpora of Nouns from Comparable Corpora

Applied Computer Systems, 2023
Comparable corpora are the right resources for extracting parallel data due to their abundant availability. It is of great importance where parallel data are scarce.
Kaur Dilshad, Singh Satwinder
doaj +2 more sources

Focused web crawling in the acquisition of comparable corpora

Information Retrieval, 2008
Cross-Language Information Retrieval (CLIR) resources, such as dictionaries and parallel corpora, are scarce for special domains. Obtaining comparable corpora automatically for such domains could be an answer to this problem. The Web, with its vast volumes of data, offers a natural source for this.
Ari Pirkola, Kalervo Järvelin, Martti Juhola +2 more
exaly +3 more sources

Creating Chinese-English Comparable Corpora

IEICE Transactions on Information and Systems, 2013
Degen Huang, Fuji Ren
exaly +3 more sources

The Role of Language Technologies in Digital Humanities (The Case of Parliamentary Debates)

Digital Presentation and Preservation of Cultural and Scientific Heritage, 2023
The paper focuses on the use case of parliamentary debates as part of Digital Humanities. First, the ParlaMint project is outlined as a flagship initiative of CLARIN ERIC infrastructure.
Petya Osenova
doaj +1 more source

A Factory of Comparable Corpora from Wikipedia [PDF]

Proceedings of the Eighth Workshop on Building and Using Comparable Corpora, 2015
Multiple approaches to grab comparable data from the Web have been developed up to date. Nevertheless, coming out with a high-quality comparable corpus of a specific topic is not straightforward. We present a model for the automatic extraction of comparable texts in multiple languages and on specific topics from Wikipedia.
Alberto Barrón-Cedeño +3 more
openaire +2 more sources

Corpora Compared: The Case of the Swedish Gigaword & Wikipedia Corpora

CoRR, 2020
In this work, we show that the difference in performance of embeddings from differently sourced data for a given language can be due to other factors besides data size. Natural language processing (NLP) tasks usually perform better with embeddings from bigger corpora. However, broadness of covered domain and noise can play important roles.
Oluwatosin Adewumi, Foteini Liwicki, Marcus Liwicki +2 more
openaire +3 more sources

Contrastive Analysis, Tertium Comparationis and Corpora

NJES: Nordic Journal of English studies, 2020
This paper highlights the importance of a common ground, or tertium comparationis, in order to establish unbiased cross-linguistic equivalence in contrastive studies.
Signe Oksefjell Ebeling, Jarle Ebeling
doaj +1 more source

En termes de travail : terminologie comparée grec-français à partir d’un corpus de l’administration publique

Studia Romanica Posnaniensia, 2022
This article presents a comparative study of French and Greek administrative language, as it is used by public administration institutions in their internal communication as well as in their external communication with the citizens.
Mavina Pantazara, Eleni Tziafa, Angeliki Christopoulou +2 more
doaj +1 more source

Sentence alignment for monolingual comparable corpora [PDF]

Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, 2003
We address the problem of sentence alignment for monolingual corpora, a phenomenon distinct from alignment in parallel corpora. Aligning large comparable corpora automatically would provide a valuable resource for learning of text-to-text rewriting rules.
Regina Barzilay, Noemie Elhadad
openaire +2 more sources

parallel corpora
natural language processing
computational linguistics

fos: computer and information sciences
corpus linguistics
computation and language cs.cl

comparable corpus
parallel and comparable corpora
computer science - computation and language