
Can comparable corpora be compared? [PDF]

open access: yes
Ibérica, 2020
We can state that, at present, there is no unanimous agreement on the criteria for compiling a comparable corpus or on how to evaluate a corpus's comparability.
Belén López Arroyo
doaj   +3 more sources

Building English – Punjabi Aligned Parallel Corpora of Nouns from Comparable Corpora

open access: yes
Applied Computer Systems, 2023
Comparable corpora are the right resources for extracting parallel data due to their abundant availability. It is of great importance where parallel data are scarce.
Kaur Dilshad, Singh Satwinder
doaj   +2 more sources

Focused web crawling in the acquisition of comparable corpora

open access: yes
Information Retrieval, 2008
Cross-Language Information Retrieval (CLIR) resources, such as dictionaries and parallel corpora, are scarce for special domains. Obtaining comparable corpora automatically for such domains could be an answer to this problem. The Web, with its vast volumes of data, offers a natural source for this.
Ari Pirkola   +2 more
exaly   +3 more sources

Creating Chinese-English Comparable Corpora

open access: yes
IEICE Transactions on Information and Systems, 2013
Degen Huang, Fuji Ren
exaly   +3 more sources

The Role of Language Technologies in Digital Humanities (The Case of Parliamentary Debates)

open access: yes
Digital Presentation and Preservation of Cultural and Scientific Heritage, 2023
The paper focuses on the use case of parliamentary debates as part of Digital Humanities. First, the ParlaMint project is outlined as a flagship initiative of CLARIN ERIC infrastructure.
Petya Osenova
doaj   +1 more source

A Factory of Comparable Corpora from Wikipedia [PDF]

open access: yes
Proceedings of the Eighth Workshop on Building and Using Comparable Corpora, 2015
Multiple approaches to grab comparable data from the Web have been developed up to date. Nevertheless, coming out with a high-quality comparable corpus of a specific topic is not straightforward. We present a model for the automatic extraction of comparable texts in multiple languages and on specific topics from Wikipedia.
Barrón-Cedeño, Alberto   +3 more
openaire   +2 more sources

Corpora Compared: The Case of the Swedish Gigaword & Wikipedia Corpora

open access: yes
CoRR, 2020
In this work, we show that the difference in performance of embeddings from differently sourced data for a given language can be due to other factors besides data size. Natural language processing (NLP) tasks usually perform better with embeddings from bigger corpora. However, broadness of covered domain and noise can play important roles.
Adewumi, Oluwatosin   +2 more
openaire   +3 more sources

Contrastive Analysis, Tertium Comparationis and Corpora

open access: yes
NJES: Nordic Journal of English Studies, 2020
This paper highlights the importance of a common ground, or tertium comparationis, in order to establish unbiased cross-linguistic equivalence in contrastive studies.
Signe Oksefjell Ebeling, Jarle Ebeling
doaj   +1 more source

En termes de travail : terminologie comparée grec-français à partir d’un corpus de l’administration publique

open access: yes
Studia Romanica Posnaniensia, 2022
This article presents a comparative study of French and Greek administrative language, as it is used by public administration institutions in their internal communication as well as in their external communication with the citizens.
Mavina Pantazara   +2 more
doaj   +1 more source

Sentence alignment for monolingual comparable corpora [PDF]

open access: yes
Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, 2003
We address the problem of sentence alignment for monolingual corpora, a phenomenon distinct from alignment in parallel corpora. Aligning large comparable corpora automatically would provide a valuable resource for learning of text-to-text rewriting rules.
Regina Barzilay, Noemie Elhadad
openaire   +2 more sources
