Results 1 to 10 of about 91,129 (293)

Can comparable corpora be compared? [PDF]

open access: yes, Ibérica, 2020
We can state that, at present, there is no unanimous agreement on the criteria for compiling a comparable corpus or on how to evaluate a corpus's comparability.
Belén López Arroyo
doaj   +3 more sources

Building English – Punjabi Aligned Parallel Corpora of Nouns from Comparable Corpora

open access: yes, Applied Computer Systems, 2023
Comparable corpora are well suited to extracting parallel data because they are abundantly available. This matters most where parallel data are scarce.
Kaur Dilshad, Singh Satwinder
doaj   +2 more sources

Neoclassical Compound Alignments from Comparable Corpora [PDF]

open access: yes, 2012
The paper deals with the automatic compilation of a bilingual dictionary from specialized comparable corpora. We concentrate on a method to automatically extract and align neoclassical compounds in two languages from comparable ...
Daille, Béatrice   +2 more
core   +5 more sources

Focused web crawling in the acquisition of comparable corpora

open access: yes, Information Retrieval, 2008
Cross-Language Information Retrieval (CLIR) resources, such as dictionaries and parallel corpora, are scarce for special domains. Obtaining comparable corpora automatically for such domains could be an answer to this problem. The Web, with its vast volumes of data, offers a natural source for this.
Martti Juhola
exaly   +3 more sources

Creating Chinese-English Comparable Corpora

open access: yes, IEICE Transactions on Information and Systems, 2013
Degen Huang, Shanshan Wang, Fuji Ren
exaly   +3 more sources

The Role of Language Technologies in Digital Humanities (The Case of Parliamentary Debates)

open access: yes, Digital Presentation and Preservation of Cultural and Scientific Heritage, 2023
The paper focuses on the use case of parliamentary debates as part of Digital Humanities. First, the ParlaMint project is outlined as a flagship initiative of CLARIN ERIC infrastructure.
Petya Osenova
doaj   +1 more source

In no uncertain terms: a dataset for monolingual and multilingual automatic term extraction from comparable corpora [PDF]

open access: yes, 2020
Automatic term extraction is a productive field of research within natural language processing, but it still faces significant obstacles regarding datasets and evaluation, which require manual term annotation.
Hoste, Veronique   +2 more
core   +2 more sources

Corpora Compared: The Case of the Swedish Gigaword & Wikipedia Corpora

open access: yes, CoRR, 2020
In this work, we show that the difference in performance of embeddings from differently sourced data for a given language can be due to factors other than data size. Natural language processing (NLP) tasks usually perform better with embeddings from bigger corpora. However, the breadth of the covered domain and noise can play important roles.
Adewumi, Oluwatosin   +2 more
openaire   +3 more sources

A Factory of Comparable Corpora from Wikipedia [PDF]

open access: yes, Proceedings of the Eighth Workshop on Building and Using Comparable Corpora, 2015
Multiple approaches to gathering comparable data from the Web have been developed to date. Nevertheless, coming up with a high-quality comparable corpus on a specific topic is not straightforward. We present a model for the automatic extraction of comparable texts in multiple languages and on specific topics from Wikipedia.
Barrón-Cedeño, Alberto   +3 more
openaire   +2 more sources

Contrastive Analysis, Tertium Comparationis and Corpora

open access: yes, NJES: Nordic Journal of English Studies, 2020
This paper highlights the importance of a common ground, or tertium comparationis, in order to establish unbiased cross-linguistic equivalence in contrastive studies.
Signe Oksefjell Ebeling, Jarle Ebeling
doaj   +1 more source
