Can comparable corpora be compared? [PDF]
It can be said that, at present, there is no unanimous agreement on the criteria for compiling a comparable corpus or on how to evaluate a corpus's comparability.
Belén López Arroyo
doaj +3 more sources
Building English – Punjabi Aligned Parallel Corpora of Nouns from Comparable Corpora
Comparable corpora are the right resources for extracting parallel data due to their abundant availability. It is of great importance where parallel data are scarce.
Kaur Dilshad, Singh Satwinder
doaj +2 more sources
Focused web crawling in the acquisition of comparable corpora
Cross-Language Information Retrieval (CLIR) resources, such as dictionaries and parallel corpora, are scarce for special domains. Obtaining comparable corpora automatically for such domains could be an answer to this problem. The Web, with its vast volumes of data, offers a natural source for this.
Ari Pirkola +2 more
exaly +3 more sources
Creating Chinese-English Comparable Corpora
Degen Huang, Fuji Ren
exaly +3 more sources
The Role of Language Technologies in Digital Humanities (The Case of Parliamentary Debates)
The paper focuses on the use case of parliamentary debates as part of Digital Humanities. First, the ParlaMint project is outlined as a flagship initiative of CLARIN ERIC infrastructure.
Petya Osenova
doaj +1 more source
A Factory of Comparable Corpora from Wikipedia [PDF]
Multiple approaches to grab comparable data from the Web have been developed up to date. Nevertheless, coming out with a high-quality comparable corpus of a specific topic is not straightforward. We present a model for the automatic extraction of comparable texts in multiple languages and on specific topics from Wikipedia.
Barrón-Cedeño, Alberto +3 more
openaire +2 more sources
Corpora Compared: The Case of the Swedish Gigaword & Wikipedia Corpora
In this work, we show that the difference in performance of embeddings from differently sourced data for a given language can be due to other factors besides data size. Natural language processing (NLP) tasks usually perform better with embeddings from bigger corpora. However, broadness of covered domain and noise can play important roles.
Adewumi, Oluwatosin +2 more
openaire +3 more sources
Contrastive Analysis, Tertium Comparationis and Corpora
This paper highlights the importance of a common ground, or tertium comparationis, in order to establish unbiased cross-linguistic equivalence in contrastive studies.
Signe Oksefjell Ebeling, Jarle Ebeling
doaj +1 more source
This article presents a comparative study of French and Greek administrative language, as it is used by public administration institutions in their internal communication as well as in their external communication with the citizens.
Mavina Pantazara +2 more
doaj +1 more source
Sentence alignment for monolingual comparable corpora [PDF]
We address the problem of sentence alignment for monolingual corpora, a phenomenon distinct from alignment in parallel corpora. Aligning large comparable corpora automatically would provide a valuable resource for learning of text-to-text rewriting rules.
Regina Barzilay, Noemie Elhadad
openaire +2 more sources

