Can comparable corpora be compared? [PDF]
We can state that, at present, there is no unanimous agreement on the criteria for compiling a comparable corpus or on how to evaluate the comparability of a corpus.
Belén López Arroyo
doaj +3 more sources
Building English – Punjabi Aligned Parallel Corpora of Nouns from Comparable Corpora
Comparable corpora are the right resources for extracting parallel data due to their abundant availability. It is of great importance where parallel data are scarce.
Kaur Dilshad, Singh Satwinder
doaj +2 more sources
Neoclassical Compound Alignments from Comparable Corpora [PDF]
The paper deals with the automatic compilation of a bilingual dictionary from specialized comparable corpora. We concentrate on a method to automatically extract and align neoclassical compounds in two languages from comparable ...
Daille, Béatrice +2 more
core +5 more sources
Focused web crawling in the acquisition of comparable corpora
Cross-Language Information Retrieval (CLIR) resources, such as dictionaries and parallel corpora, are scarce for special domains. Obtaining comparable corpora automatically for such domains could be an answer to this problem. The Web, with its vast volumes of data, offers a natural source for this.
Martti Juhola
exaly +3 more sources
Creating Chinese-English Comparable Corpora
Degen Huang, Shanshan Wang, Fuji Ren
exaly +3 more sources
The Role of Language Technologies in Digital Humanities (The Case of Parliamentary Debates)
The paper focuses on the use case of parliamentary debates as part of Digital Humanities. First, the ParlaMint project is outlined as a flagship initiative of CLARIN ERIC infrastructure.
Petya Osenova
doaj +1 more source
In no uncertain terms: a dataset for monolingual and multilingual automatic term extraction from comparable corpora [PDF]
Automatic term extraction is a productive field of research within natural language processing, but it still faces significant obstacles regarding datasets and evaluation, which require manual term annotation.
Hoste, Veronique +2 more
core +2 more sources
Corpora Compared: The Case of the Swedish Gigaword & Wikipedia Corpora
In this work, we show that the difference in performance of embeddings from differently sourced data for a given language can be due to other factors besides data size. Natural language processing (NLP) tasks usually perform better with embeddings from bigger corpora. However, broadness of covered domain and noise can play important roles.
Adewumi, Oluwatosin +2 more
openaire +3 more sources
A Factory of Comparable Corpora from Wikipedia [PDF]
Multiple approaches to gathering comparable data from the Web have been developed to date. Nevertheless, producing a high-quality comparable corpus on a specific topic is not straightforward. We present a model for the automatic extraction of comparable texts in multiple languages and on specific topics from Wikipedia.
Barrón-Cedeño, Alberto +3 more
openaire +2 more sources
Contrastive Analysis, Tertium Comparationis and Corpora
This paper highlights the importance of a common ground, or tertium comparationis, in order to establish unbiased cross-linguistic equivalence in contrastive studies.
Signe Oksefjell Ebeling, Jarle Ebeling
doaj +1 more source

