The Web as a Parallel Corpus [PDF]
Parallel corpora have become an essential resource for work in multilingual natural language processing. In this article, we report on our work using the STRAND system for mining parallel text on the World Wide Web, first reviewing the original algorithm and results and then presenting a set of significant enhancements.
Philip Resnik, Noah A. Smith
doaj +2 more sources
In this article the potential of the multilingual Web to function as a corpus, in addition to a source for corpus creation, is examined. Despite the fact that English dominates the Web, and despite the fact that most work in corpus linguistics revolves ...
Gilles-Maurice de Schryver
doaj +3 more sources
The Web as Corpus and Online Corpora for Legal Translations
Legal language is hallmarked by a pedantic and user-unfriendly jargon whose constructs are anything but intuitive, not to mention the legal system specificity which makes it unique in every country.
Giampieri Patrizia
doaj +7 more sources
Introduction to the Special Issue on the Web as Corpus
The Web, teeming as it is with language data, of all manner of varieties and languages, in vast quantity and freely available, is a fabulous linguists' playground. This special issue of Computational Linguistics explores ways in which this dream is being explored.
Adam Kilgarriff, Gregory Grefenstette
doaj +2 more sources
The web as a corpus: a resource for translation
Accessing ready-made corpora may not always be easy. This is especially true for less dominant languages such as Persian, for which the number of available corpora is very limited.
Helia Vaezian
doaj +3 more sources
Web-application for Presentation of Bulgarian Language Heritage: Bilingual Digital Corpora and Dictionaries [PDF]
The paper describes three software packages – the main components of a software system for processing and web-presentation of Bulgarian language resources – parallel corpora and bilingual dictionaries. The author briefly presents current versions of the
Ralitsa Dutsova
doaj +3 more sources
Providing Web Archive News Articles as Corpus Data
While the huge data repositories of web archives carry big potential for knowledge production in academia, researchers have described significant challenges when trying to access and make use of web archives in research.
Jon Carlstedt Tønnessen +1 more
doaj +2 more sources
The World Wide Web as a Linguistic Corpus
T Russon Wooldridge
doaj +2 more sources
When Intuition Fails us: the World Wide Web as a Corpus
In some respects corpus linguistics has made a significant contribution to foreign language (L2) instruction: for example, reference tools like dictionaries and grammar books are at present enriched by various types of information derived from corpora ...
Paweł Scheffler
doaj +4 more sources
This paper presents a proposal to facilitate the use of the annotated web as corpus by alleviating the annotation bottleneck for corpus data drawn from the web. We describe a framework for large-scale distributed corpus annotation using peer-to-peer (P2P) technology to meet this need.
Rayson, P. +3 more
openaire +1 more source

