Web as corpus - Open Access .click


Some of the following articles may not be open access.

2021 44th International Convention on Information, Communication and Electronic Technology (MIPRO), 2021
The Web contains large amounts of textual data which could be used as a source to create new corpora, yet there are not many plug-and-play solutions for scraping specific parts of websites. This paper presents a new open-source solution for downloading and parsing HTML websites which can be configured from one configuration file. As a demonstration
exaly +2 more sources

The Web for Corpus and the Web as Corpus in Translator Training

2013
New Voices in Translation Studies, 10, 1, 54 ...
Buendía-Castro, Miriam, López-Rodríguez, Clara Inés +1 more
openaire +1 more source

Slovak Web Discussion Corpus

2014
This contribution aims to provide a representative sample of Slovak colloquial language in an organized corpus. The corpus makes it possible to study spontaneous, interactive communication that often includes various incorrect or unusual words. The corpus includes a complete set of web discussions about various topics from a single site.
Daniel Hládek, Ján Stas, Jozef Juhár
openaire +1 more source

Web Text Corpus

2018
The World Wide Web is viewed as a useful linguistic resource since it is a unique linguistic world that is full of surprising linguistic data and information. It is the largest store of texts in existence that is freely-available for all kinds of works.
Niladri Sekhar Dash, S. Arulmozi
openaire +1 more source

Word sense distribution in a web corpus

9th IEEE International Conference on Cognitive Informatics (ICCI'10), 2010
The World Wide Web has become an important knowledge source for many research fields, and the quality of Web-acquired knowledge has a direct impact on their performance. While evaluation of the vast amount of Web resources is out of the question, in this paper we examined thousands of sentences containing twelve preselected words and produced several quality ...
Ping Chen +4 more
openaire +1 more source

Errors in the Russian Web Corpus

Computational Linguistics and Intellectual Technologies, 2022
The explosion of the Web leads to the production of large amounts of texts and inevitably influences their quality. Errors that tend to occur more often can distort results, especially when texts are used for scientific purposes, in language teaching or learning.
openaire +1 more source

The academic Web-as-Corpus

2013
As a result of the European Union’s pressure towards internationalization, universities in many countries find themselves increasingly urged to provide information on their requirements and services and to promote themselves in English on the web. Hence the need for corpus resources and studies of institutional academic English used as an international
Ferraresi, Adriano, Bernardini, Silvia
openaire +1 more source

Wacky! working papers on the web as corpus

2006
[Edited volume]
Baroni, Marco, Bernardini, Silvia
openaire +2 more sources

Measuring Web-Corpus Randomness: A Progress Report

2005
The Web allows fast and inexpensive construction of general purpose corpora, i.e., corpora that are not meant to represent a specific sublanguage, but a language as a whole, and thus should be unbiased with respect to domains and genres. In this paper, we present an automated, quantitative, knowledge-poor method to evaluate the randomness (with respect
Baroni, Marco, M. Ciaramita
openaire +3 more sources
