Some of the following articles may not be open access.
Creating a Web Corpus Using GO
2021 44th International Convention on Information, Communication and Electronic Technology (MIPRO), 2021
The Web contains large amounts of textual data which could be used as a source to create new corpora, yet there are not many plug and play solutions for scraping specific parts of the websites. This paper presents a new open-source solution for downloading and parsing HTML websites which can be configured from one configuration file. As a demonstration
exaly +2 more sources
The Web for Corpus and the Web as Corpus in Translator Training
2013
New Voices in Translation Studies, 10, 1, 54 ...
Buendía-Castro, Miriam +1 more
openaire +1 more source
2014
This contribution aims to provide a representative sample of Slovak colloquial language in an organized corpus. The corpus makes it possible to study spontaneous, interactive communication that often includes various incorrect or unusual words. The corpus includes a complete set of web discussions about various topics from a single site.
Daniel Hládek, Ján Stas, Jozef Juhár
openaire +1 more source
2018
The World Wide Web is viewed as a useful linguistic resource since it is a unique linguistic world that is full of surprising linguistic data and information. It is the largest store of texts in existence that is freely available for all kinds of work.
Niladri Sekhar Dash, S. Arulmozi
openaire +1 more source
Word sense distribution in a web corpus
9th IEEE International Conference on Cognitive Informatics (ICCI'10), 2010
The World Wide Web has become an important knowledge source for many research fields, and the quality of Web-acquired knowledge has a direct impact on their performance. While evaluation of the vast amount of Web resources is out of the question, in this paper we examined thousands of sentences containing twelve preselected words and produced several quality ...
Ping Chen +4 more
openaire +1 more source
Errors in the Russian Web Corpus
Computational Linguistics and Intellectual Technologies, 2022
The explosion of the Web leads to the production of large amounts of texts and inevitably influences their quality. Errors that tend to occur more often can distort results, especially when texts are used for scientific purposes, in language teaching or learning.
openaire +1 more source
2013
As a result of the European Union’s pressure towards internationalization, universities in many countries find themselves increasingly urged to provide information on their requirements and services and to promote themselves in English on the web. Hence the need for corpus resources and studies of institutional academic English used as an international
Adriano Ferraresi, Silvia Bernardini
openaire +1 more source
Wacky! working papers on the web as corpus
2006
[Edited volume]
Marco Baroni, Silvia Bernardini
openaire +2 more sources
Measuring Web-Corpus Randomness: A Progress Report
2005
The Web allows fast and inexpensive construction of general purpose corpora, i.e., corpora that are not meant to represent a specific sublanguage, but a language as a whole, and thus should be unbiased with respect to domains and genres. In this paper, we present an automated, quantitative, knowledge-poor method to evaluate the randomness (with respect
Marco Baroni, M. Ciaramita
openaire +3 more sources

