The Web as a Parallel Corpus

Computational Linguistics, 2021 (open access)
Parallel corpora have become an essential resource for work in multilingual natural language processing. In this article, we report on our work using the STRAND system for mining parallel text on the World Wide Web, first reviewing the original algorithm and results and then presenting a set of significant enhancements.
Philip Resnik, Noah A. Smith

Web for/as Corpus

Nordic Journal of African Studies, 2002 (open access)
This article examines the potential of the multilingual Web to function as a corpus in its own right, in addition to serving as a source for corpus creation. Although English dominates the Web, and although most work in corpus linguistics revolves ...
Gilles-Maurice de Schryver

Introduction to the Special Issue on the Web as Corpus

Computational Linguistics, 2021 (open access)
The Web, teeming as it is with language data, of all manner of varieties and languages, in vast quantity and freely available, is a fabulous linguists' playground. This special issue of Computational Linguistics examines ways in which this dream is being pursued.
Adam Kilgarriff, Gregory Grefenstette

The Web as Corpus and Online Corpora for Legal Translations

Comparative Legilinguistics, 2018 (open access)
Legal language is marked by pedantic, user-unfriendly jargon whose constructs are anything but intuitive, not to mention the specificity of each legal system, which makes legal language unique in every country.
Patrizia Giampieri

The web as a corpus: a resource for translation

Vertimo Studijos, 2018 (open access)
Accessing ready-made corpora may not always be easy. This is especially true for less dominant languages such as Persian, for which the number of available corpora is very limited.
Helia Vaezian

Focused Web Corpus Crawling

Proceedings of the 9th Web as Corpus Workshop (WaC-9), 2014 (open access)
In web corpus construction, crawling is a necessary step, and it is probably the most costly of all, because it requires expensive bandwidth usage, and excess crawling increases storage requirements. Excess crawling results from the fact that the web contains a lot of redundant content (duplicates and near-duplicates), as well as other material not ...
Roland Schäfer et al.

The PAISÀ Corpus of Italian Web Texts

Proceedings of the 9th Web as Corpus Workshop (WaC-9), 2014 (open access)
PAISÀ is a large, Creative Commons-licensed web corpus of contemporary Italian. We describe the design, harvesting, and processing steps involved in its creation.
Verena Lyding et al.

Corpulyzer: A Novel Framework for Building Low Resource Language Corpora

IEEE Access, 2021 (open access)
The rapid proliferation of artificial intelligence has led to the development of sophisticated, cutting-edge systems in the natural language processing and computational linguistics domains.
Bilal Tahir, Muhammad Amir Mehmood

“Chatbot Communication” as an Object of Linguistic Research in the System of Digital Communications

Дискурс (Discourse), 2022 (open access)
The authors examine the chatbot, one of the notable technologies of digitalization, from a linguistic point of view.
S. V. Kiseleva et al.

The Variations in Verb-Preposition Combinations in the GloWbE Corpus and Its Usage in Informal Englishes

REiLA, 2021 (open access)
This paper is based on the Corpus of Global Web-based English (GloWbE), which was compiled by Mark Davies in 2013. GloWbE consists of web data from 20 English-speaking countries.
Kazi Amzad Hossain
