Results 231 to 240 of about 2,674
Some of the following articles may not be open access.

The Improved Pagerank in Web Crawler

2009 First International Conference on Information Science and Engineering, 2009
PageRank is an algorithm for rating web pages. It applies the citation relationship found in academic papers to evaluate a web page's authority. Because it gives the same weight to all edges and ignores the relevance of pages to the topic, it suffers from topic drift.
Qin Zheng, Zhang Ling
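As context for the critique above, a minimal power-iteration sketch of the original PageRank (not the paper's improved variant); the graph, damping factor, and iteration count are illustrative. The line computing `share` is where every out-edge receives the same weight, the uniform treatment the authors identify as a cause of topic drift.

```python
# Minimal PageRank by power iteration (original algorithm, not the
# paper's improved variant). Every out-link gets the same 1/outdegree
# weight -- the uniform edge weighting the authors critique.

def pagerank(graph, damping=0.85, iterations=50):
    """graph: dict mapping page -> list of pages it links to."""
    n = len(graph)
    rank = {page: 1.0 / n for page in graph}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in graph}
        for page, links in graph.items():
            if not links:                        # dangling page: spread evenly
                for q in graph:
                    new_rank[q] += damping * rank[page] / n
            else:
                share = rank[page] / len(links)  # same weight to all edges
                for q in links:
                    new_rank[q] += damping * share
        rank = new_rank
    return rank

# Toy graph (hypothetical): A and B cite C; C cites A.
print(pagerank({"A": ["C"], "B": ["C"], "C": ["A"]}))
```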

Cross-supervised synthesis of web-crawlers

Proceedings of the 38th International Conference on Software Engineering, 2016
A web-crawler is a program that automatically and systematically follows the links of a website and extracts information from its pages. Because websites differ in format, the crawling scheme can differ dramatically from site to site. Manually customizing a crawler for each specific site is time-consuming and error-prone.
Adi Omari, Sharon Shoham, Eran Yahav
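To illustrate the problem the synthesis targets (the paper's technique itself is not reproduced here), a sketch of hand-customized, per-site extraction rules; the hostnames and regular expressions are hypothetical.

```python
# Why per-site crawlers are tedious to hand-write: each site lays out
# the same information differently, so each needs its own rule.
import re

SITE_RULES = {
    # Hypothetical site-specific extraction rules.
    "shop-a.example": re.compile(r'<span class="price">([^<]+)</span>'),
    "shop-b.example": re.compile(r'<td id="cost">([^<]+)</td>'),
}

def extract_price(host, html):
    rule = SITE_RULES.get(host)
    if rule is None:
        raise KeyError(f"no hand-written rule for {host}")
    match = rule.search(html)
    return match.group(1) if match else None

print(extract_price("shop-a.example", '<span class="price">9.99</span>'))
```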

UbiCrawler: a scalable fully distributed Web crawler

Software: Practice and Experience, 2004
We report our experience in implementing UbiCrawler, a scalable distributed Web crawler, using the Java programming language. The main features of UbiCrawler are platform independence, linear scalability, graceful degradation in the presence of faults, and a very effective assignment function (based on consistent hashing) for partitioning the …
M. Santini   +3 more
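A minimal consistent-hashing sketch in the spirit of UbiCrawler's host-to-agent partitioning, not the paper's implementation; agent names and the replica count are illustrative. Because a host's owner is the first agent point clockwise from the host's hash, removing a failed agent reassigns only the hosts that agent owned, which is what gives the graceful degradation.

```python
# Consistent-hashing assignment of hosts to crawler agents (sketch).
import bisect
import hashlib

def h(key: str) -> int:
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, agents, replicas=64):
        # Each agent owns several points on the hash ring.
        self.points = sorted((h(f"{a}#{i}"), a)
                             for a in agents for i in range(replicas))

    def owner(self, host: str) -> str:
        keys = [p for p, _ in self.points]
        i = bisect.bisect(keys, h(host)) % len(self.points)
        return self.points[i][1]

ring = Ring(["agent-1", "agent-2", "agent-3"])
print(ring.owner("example.com"))   # same host always maps to same agent
```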

A framework of deep Web crawler

2008 27th Chinese Control Conference, 2008
As an ever-increasing amount of information on the Web today is available only through search interfaces, users have to key in a set of keywords to access the pages of certain Web sites; these pages are often referred to as the hidden Web or the deep Web.
Xiang Peisu, Tian Ke, Huang Qin-zhen
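A sketch of the access pattern the abstract describes: instead of following links, the crawler submits keywords through a search form. The endpoint URL and the form field name `q` are hypothetical placeholders.

```python
# Deep-Web access: POST keywords to a search form rather than follow links.
import urllib.parse
import urllib.request

def query_hidden_site(form_url: str, keywords: str) -> str:
    data = urllib.parse.urlencode({"q": keywords}).encode()  # form field
    request = urllib.request.Request(form_url, data=data)    # data => POST
    with urllib.request.urlopen(request, timeout=10) as resp:
        return resp.read().decode(errors="replace")

# html = query_hidden_site("https://example.com/search", "web crawler")
```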

Measuring the web crawler ethics

Proceedings of the 19th international conference on World wide web, 2010
Web crawlers are highly automated and seldom regulated manually. The diversity of crawler activities often leads to ethical problems such as spam and service attacks. In this research, quantitative models are proposed to measure web crawler ethics based on crawlers' behavior on web servers.
C. Lee Giles   +2 more
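One concretely measurable behavior such models can draw on is robots.txt compliance. The scoring function below is my own illustration, not the paper's model: it rates a crawler by the fraction of its requests that the target site's robots.txt permits.

```python
# Score a crawler's politeness as its robots.txt compliance rate.
import urllib.robotparser

def ethicality_score(robots_url, agent, requested_paths):
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(robots_url)
    rp.read()                      # fetches and parses robots.txt
    allowed = sum(rp.can_fetch(agent, p) for p in requested_paths)
    return allowed / len(requested_paths)

# score = ethicality_score("https://example.com/robots.txt", "MyBot",
#                          ["/", "/private/data", "/page.html"])
```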

A Novel Architecture for Deep Web Crawler

International Journal of Information Technology and Web Engineering, 2011
A traditional crawler picks up a URL, retrieves the corresponding page and extracts various links, adding them to the queue. A deep Web crawler, after adding links to the queue, checks for forms. If forms are present, it processes them and retrieves the required information.
Dilip Kumar Sharma, A. K. Sharma
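The loop just described, as a runnable sketch: pop a URL from the queue, fetch the page, enqueue the extracted links, and flag any page containing a form for deep-Web processing (the form handling itself is omitted, as is politeness handling).

```python
# Traditional crawl loop with a deep-Web hook: pages with forms are
# flagged for separate processing.
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
import urllib.request

class PageParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links, self.has_form = [], False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "form":
            self.has_form = True

def crawl(seed, limit=10):
    queue, seen = deque([seed]), {seed}
    while queue and limit > 0:
        url = queue.popleft()
        limit -= 1
        html = urllib.request.urlopen(url, timeout=10).read().decode(errors="replace")
        parser = PageParser()
        parser.feed(html)
        if parser.has_form:
            print("form found, hand off to deep-Web processor:", url)
        for link in parser.links:
            absolute = urljoin(url, link)
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)

# crawl("https://example.com")
```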

Design of a mobile Web crawler for hidden Web

2016 3rd International Conference on Recent Advances in Information Technology (RAIT), 2016
The World Wide Web (WWW) is a diverse source of information. A large part of the Web is hidden behind search forms and is reachable only when a user types in a set of keywords or queries. This part of the Web is known as the hidden Web or deep Web. The webpages in the hidden Web are not accessible by following hyperlinks and hence are not indexed by the search engines.
Manish Kumar, Rajesh Bhatia

Web crawler research methodology

2011
In economic and social sciences it is crucial to test theoretical models against reliable and sufficiently large databases. The general research challenge is to build a well-structured database that suits the given research question well and is cost-efficient at the same time. In this paper we focus on crawler programs that proved to be an effective …
Nemeslaki, András   +1 more

NoSQL Web Crawler Application

2018
With the advent of Web technology, the Web is full of unstructured data, called Big Data. However, these data are not easy to collect, access, and process at large scale. Web crawling is an optimization problem. Site-specific crawling of various social media platforms, e-commerce websites, blogs, news websites, and forums is a requirement for …
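A sketch of the storage side such a crawler implies: pages kept as schemaless JSON-like documents in a NoSQL store. MongoDB via pymongo is used here only as an example client; the database, collection, and field names are illustrative.

```python
# Store crawled pages as schemaless documents in a NoSQL store.
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
pages = client["crawler"]["pages"]         # database / collection (illustrative)

def store_page(url: str, html: str, links: list[str]) -> None:
    pages.insert_one({
        "url": url,
        "html": html,                      # unstructured payload kept as-is
        "links": links,
        "fetched_at": datetime.now(timezone.utc),
    })
```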

A web crawler design for data mining

Journal of Information Science, 2001
The content of the web has increasingly become a focus for academic research. Computer programs are needed to conduct any large-scale processing of web pages, requiring the use of a web crawler at some stage to fetch the pages to be analysed.
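A sketch of the fetch-then-analyse pipeline the abstract describes, with a simple word count standing in for the mining step; script/style filtering is omitted for brevity, and the URL is a placeholder.

```python
# Fetch a page, extract its text content, and count words (stand-in
# for a real data-mining step).
from collections import Counter
from html.parser import HTMLParser
import urllib.request

class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def word_counts(url: str) -> Counter:
    html = urllib.request.urlopen(url, timeout=10).read().decode(errors="replace")
    extractor = TextExtractor()
    extractor.feed(html)
    words = " ".join(extractor.chunks).lower().split()
    return Counter(words)

# print(word_counts("https://example.com").most_common(10))
```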
