Results 291 to 300 of about 23,388
Some of the following articles may not be open access.

Agents, Crawlers, and Web Retrieval

2002
In this paper we survey crawlers, a specific type of agents used by search engines. We also explore the relation with generic agents and how agent technology or variants of it could help to develop search engines that are more effective, efficient, and scalable.
Ricardo Baeza-Yates, José M. Piquer

Smart distributed web crawler

2016 International Conference on Information Communication and Embedded Systems (ICICES), 2016
Centralized crawlers are not adequate to spider meaningful and relevant portions of the Web. A crawler with good scalability and load balancing can substantially improve performance. As the size of the Web grows, downloading pages in less time and increasing crawler coverage make it necessary to distribute the ...
Sawroop Kaur Bal, G. Geetha
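A common way to distribute a crawler's URL frontier, sketched below, is to partition URLs across nodes by hashing the hostname so that all pages of one site land on the same node. This is an illustrative assumption about the distribution scheme, not the design from the paper; the node count and URLs are invented.

```python
import hashlib
from urllib.parse import urlparse

def assign_node(url: str, num_nodes: int) -> int:
    """Map a URL to a crawler node by hashing its hostname.

    Hashing the host (not the full URL) keeps all pages of one site on
    the same node, so per-site politeness state stays local.
    """
    host = urlparse(url).netloc
    digest = hashlib.md5(host.encode()).hexdigest()
    return int(digest, 16) % num_nodes

# Pages from the same host always map to the same node.
assert assign_node("http://example.com/a", 4) == assign_node("http://example.com/b", 4)
```

Using a stable hash (rather than round-robin) also makes the assignment reproducible across restarts, at the cost of uneven load when a few hosts dominate the frontier.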

The Improved Pagerank in Web Crawler

2009 First International Conference on Information Science and Engineering, 2009
PageRank is an algorithm for rating web pages. It carries the citation relationship of academic papers over to the Web to evaluate a page's authority. It gives the same weight to all edges and ignores the relevance of web pages to the topic, resulting in a problem of topic drift.
Qin Zheng, Zhang Ling
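The uniform edge weighting the abstract criticizes can be seen in a minimal sketch of classic PageRank: every outlink of a page receives the same share of its rank, with no notion of topical relevance. The tiny graph and damping factor below are illustrative, not from the paper.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping page -> list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p, outlinks in links.items():
            if not outlinks:
                continue
            # Same weight for every edge -- the behavior that can
            # cause topic drift when some outlinks are off-topic.
            share = damping * rank[p] / len(outlinks)
            for q in outlinks:
                new_rank[q] += share
        rank = new_rank
    return rank

graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
# C ranks highest: it is linked from both A and B.
```

A topic-sensitive variant would replace the uniform `1 / len(outlinks)` split with weights derived from each target page's similarity to the topic.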

Measuring the web crawler ethics

Proceedings of the 19th international conference on World wide web, 2010
Web crawlers are highly automated and seldom regulated manually. The diversity of crawler activities often leads to ethical problems such as spam and service attacks. In this research, quantitative models are proposed to measure web crawler ethics based on crawler behavior on web servers.
C. Lee Giles   +2 more
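A quantitative score of the kind the abstract describes could, for example, penalize a crawler session for high request rates and robots.txt violations. The features, weights, and log format below are invented for illustration and are not the authors' actual models.

```python
def ethicality_penalty(session, disallowed_paths, max_rate=1.0):
    """Higher penalty = less ethical crawler behavior (hypothetical metric).

    session: dict with 'requests' (list of request paths) and
    'duration' (session length in seconds).
    """
    rate = len(session["requests"]) / max(session["duration"], 1e-9)
    rate_penalty = max(0.0, rate - max_rate)  # hammering the server
    violations = sum(                          # ignoring robots.txt rules
        any(p.startswith(d) for d in disallowed_paths)
        for p in session["requests"]
    )
    return rate_penalty + violations

polite = {"requests": ["/a", "/b"], "duration": 10.0}
rude = {"requests": ["/admin/x"] * 50, "duration": 5.0}
assert ethicality_penalty(polite, ["/admin"]) == 0.0
assert ethicality_penalty(rude, ["/admin"]) > 0.0
```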

A framework of deep Web crawler

2008 27th Chinese Control Conference, 2008
As an ever-increasing amount of information on the Web today is available only through search interfaces, users have to key in a set of keywords to access pages from certain Web sites; these pages are often referred to as the hidden Web or the deep Web.
Xiang Peisu, Tian Ke, Huang Qin-zhen

Cross-supervised synthesis of web-crawlers

Proceedings of the 38th International Conference on Software Engineering, 2016
A web-crawler is a program that automatically and systematically tracks the links of a website and extracts information from its pages. Due to the different formats of websites, the crawling scheme for different sites can differ dramatically. Manually customizing a crawler for each specific site is time consuming and error-prone.
Adi Omari, Sharon Shoham, Eran Yahav

A Novel Architecture for Deep Web Crawler

International Journal of Information Technology and Web Engineering, 2011
A traditional crawler picks up a URL, retrieves the corresponding page and extracts various links, adding them to the queue. A deep Web crawler, after adding links to the queue, checks for forms. If forms are present, it processes them and retrieves the required information.
Dilip Kumar Sharma, A. K. Sharma
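The two-stage loop the abstract describes can be sketched as a traditional crawl step (extract links into the queue) followed by a deep-Web step (check each page for forms). The fetch function below is a stand-in for HTTP requests, and the page contents are invented for illustration; this is not the authors' architecture.

```python
from collections import deque
from html.parser import HTMLParser

class PageScanner(HTMLParser):
    """Collects outgoing links and form actions from one HTML page."""
    def __init__(self):
        super().__init__()
        self.links, self.forms = [], []
    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag == "form":
            self.forms.append(attrs.get("action", ""))

def crawl(start, fetch, limit=10):
    queue, seen, forms_found = deque([start]), {start}, []
    while queue and limit:
        url = queue.popleft()
        limit -= 1
        scanner = PageScanner()
        scanner.feed(fetch(url))
        for link in scanner.links:            # traditional crawler step
            if link not in seen:
                seen.add(link)
                queue.append(link)
        forms_found.extend(scanner.forms)     # deep-Web step: note forms
    return seen, forms_found

pages = {
    "/": '<a href="/search">search</a>',
    "/search": '<form action="/query"><input name="q"></form>',
}
seen, forms = crawl("/", lambda u: pages.get(u, ""))
```

A full deep-Web crawler would then fill the discovered forms with candidate queries and retrieve the result pages; here the forms are only recorded.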

Design of a mobile Web crawler for hidden Web

2016 3rd International Conference on Recent Advances in Information Technology (RAIT), 2016
The World Wide Web (WWW) is a diverse source of information. A large part of the Web is hidden behind search forms and is reachable only when a user types in a set of keywords or queries. This part of the Web is known as the hidden Web or deep Web. The webpages in the hidden Web are not accessible by following hyperlinks and hence are not indexed by search engines.
Manish Kumar, Rajesh Bhatia

Real-time web crawler detection

2011 18th International Conference on Telecommunications, 2011
In this paper we present a methodology for detecting web crawlers in real time. We use decision trees to classify requests in real time, as originating from a crawler or human, while their session is ongoing. For this purpose we used machine learning techniques to identify the most important features that differentiate humans from crawlers.
Balla, A.   +5 more
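The kind of per-session decision tree the abstract describes can be sketched by hand: running features of an ongoing session are tested against thresholds to emit a label while the session is still open. The features and thresholds below are illustrative stand-ins, not those learned in the paper.

```python
def classify_session(f):
    """Classify an ongoing session from running features (hypothetical tree).

    f: dict of per-session features, updated after each request.
    """
    if f["robots_txt_hit"]:           # humans rarely fetch robots.txt
        return "crawler"
    if f["requests_per_sec"] > 5.0:   # sustained high request rate
        return "crawler"
    if f["image_ratio"] < 0.05:       # browsers also fetch page images
        return "crawler"
    return "human"

human = {"robots_txt_hit": False, "requests_per_sec": 0.3, "image_ratio": 0.6}
bot = {"robots_txt_hit": True, "requests_per_sec": 12.0, "image_ratio": 0.0}
assert classify_session(human) == "human"
assert classify_session(bot) == "crawler"
```

In practice the tree structure and thresholds would be learned from labeled server logs (as the paper does with machine learning) rather than written by hand.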

NoSQL Web Crawler Application

2018
Abstract With the advent of Web technology, the Web is full of unstructured data called Big Data. However, these data are not easy to collect, access, and process at large scale. Web Crawling is an optimization problem. Site-specific crawling of various social media platforms, e-Commerce websites, Blogs, News websites, and Forums is a requirement for
