Results 221 to 230 of about 2,674
Some of the following articles may not be open access.
The Ethicality of Web Crawlers
2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, 2010
Search engines largely rely on web crawlers to collect information from the web. This has led to an enormous amount of web traffic generated by crawlers alone. To minimize negative aspects of this traffic on websites, the behaviors of crawlers may be regulated at an individual web server by implementing the Robots Exclusion Protocol in a file called ...
C. Lee Giles+2 more
openaire +2 more sources
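For context, the Robots Exclusion Protocol mentioned in this abstract is the robots.txt convention. A minimal sketch of the kind of politeness check a crawler can perform, using Python's standard urllib.robotparser; the URL and user-agent string are illustrative placeholders, not from the paper:

```python
from urllib import robotparser

# Fetch and parse the site's robots.txt (placeholder URL).
rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

# Ask whether our (hypothetical) user agent may fetch a given page.
if rp.can_fetch("MyCrawler/1.0", "https://example.com/private/page.html"):
    print("Allowed to crawl")
else:
    print("Disallowed by robots.txt")
```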
Reducing web crawler overhead using mobile crawler
2011 International Conference on Emerging Trends in Electrical and Computer Technology, 2011
Users rely extensively on search engines to find information on the web. As the growth of the World Wide Web has exceeded all expectations, search engines depend on web crawlers to maintain an index of billions of pages for efficient searching. Web crawlers have to interact with millions of hosts and retrieve pages continuously to keep the ...
K Muthu Manickam, S Anbukodi
openaire +2 more sources
CoBWeb-a crawler for the Brazilian Web
6th International Symposium on String Processing and Information Retrieval. 5th International Workshop on Groupware (Cat. No.PR00268), 2003
One of the key components of current Web search engines is the document collector. The paper describes CoBWeb, an automatic document collector whose architecture is distributed and highly scalable. CoBWeb aims at collecting large numbers of documents per time period while observing operational and ethical limits in the crawling process.
Paulo B. Golgher+5 more
openaire +2 more sources
2020
In this chapter, we will discuss a crawling framework called Scrapy and go through the steps necessary to crawl and upload the web crawl data to an S3 bucket.
openaire +2 more sources
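As a companion to this chapter summary, a minimal sketch of what a Scrapy spider exporting to an S3 bucket can look like. The spider name, start URL, allowed domain, and bucket path are assumptions for illustration; the s3:// feed target requires the botocore package and valid AWS credentials:

```python
import scrapy

class PageSpider(scrapy.Spider):
    name = "pages"
    allowed_domains = ["example.com"]
    start_urls = ["https://example.com"]

    # Scrapy's feed exports can write directly to S3 (hypothetical bucket).
    custom_settings = {
        "FEEDS": {
            "s3://my-crawl-bucket/crawl-data.jsonl": {"format": "jsonlines"},
        },
    }

    def parse(self, response):
        # Emit one item per page, then follow links found on the page;
        # offsite requests are filtered via allowed_domains.
        yield {"url": response.url, "title": response.css("title::text").get()}
        for href in response.css("a::attr(href)").getall():
            yield response.follow(href, callback=self.parse)
```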
International Conference on Information Technology: Coding and Computing, 2004. Proceedings. ITCC 2004., 2004
Building a Web crawler that downloads the most relevant pages first is still a major challenge in the field of information retrieval systems. The use of link-analysis algorithms such as PageRank and other importance metrics has offered a new approach to prioritizing the URL queue for downloading the most relevant pages. The combination of these metrics along ...
Srinivasa Murthy+4 more
openaire +2 more sources
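To illustrate the prioritization idea in this abstract, a minimal sketch of an importance-ordered URL frontier, assuming each URL arrives with a precomputed score (e.g., a PageRank value). This is a generic sketch, not the paper's implementation; heapq is a min-heap, so scores are negated to pop the highest-scoring URL first:

```python
import heapq

class PriorityFrontier:
    def __init__(self):
        self._heap = []
        self._seen = set()

    def push(self, url, score):
        # Skip URLs we have already queued or downloaded.
        if url not in self._seen:
            self._seen.add(url)
            heapq.heappush(self._heap, (-score, url))

    def pop(self):
        # Return the highest-importance URL next.
        neg_score, url = heapq.heappop(self._heap)
        return url, -neg_score

frontier = PriorityFrontier()
frontier.push("https://example.com/a", 0.8)
frontier.push("https://example.com/b", 0.3)
print(frontier.pop())  # ('https://example.com/a', 0.8)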
Design of the Distributed Web Crawler
Advanced Materials Research, 2011
At the current scale of the Internet, a single web crawler is unable to visit the entire web in an effective time frame, so we developed a distributed web crawler system to deal with this. In our distributed design, we mainly consider two facets of parallelism.
Wei Jiang Li+3 more
openaire +2 more sources
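One common way to realize the parallelism such a design needs is to partition URLs across crawler nodes by hashing the hostname, so each node owns a disjoint set of hosts. A minimal sketch under that assumption (the node count is illustrative, and sha1 is used for a hash that is stable across processes):

```python
import hashlib
from urllib.parse import urlparse

NUM_NODES = 4  # illustrative cluster size

def assign_node(url: str) -> int:
    # Hash the hostname so all URLs of a host land on one node,
    # which also keeps per-host politeness logic in one place.
    host = urlparse(url).netloc
    digest = hashlib.sha1(host.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_NODES

for u in ["https://example.com/a", "https://example.org/b"]:
    print(u, "->", assign_node(u))
```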
ACM SIGMETRICS Performance Evaluation Review, 2000
In this paper we study how to make web servers (e.g., Apache) more crawler friendly. Current web servers offer the same interface to crawlers and regular web surfers, even though crawlers and surfers have very different performance requirements. We evaluate simple and easy-to-incorporate modifications to web servers so that there are significant ...
Junghoo Cho+3 more
openaire +2 more sources
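The paper itself studies server-side modifications; as a rough illustration of the crawler-server interaction it aims to make cheaper, here is a sketch of a conditional GET, where the server can answer 304 Not Modified instead of resending an unchanged page. The URL and date are placeholders, not from the paper:

```python
import urllib.error
import urllib.request

req = urllib.request.Request(
    "https://example.com/page.html",
    headers={"If-Modified-Since": "Sat, 01 Jan 2000 00:00:00 GMT"},
)
try:
    with urllib.request.urlopen(req) as resp:
        body = resp.read()  # 200: page changed, re-index it
        print("fetched", len(body), "bytes")
except urllib.error.HTTPError as e:
    if e.code == 304:
        print("not modified; skip re-download")
    else:
        raise
```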
An Ontology-Based Crawler for the Semantic Web
2008We present work in progress on automated and ontology-guided discovery, extraction and mapping of information sources on the Semantic Web. It concerns an ontology-guided focused crawler to discover and match different data sources. We have developed an automated ontology-matcher embedded in the crawler that relates semantic web documents found during ...
Felix Van De Maele+2 more
openaire +3 more sources
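As a toy illustration of ontology-guided focusing (not the authors' matcher), a sketch that scores a page by the overlap between its terms and a set of ontology concept labels, enqueuing outlinks only for on-topic pages. The term set and threshold are invented for the example:

```python
# Hypothetical concept labels drawn from an ontology.
ONTOLOGY_TERMS = {"crawler", "ontology", "semantic", "rdf", "owl"}

def relevance(text: str) -> float:
    # Fraction of ontology terms that appear in the page text.
    words = set(text.lower().split())
    return len(words & ONTOLOGY_TERMS) / len(ONTOLOGY_TERMS)

page = "An OWL ontology describing semantic web crawler behaviour"
if relevance(page) >= 0.4:  # threshold chosen arbitrarily for the sketch
    print("enqueue outlinks: page looks on-topic")
```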
2016 International Conference on Information Communication and Embedded Systems (ICICES), 2016
Centralized crawlers are not adequate to spider meaningful and relevant portions of the Web. A crawler with good scalability and load balancing can substantially improve performance. As the size of the web grows, it becomes necessary to distribute the crawling process in order to download pages in less time and increase crawler coverage ...
Sawroop Kaur Bal, G. Geetha
openaire +2 more sources
Agents, Crawlers, and Web Retrieval
2002
In this paper we survey crawlers, a specific type of agent used by search engines. We also explore their relation to generic agents and how agent technology, or variants of it, could help develop search engines that are more effective, efficient, and scalable.
Ricardo Baeza-Yates, José M. Piquer
openaire +2 more sources