LEARNING-based Focused WEB Crawler
IETE Journal of Research, 2021
As the number of pages being published every day increases enormously, there is a consistent need to design an efficient crawler mechanism that can produce appropriate and efficient search results ...
Naresh Kumar, Dhruv Aggarwal
2016 International Conference on Information Communication and Embedded Systems (ICICES), 2016
Centralized crawlers are not adequate to spider meaningful and relevant portions of the Web. A crawler with good scalability and load balancing can improve performance. As the size of the web grows, in order to download pages in less time and increase crawler coverage it is necessary to distribute the ...
Sawroop Kaur Bal, G. Geetha
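The distribution described in this entry is often implemented by hashing each URL's host to a crawler node, so that every page of a site is fetched by the same machine. A minimal sketch, assuming a hash-by-host partitioning rule (an illustrative choice, not necessarily the paper's scheme):

```python
import hashlib

def assign_node(url: str, num_nodes: int) -> int:
    """Map a URL to one of num_nodes crawler instances by hashing its host,
    so all pages of a site go to the same node (politeness + DNS cache locality)."""
    host = url.split("/")[2] if "://" in url else url
    digest = hashlib.md5(host.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes

# Same host always lands on the same node, regardless of path.
node_a = assign_node("http://example.com/page1", 4)
node_b = assign_node("http://example.com/page2", 4)
```

Keying on the host rather than the full URL keeps per-site politeness delays local to one node.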
2020
In this chapter, we will discuss a crawling framework called Scrapy and go through the steps necessary to crawl and upload the web crawl data to an S3 bucket.
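The pipeline this chapter describes — parse fetched pages into items, serialize them, hand the bytes to an S3 client — can be sketched with the standard library alone. The item shape and the JSON Lines format are assumptions here, and the actual upload (e.g. via boto3's `put_object`) is deliberately omitted:

```python
import json

def parse(url: str, body: str) -> dict:
    """Stand-in for a Scrapy spider's parse(): turn a fetched page into an item."""
    return {"url": url, "length": len(body)}

def to_jsonl(items) -> bytes:
    """Serialize items as JSON Lines: the byte payload you would hand
    to an S3 client. The upload call itself is out of scope here."""
    return ("\n".join(json.dumps(i) for i in items) + "\n").encode("utf-8")

items = [parse("http://example.com/", "<html>hello</html>")]
payload = to_jsonl(items)
```

In practice Scrapy's feed exports can write directly to an `s3://` URI, which replaces the manual serialization step.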
2009 Second International Conference on Emerging Trends in Engineering & Technology, 2009
The World Wide Web is an interlinked collection of billions of documents formatted using HTML. Ironically, the very size of this collection has become an obstacle to information retrieval. The user has to sift through scores of pages to come upon the information he or she desires. Web crawlers are the heart of search engines.
Pooja Gupta, Kalpana Johari
Web Crawler for Searching Deep Web Sites
2017 International Conference on Computing, Communication, Control and Automation (ICCUBEA), 2017
Deep web searching remains one of the most important open issues on the World Wide Web to date. Searching for relevant information on the web requires different techniques, and a crawler is one technique that helps find relevant information. Nowadays, people search for data with the help of search engines such as Google and Yahoo, but these search engines will not ...
Tejaswini Arun Patil, Santosh Chobe
Web crawler research methodology
In economic and social sciences it is crucial to test theoretical models against reliable and big enough databases. The general research challenge is to build up a well-structured database that suits well to the given research question and that is cost efficient at the same time. In this paper we focus on crawler programs that proved to be an effective ...
András Nemeslaki +1 more
Evaluating topic-driven web crawlers
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval, 2001
Due to limited bandwidth, storage, and computational resources, and to the dynamic nature of the Web, search engines cannot index every Web page, and even the covered portion of the Web cannot be monitored continuously for changes. Therefore it is essential to develop effective crawling strategies to prioritize the pages to be indexed.
Filippo Menczer +3 more
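The prioritization this entry calls for is commonly realized as a best-first frontier: a priority queue of URLs ordered by estimated relevance. A minimal sketch, assuming the relevance score itself comes from elsewhere (the scorer is not part of this entry):

```python
import heapq
import itertools

class CrawlFrontier:
    """Best-first crawl frontier: URLs with higher estimated relevance
    are fetched first. heapq is a min-heap, so scores are negated."""
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: FIFO among equal scores

    def push(self, url: str, score: float) -> None:
        heapq.heappush(self._heap, (-score, next(self._counter), url))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

frontier = CrawlFrontier()
frontier.push("http://low.example/", 0.1)
frontier.push("http://high.example/", 0.9)
best = frontier.pop()  # the higher-scored URL comes out first
```

The counter tie-breaker keeps pops deterministic and avoids comparing URLs when scores are equal.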
Real-time web crawler detection
2011 18th International Conference on Telecommunications, 2011
In this paper we present a methodology for detecting web crawlers in real time. We use decision trees to classify requests in real time, as originating from a crawler or human, while their session is ongoing. For this purpose we used machine learning techniques to identify the most important features that differentiate humans from crawlers.
Balla, A. +5 more
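The kind of rule a learned decision tree ends up encoding can be sketched by hand. The feature names below (robots.txt access, image-to-page ratio, mean inter-request interval) are illustrative assumptions, not the features this paper selected:

```python
def looks_like_crawler(session: dict) -> bool:
    """Hand-written stand-in for one path of a learned decision tree
    over per-session features. All thresholds are illustrative."""
    # Humans virtually never request robots.txt directly.
    if session.get("fetched_robots_txt", False):
        return True
    # Crawlers often skip images and fire requests at sub-second intervals.
    if session.get("image_ratio", 1.0) < 0.1 and session.get("avg_interval_s", 10.0) < 0.5:
        return True
    return False

verdict = looks_like_crawler({"fetched_robots_txt": True})
```

A real detector would fit these splits and thresholds from labeled sessions rather than hard-coding them.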
Learnable topic-specific web crawler
Journal of Network and Computer Applications, 2005
A topic-specific web crawler collects relevant web pages on topics of interest from the Internet. There is much previous research focusing on algorithms for web page crawling. The main purpose of those algorithms is to gather as many relevant web pages as possible, and most of them detail only the approach to the first crawl.
A. Rungsawang, N. Angkawattanawit
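A topic-specific crawler needs some relevance estimate to decide which candidate pages to fetch. A deliberately crude sketch — term-overlap scoring is an illustrative assumption, not this paper's learnable method, which would refit the scorer from pages already judged relevant:

```python
def relevance(text: str, topic_terms: set) -> float:
    """Crude relevance estimate: the fraction of topic terms that
    appear in the page text. A learnable crawler would replace this
    with a scorer retrained on pages judged relevant so far."""
    words = set(text.lower().split())
    if not topic_terms:
        return 0.0
    return len(topic_terms & words) / len(topic_terms)

score = relevance("web crawler indexing pages", {"crawler", "indexing"})
```

Scores like this would then feed the frontier ordering, so on-topic links are expanded before off-topic ones.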

