Results 51 to 60 of about 76,818
A Focused Crawler Combinatory Link and Content Model Based on T-Graph Principles [PDF]
The two significant tasks of a focused Web crawler are finding relevant topic-specific documents on the Web and analytically prioritizing them for later effective and reliable download. For the first task, we propose a sophisticated custom algorithm to fetch and analyze the most effective HTML structural elements of the page as well as the topical ...
arxiv +1 more source
Stasis domains and slip surfaces in the locomotion of a bio-inspired two-segment crawler [PDF]
We formulate and solve the locomotion problem for a bio-inspired crawler consisting of two active elastic segments (i.e., capable of changing their rest lengths), resting on three supports providing directional frictional interactions. The problem consists in finding the motion produced by a given, slow actuation history. By focusing on the tensions in
arxiv +1 more source
Profile-Based Focused Crawling for Social Media-Sharing Websites
We present a novel profile-based focused crawling system for dealing with the increasingly popular social media-sharing websites. In this system, we treat the user profiles as ranking criteria for guiding the crawling process.
Zhiyong Zhang, Olfa Nasraoui
doaj +2 more sources
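The snippet above describes treating user profiles as ranking criteria for guiding the crawl. The paper's actual ranking function is not given here; as a minimal sketch, one common assumption is a cosine similarity between a profile term vector and each candidate page's term vector:

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def profile_rank(profile_terms, candidates):
    """Order (url, terms) candidates by similarity to the user profile.

    Hypothetical helper: the names and the (url, terms) shape are
    illustrative assumptions, not the system described in the paper.
    """
    profile = Counter(profile_terms)
    return sorted(candidates, key=lambda c: cosine(profile, Counter(c[1])), reverse=True)
```

For example, with a profile of ["music", "video", "share"], a page tagged ["music", "video"] would rank ahead of one tagged ["news"].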
The World Wide Web is the largest information repository available today. However, this information is very volatile and Web archiving is essential to preserve it for the future. Existing approaches to Web archiving are based on simple definitions of the
Vassilis Plachouras+7 more
doaj +1 more source
Keyword query based focused Web crawler
Abstract Finding information on the Web is a difficult and challenging task because of the extremely large volume of data. Search engines can be used to facilitate this task, but it is still difficult to cover all the webpages present on the Web. This paper proposes a query based crawler where a set of keywords relevant to the topic of interest of the user is
Manish Kumar+3 more
openalex +3 more sources
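A keyword-query crawler of the kind sketched above needs some way to prioritize frontier URLs against the user's keyword set. The snippet does not give the paper's scoring function; the following is a hedged illustration using a simple keyword-overlap score over anchor text (all names here are hypothetical):

```python
import re

def keyword_score(text, keywords):
    """Fraction of query keywords appearing in the text (case-insensitive).

    An assumed relevance measure, not the one from the paper.
    """
    words = set(re.findall(r"\w+", text.lower()))
    hits = sum(1 for kw in keywords if kw.lower() in words)
    return hits / len(keywords) if keywords else 0.0

def prioritize(frontier, keywords):
    """Order candidate (url, anchor_text) pairs by descending keyword score."""
    return sorted(frontier, key=lambda item: keyword_score(item[1], keywords), reverse=True)
```

With keywords ["crawler", "web"], a link whose anchor text reads "focused web crawler tutorial" would be fetched before one reading "cooking recipes".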
WebCollectives: A light regular expression based web content extractor in Java
Conventional web crawling methods typically involve a sequence of distinct steps for downloading and extracting web content. A noteworthy limitation of these conventional crawling approaches is their lack of a focus-based crawling strategy.
Hayri Volkan Agun
doaj
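WebCollectives itself is implemented in Java and its API is not shown in the snippet; as a rough sketch of what regular-expression-based content extraction looks like, the following Python fragment (an assumption, not the library's code) pulls the title and visible text out of raw HTML:

```python
import re

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")

def extract_title(html):
    """Return the <title> contents, or None if absent."""
    m = TITLE_RE.search(html)
    return m.group(1).strip() if m else None

def extract_text(html):
    """Strip script/style bodies, then all remaining tags, and normalize whitespace."""
    html = re.sub(r"<(script|style)[^>]*>.*?</\1>", " ", html, flags=re.IGNORECASE | re.DOTALL)
    text = TAG_RE.sub(" ", html)
    return re.sub(r"\s+", " ", text).strip()
```

Regex extraction like this is lighter than a full DOM parse, which is presumably the "light" in the title, at the cost of robustness on malformed markup.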
Look back, look around: a systematic analysis of effective predictors for new outlinks in focused Web crawling [PDF]
Small and medium enterprises rely on detailed Web analytics to be informed about their market and competition. Focused crawlers meet this demand by crawling and indexing specific parts of the Web. Critically, a focused crawler must quickly find new pages that have not yet been indexed. Since a new page can be discovered only by following a new outlink,
arxiv
The iCrawl Wizard -- Supporting Interactive Focused Crawl Specification [PDF]
Collections of Web documents about specific topics are needed for many areas of current research. Focused crawling enables the creation of such collections on demand. Current focused crawlers require the user to manually specify starting points for the crawl (seed URLs). These are also used to describe the expected topic of the collection.
arxiv +1 more source
Using recurrent neural networks and web crawlers to scrape open data from the Internet. [PDF]
Web crawling techniques in conjunction with Recurrent Neural Networks (RNNs) have been applied to several areas in the field of data mining on the Internet, but how they would best be applied to searching for open datasets has not yet been studied.
Kanneganti, D.
doaj
Emotional Effect of Cherry Blossoms in Wuhan during the COVID-19 Epidemic
The COVID-19 epidemic interrupted the lives of Wuhan residents. This study attempted to understand the psychological loss and emotional changes of urban residents in this unusual period by exploring the relationship between the city, its residents, and the landscape.
Xing Tenghui, Wang Xiaofeng
doaj +1 more source