Crawlets: Agents for High Performance Web Search Engines

Among the reasons for the unsatisfactory performance of today's search engines are their centralized approach to web crawling and the lack of explicit support from web servers. We propose a modification to conventional crawling in which a search engine uploads simple agents, called crawlets, to web sites. A crawlet crawls the pages of a site locally and sends a compact summary back to the search engine. This not only reduces bandwidth requirements and network latencies but also parallelizes crawling. Crawlets also provide an effective means of achieving the performance gains of personalized web servers, and they can make up for the lack of cooperation from conventional web servers. The specialized nature of crawlets allows simple solutions to security and resource-control problems and reduces software requirements at participating web sites. In fact, we propose an implementation that requires no changes to web servers, only the installation of a few (active) web pages at host sites.
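The cycle described above (crawl a site locally, then send one compact summary back) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several pieces are assumptions made for illustration: the in-memory `site` dictionary standing in for a host's pages, the hypothetical functions `crawl_locally`, `pack_summary`, and `unpack_summary`, the per-page summary fields (a truncated SHA-1 digest plus the page's same-site links), and the JSON-plus-zlib encoding.

```python
import hashlib
import json
import re
import zlib
from collections import deque

# Assumption: a same-site link is an absolute path in an href attribute.
# Any query string or fragment is dropped, and external URLs are ignored.
LINK_RE = re.compile(r'href="(/[^"#?]*)[^"]*"')


def crawl_locally(site: dict[str, str], start: str = "/") -> dict[str, dict]:
    """Breadth-first crawl confined to one site (hypothetical crawlet core).

    `site` maps a URL path to the page's HTML. It stands in for the
    host's document tree, which a real crawlet would read locally.
    """
    summary: dict[str, dict] = {}
    queue = deque([start])
    seen = {start}
    while queue:
        path = queue.popleft()
        html = site.get(path)
        if html is None:  # dangling link: nothing to summarize
            continue
        links = sorted(set(LINK_RE.findall(html)))
        summary[path] = {
            # Short content fingerprint sent instead of the full page text.
            "digest": hashlib.sha1(html.encode("utf-8")).hexdigest()[:16],
            "links": links,
        }
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return summary


def pack_summary(summary: dict[str, dict]) -> bytes:
    """Crawlet side: serialize and compress the summary for one transfer."""
    return zlib.compress(json.dumps(summary, sort_keys=True).encode("utf-8"))


def unpack_summary(blob: bytes) -> dict[str, dict]:
    """Search-engine side: restore the summary sent by the crawlet."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


if __name__ == "__main__":
    demo_site = {
        "/": '<a href="/a.html">A</a> <a href="/b.html?x=1">B</a> '
             '<a href="http://other.example/">elsewhere</a>',
        "/a.html": '<a href="/">home</a>',
        "/b.html": '<a href="/missing.html">broken</a>',
    }
    summary = crawl_locally(demo_site)
    blob = pack_summary(summary)
    assert unpack_summary(blob) == summary
    print(sorted(summary))  # ['/', '/a.html', '/b.html']
```

In this sketch the search engine receives a single compressed blob instead of issuing one request per page. What the engine does with the digests (for instance, comparing them against an earlier visit to decide which pages to re-fetch) is left open here and is not specified by the abstract.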