Adaptive Retrieval Agents: Internalizing Local Context and Scaling up to the Web

This paper discusses a novel distributed adaptive algorithm and representation used to construct populations of adaptive Web agents. These InfoSpiders browse networked information environments on-line in search of pages relevant to the user, traversing hyperlinks autonomously and intelligently. Each agent adapts to the spatial and temporal regularities of its local context through a combination of machine learning techniques inspired by ecological models: evolutionary adaptation with local selection, reinforcement learning and selective query expansion by internalization of environmental signals, and optional relevance feedback. We evaluate the feasibility and performance of these methods in three domains: a general class of artificial graph environments, a controlled subset of the Web, and (preliminarily) the full Web. Our results suggest that InfoSpiders could start from the pages that search engines return on the basis of global word statistics, and then use linkage topology to guide their search on-line. We show how this approach can complement the current state of the art, especially with respect to the scalability challenge.
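The idea of evolutionary adaptation with local selection can be illustrated with a minimal sketch. It assumes an energy-based scheme in which each agent pays a fixed cost per link followed, gains energy from newly visited relevant pages, reproduces when its energy exceeds a threshold, and dies when its energy runs out. The function name, parameters, toy graph, and uniform random link choice are all hypothetical simplifications for illustration; the abstract does not specify these details, and the agents it describes learn which links to follow rather than choosing at random.

```python
import random


def local_selection_search(graph, relevance, start, n_agents=4, steps=50,
                           cost=0.1, threshold=1.0, seed=0):
    """Toy population of link-following agents with local selection.

    graph:     dict mapping a page id to the list of page ids it links to
    relevance: dict mapping a page id to a relevance score in [0, 1]
    All names and default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    agents = [{"node": start, "energy": threshold / 2} for _ in range(n_agents)]
    visited = set()

    for _ in range(steps):
        survivors = []
        for agent in agents:
            links = graph.get(agent["node"], [])
            if not links:
                continue  # dead end: the agent is dropped

            # Assumption: uniform random link choice stands in for a
            # learned link-scoring policy.
            agent["node"] = rng.choice(links)

            # Energy is gained only the first time a page is visited,
            # so agents compete for a finite local resource.
            first_visit = agent["node"] not in visited
            gain = relevance.get(agent["node"], 0.0) if first_visit else 0.0
            visited.add(agent["node"])
            agent["energy"] += gain - cost

            if agent["energy"] <= 0:
                continue  # the agent dies

            # Local selection: the decision to reproduce compares the
            # agent's own energy with a fixed threshold, not with the
            # energy of other agents.
            if agent["energy"] > threshold:
                half = agent["energy"] / 2
                agent["energy"] = half
                survivors.append({"node": agent["node"], "energy": half})
            survivors.append(agent)

        agents = survivors
        if not agents:
            break

    return visited, agents


if __name__ == "__main__":
    toy_graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
    toy_relevance = {"b": 1.0, "c": 0.5}
    pages, population = local_selection_search(toy_graph, toy_relevance, "a")
    print(sorted(pages), len(population))
```

Because selection depends only on each agent's own energy, no global ranking of the population is needed, which is what makes this kind of scheme straightforward to distribute.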
