Multiple-Goal Heuristic Search

This paper presents a new framework for anytime heuristic search in which the task is to achieve as many goals as possible within the allocated resources. We show that traditional distance-estimation heuristics are inadequate for tasks of this type and present alternative heuristics that are better suited to multiple-goal search. In particular, we introduce the marginal-utility heuristic, which estimates the cost and the benefit of exploring the subtree below a search node. We develop two methods for online learning of the marginal-utility heuristic. One is based on local similarity of the partial marginal utility of sibling nodes, and the other generalizes marginal utility over the state feature space. We apply our adaptive and non-adaptive multiple-goal search algorithms to several problems, including focused crawling, and show that they outperform existing methods.
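To make the setting concrete, the following is a minimal sketch of a budget-limited best-first search that keeps collecting goals instead of stopping at the first one. It is an illustration under stated assumptions, not the algorithm from the paper: the function name `multiple_goal_search`, the toy tree, and the `utility` function are all hypothetical. Here `utility` is computed exactly as goals per node in each subtree, standing in for the benefit-to-cost estimate that the marginal-utility heuristic would have to estimate or learn online.

```python
import heapq
from itertools import count


def multiple_goal_search(start, successors, is_goal, utility, budget):
    """Best-first search that collects goals until `budget` expansions are used.

    `utility(node)` is an assumed estimate of goals gained per unit of
    search effort in the subtree below `node` (higher is better).
    """
    tie = count()  # tie-breaker so the heap never has to compare nodes
    frontier = [(-utility(start), next(tie), start)]
    seen = {start}
    goals = []
    expansions = 0
    while frontier and expansions < budget:
        _, _, node = heapq.heappop(frontier)
        expansions += 1
        if is_goal(node):
            goals.append(node)  # keep searching: one goal is not enough
        for child in successors(node):
            if child not in seen:
                seen.add(child)
                heapq.heappush(frontier, (-utility(child), next(tie), child))
    return goals


# Hypothetical toy problem: a small tree with three goal leaves.
TREE = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1", "b2"],
    "a1": [], "a2": [], "b1": [], "b2": [],
}
GOALS = {"a2", "b1", "b2"}


def subtree_stats(node):
    """Return (number of goals, number of nodes) in the subtree rooted at node."""
    goal_count = 1 if node in GOALS else 0
    size = 1
    for child in TREE[node]:
        g, n = subtree_stats(child)
        goal_count += g
        size += n
    return goal_count, size


def utility(node):
    """Exact goals-per-node ratio; a stand-in for a learned estimate."""
    goal_count, size = subtree_stats(node)
    return goal_count / size


found = multiple_goal_search(
    "root",
    successors=lambda n: TREE[n],
    is_goal=lambda n: n in GOALS,
    utility=utility,
    budget=4,
)
print(found)
```

With a budget of four expansions, the search spends its effort on the subtree under `b`, which holds two goals among three nodes, before touching the less productive subtree under `a`. A distance-to-nearest-goal heuristic would not make this distinction, since both subtrees contain a goal at the same depth.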
