Efficient Goal-Directed Exploration

If a state space is not known completely in advance, a search algorithm has to explore it sufficiently to locate a goal state and a path leading to it, performing what we call goal-directed exploration. Two paradigms of this process are pure exploration and heuristic-driven exploitation: the former explores the state space using only knowledge of the portion of the domain it has physically visited, whereas the latter relies entirely on heuristic knowledge to guide the search toward goal states. Both approaches have disadvantages: the first does not use the available knowledge to reduce the search effort, and the second depends too heavily on that knowledge, even when it is misleading. We have therefore developed a framework for goal-directed exploration, called VECA, that combines the advantages of both approaches by automatically switching from exploitation to exploration in those parts of the state space where exploitation does not perform well. VECA provides better performance guarantees than previously studied heuristic-driven exploitation algorithms, and experimental evidence suggests that this guarantee does not come at the expense of average-case performance.
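
The switching idea can be illustrated with a small sketch. This is not the paper's VECA framework and carries none of its performance guarantees; it is a minimal assumed setup in which an agent follows a heuristic greedily (exploitation) and falls back to backtracking-style exploration once the edges around its current state have been traversed too often. The graph encoding, the function name goal_directed_exploration, and the edge_limit threshold are all illustrative assumptions.

    from collections import defaultdict

    def goal_directed_exploration(graph, start, goal, h, edge_limit=2):
        """Walk from `start` toward `goal` on an undirected graph given as
        {node: [neighbors]}. Follow the heuristic h greedily (exploitation);
        once every edge out of the current node has been traversed
        `edge_limit` times, retreat along the travelled route (exploration
        by chronological backtracking) to find untried edges elsewhere."""
        traversals = defaultdict(int)  # undirected edge -> forward traversal count
        stack = [start]                # travelled sequence used for retreating
        route = [start]                # every state the agent physically visits
        while stack:
            current = stack[-1]
            if current == goal:
                return route
            # Exploitation step: heuristically best neighbor over an edge
            # that has not yet exhausted its traversal budget.
            candidates = [n for n in graph[current]
                          if traversals[frozenset((current, n))] < edge_limit]
            if candidates:
                nxt = min(candidates, key=h)
                traversals[frozenset((current, nxt))] += 1
                stack.append(nxt)
                route.append(nxt)
            else:
                # Exploration fallback: exploitation has overused every local
                # edge, so retrace one step of the route (retracing is not
                # counted against the budget in this simplified sketch).
                stack.pop()
                if stack:
                    route.append(stack[-1])
        return None  # gave up: every edge reachable this way was overused

    if __name__ == "__main__":
        # Tiny example: the heuristic misleads the agent toward the dead end d.
        graph = {"s": ["d", "a"], "d": ["s"], "a": ["s", "g"], "g": ["a"]}
        h = {"s": 3, "d": 0, "a": 2, "g": 0}.get  # lower means "looks closer"
        print(goal_directed_exploration(graph, "s", "g", h))  # ['s','d','s','a','g']

The per-edge traversal counter is the point of the sketch: it localizes the switch, so exploration is triggered only in the region where the heuristic has demonstrably stopped making progress, while the rest of the search stays heuristic-driven.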
