Improving the efficiency of online POMDPs by using belief similarity measures

In this paper, we introduce FSBS (Forward Search in Belief Space), an approach for online planning in POMDPs. The approach is based on the RTBSS (Real-Time Belief Space Search) algorithm of [1]. Its main departure from that algorithm is the introduction of similarity measures in the belief space. Statistical divergence measures are used to compute the similarity between belief points in the forward search tree, which makes it possible to determine whether a given belief point (or a very similar one) has already been visited. The complexity of the search is then reduced by not expanding nodes that are similar to nodes already visited at the same depth. This reduction makes it feasible to solve more complex problems in real time on robots. The paper describes the algorithm and analyzes different divergence measures. Benchmark problems show that the approach can achieve a ten-fold reduction in computation time, with similar obtained rewards, when compared to the original RTBSS. The paper also presents experiments with a quadrotor in a search application.
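The idea of reusing similar belief points at the same depth can be illustrated with a minimal sketch. It is not the FSBS implementation: the abstract does not state which divergence or threshold is used, so the Jensen-Shannon divergence, the `threshold` parameter, and the names `js_divergence` and `DepthCache` are assumptions chosen for illustration only.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p_i == 0 contribute zero by convention.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete beliefs, in bits.

    The mixture m is positive wherever p or q is positive, so the
    KL terms below never divide by zero.
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


class DepthCache:
    """Hypothetical cache of evaluated beliefs, kept separately per depth."""

    def __init__(self, threshold):
        # Assumed tuning parameter: beliefs closer than this are "similar".
        self.threshold = threshold
        self.by_depth = {}

    def lookup(self, depth, belief):
        """Return the stored value of a similar belief at this depth, or None."""
        for stored_belief, value in self.by_depth.get(depth, []):
            if js_divergence(belief, stored_belief) <= self.threshold:
                return value
        return None

    def store(self, depth, belief, value):
        """Record the value computed for a belief expanded at this depth."""
        self.by_depth.setdefault(depth, []).append((list(belief), value))


if __name__ == "__main__":
    cache = DepthCache(threshold=0.01)
    cache.store(2, [0.5, 0.5], 3.0)
    # A nearby belief at the same depth reuses the stored value.
    print(cache.lookup(2, [0.51, 0.49]))
    # The same belief at another depth is not considered visited.
    print(cache.lookup(3, [0.51, 0.49]))
```

In a forward search, `lookup` would be called before expanding a node and `store` after its value is computed, so a hit skips the expansion of that subtree. The linear scan over stored beliefs is the simplest choice and keeps the sketch short; the threshold trades search effort against the accuracy of the reused values.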

[1] Dominik Endres, et al. A new metric for probability distributions, 2003, IEEE Transactions on Information Theory.

[2] Trey Smith, et al. Probabilistic planning for robotic exploration, 2007.

[3] Leslie Pack Kaelbling, et al. Planning and Acting in Partially Observable Stochastic Domains, 1998, Artif. Intell.

[4] Stéphane Ross, et al. Hybrid POMDP Algorithms, 2006.

[5] David Hsu, et al. SARSOP: Efficient Point-Based POMDP Planning by Approximating Optimally Reachable Belief Spaces, 2008, Robotics: Science and Systems.

[6] Shun-ichi Amari, et al. Methods of information geometry, 2000.

[7] Joelle Pineau, et al. Point-based value iteration: An anytime algorithm for POMDPs, 2003, IJCAI.

[8] Nicholas Roy, et al. Efficient planning under uncertainty for a target-tracking micro-aerial vehicle, 2010, 2010 IEEE International Conference on Robotics and Automation.

[9] Yishay Mansour, et al. Multiple Source Adaptation and the Rényi Divergence, 2009, UAI.

[10] Joelle Pineau, et al. Online Planning Algorithms for POMDPs, 2008, J. Artif. Intell. Res.

[11] Chulhee Lee, et al. Feature extraction based on the Bhattacharyya distance, 2003, Pattern Recognit.

[12] David Hsu, et al. POMDPs for robotic tasks with mixed observability, 2009, Robotics: Science and Systems.

[13] Jianhua Lin, et al. Divergence measures based on the Shannon entropy, 1991, IEEE Trans. Inf. Theory.

[14] Blai Bonet, et al. Solving POMDPs: RTDP-Bel vs. Point-based Algorithms, 2009, IJCAI.

[15] Morgan Quigley, et al. ROS: an open-source Robot Operating System, 2009, ICRA 2009.

[16] José Martínez-Aroza, et al. An Analysis of Edge Detection by Using the Jensen-Shannon Divergence, 2000, Journal of Mathematical Imaging and Vision.

[17] Nikos A. Vlassis, et al. Perseus: Randomized Point-based Value Iteration for POMDPs, 2005, J. Artif. Intell. Res.

[18] Aníbal Ollero, et al. Decentralized multi-robot cooperation with auctioned POMDPs, 2012, 2012 IEEE International Conference on Robotics and Automation.

[19] Ann Gordon-Ross, et al. Online algorithms for wireless sensor networks dynamic optimization, 2012, 2012 IEEE Consumer Communications and Networking Conference (CCNC).

[20] Reid G. Simmons, et al. Heuristic Search Value Iteration for POMDPs, 2004, UAI.

[21] Nicholas Roy, et al. Efficient Planning under Uncertainty with Macro-actions, 2014, J. Artif. Intell. Res.

[22] Minoo Aminian, et al. Active learning for reducing bias and variance of a classifier using Jensen-Shannon divergence, 2005, Fourth International Conference on Machine Learning and Applications (ICMLA'05).