Decision-theoretic planning under uncertainty with information rewards for active cooperative perception

Partially observable Markov decision processes (POMDPs) provide a principled framework for modeling an agent’s decision making when it must act on noisy state estimates. POMDP policies take into account an action’s influence on the environment as well as its potential information gain. This is a crucial feature for robotic agents, which generally have to consider the effect of their actions on sensing. However, building POMDP models that directly reward information gain is not straightforward, yet it is important in domains such as robot-assisted surveillance, where the value of information is hard to quantify. Common techniques for uncertainty reduction, such as expected entropy minimization, lead to non-standard POMDPs that are hard to solve. We present the POMDP with Information Rewards (POMDP-IR) modeling framework, which rewards an agent for reaching a certain level of belief regarding a state feature. By remaining in the standard POMDP setting, we can exploit many known results as well as successful approximate algorithms. We demonstrate our ideas on a toy problem and on a real robot-assisted surveillance task, showcasing their use in active cooperative perception scenarios. Finally, our experiments show that the POMDP-IR framework compares favorably with a related approach on benchmark domains.
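
To make the belief-threshold reward concrete, the minimal Python sketch below shows one way such an information reward can stay within the standard POMDP setting: the reward for a prediction-style "commit" action is defined over the hidden state (positive if the committed feature value is true, negative otherwise), so its expectation is linear in the belief and only becomes positive once the belief exceeds a threshold. The function names and reward values are illustrative assumptions, not the paper's actual parameters.

```python
# Minimal sketch (not the authors' implementation): how a state-based
# prediction reward induces a belief threshold. All names and numbers
# below are illustrative assumptions.

def expected_prediction_reward(belief_in_feature,
                               r_correct=1.0,
                               r_incorrect=-2.0):
    """Expected reward of a 'commit' prediction action.

    The reward is defined over the hidden state, as in a standard POMDP:
    r_correct if the committed feature value is true, r_incorrect otherwise.
    Its expectation under the belief is therefore linear in the belief,
    unlike entropy-based rewards.
    """
    b = belief_in_feature
    return b * r_correct + (1.0 - b) * r_incorrect


def best_information_action(belief_in_feature,
                            r_correct=1.0,
                            r_incorrect=-2.0):
    """Assume the agent also has a 'null' prediction action with zero
    reward, so committing only pays off once the belief is high enough."""
    commit = expected_prediction_reward(belief_in_feature,
                                        r_correct, r_incorrect)
    return ("commit", commit) if commit > 0.0 else ("null", 0.0)


# The zero-crossing gives the belief level the agent is rewarded for
# reaching: b* = -r_incorrect / (r_correct - r_incorrect); with the
# values above, b* = 2/3.
if __name__ == "__main__":
    for b in (0.5, 0.7, 0.9):
        print(b, best_information_action(b))
```

Because the expected reward remains (piecewise) linear in the belief, standard point-based POMDP solvers can be applied unchanged, in contrast to entropy-based objectives, which are nonlinear in the belief.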
