Deep Value of Information Estimators for Collaborative Human-Machine Information Gathering

Effective human-machine collaboration can significantly improve many learning and planning strategies for information gathering via fusion of 'hard' and 'soft' data originating from machine and human sensors, respectively. However, gathering the most informative data from human sensors without overloading them remains a critical technical challenge. In this context, Value of Information (VOI) is a crucial decision-theoretic metric for scheduling interaction with human sensors. We present a new deep learning-based VOI estimation framework that can be used to schedule collaborative human-machine sensing with efficient online inference and minimal policy hand-tuning. Supervised learning is used to train deep convolutional neural networks (CNNs) to extract hierarchical features from 'images' of belief spaces obtained via data fusion. These features can be associated with soft data query choices to reliably compute the VOI of human interaction. The CNN framework is described in detail, and a performance comparison to a feature-based POMDP scheduling policy is provided. The practical feasibility of our method is also demonstrated on a mobile robotic search problem with language-based semantic human sensor inputs.
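To make the decision-theoretic quantity concrete, the following is a minimal sketch of the VOI of a single soft-data query, computed as expected entropy reduction over a discretized belief. This is illustrative only: it assumes a hypothetical binary "Is the target in region R?" query with a perfectly reliable human answer, not the paper's CNN estimator (which learns to approximate such quantities from belief-space images).

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def voi_binary_query(belief, region):
    """Expected entropy reduction from asking 'Is the target in region?'

    belief: discrete belief over grid cells (sums to 1)
    region: set of cell indices covered by the query
    Assumes the human answers truthfully (idealized sensor model).
    """
    n = len(belief)
    p_yes = sum(belief[i] for i in region)
    p_no = 1.0 - p_yes
    h_prior = entropy(belief)
    # Posterior if answer is "yes": mass renormalized onto the region.
    post_yes = ([belief[i] / p_yes if i in region else 0.0 for i in range(n)]
                if p_yes > 0 else belief)
    # Posterior if answer is "no": mass renormalized off the region.
    post_no = ([0.0 if i in region else belief[i] / p_no for i in range(n)]
               if p_no > 0 else belief)
    h_post = p_yes * entropy(post_yes) + p_no * entropy(post_no)
    return h_prior - h_post

# Toy 4-cell belief; query covers cells {0, 1}.
belief = [0.4, 0.3, 0.2, 0.1]
print(round(voi_binary_query(belief, {0, 1}), 4))
```

A scheduler would compare this VOI against the cost of interrupting the human and only issue the query when the expected information gain justifies it; the paper's contribution is replacing this explicit belief-space computation with fast CNN inference.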
