Hierarchical Deep Reinforcement Learning for Robotics and Data Science

Author(s): Krishnan, Sanjay | Advisor(s): Goldberg, Kenneth | Abstract: This dissertation explores learning important structural features of a Markov Decision Process from offline data to significantly improve the sample efficiency, stability, and robustness of solutions, even with high-dimensional action spaces and long time horizons. It presents applications to surgical robot control, data cleaning, and generating efficient execution plans for relational queries. The dissertation contributes: (1) Sequential Windowed Reinforcement Learning, a framework that approximates a long-horizon MDP with a sequence of shorter-term MDPs with smooth quadratic cost functions learned from a small number of expert demonstrations; (2) Deep Discovery of Options, an algorithm that discovers hierarchical structure in the action space from observed demonstrations; (3) AlphaClean, a system that decomposes a data cleaning task into a set of independent search problems and uses deep Q-learning to share structure across the problems; and (4) Learning Query Optimizer, a system that observes executions of a dynamic program for SQL query optimization and learns a model to predict cost-to-go values, greatly speeding up future search problems.
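
Of the four contributions, the first is concrete enough to sketch. The idea is to replace one long-horizon problem with a sequence of short windows, each scored by a smooth quadratic cost fit to expert demonstrations. The Python below is a minimal illustration under stated assumptions: the uniform window split, the endpoint subgoal, and the regularized inverse-covariance cost matrix are all stand-ins chosen for brevity, not the dissertation's actual Sequential Windowed Reinforcement Learning procedure, which learns the segment boundaries and rewards from the demonstrations themselves.

```python
import numpy as np

def segment_demonstration(states, num_segments):
    """Split a demonstrated state trajectory into consecutive windows.

    Hypothetical stand-in for learned segmentation: the real algorithm
    infers transition points from the data; here we slice uniformly.
    """
    return np.array_split(states, num_segments)

def fit_quadratic_cost(segment):
    """Fit a smooth quadratic cost around one window of demonstrated states.

    Returns (mu, Q) defining cost(s) = (s - mu)^T Q (s - mu), where mu is
    the window's final state (treated as a subgoal) and Q is a regularized
    inverse covariance, so cost is low near the demonstration and grows
    smoothly away from it.
    """
    mu = segment[-1]
    centered = segment - segment.mean(axis=0)
    cov = centered.T @ centered / max(len(segment) - 1, 1)
    Q = np.linalg.inv(cov + 1e-3 * np.eye(cov.shape[0]))  # ridge for stability
    return mu, Q

def windowed_costs(states, num_segments=4):
    """Approximate one long-horizon task as a sequence of short MDPs,
    each scored by its own quadratic cost (mu_i, Q_i)."""
    return [fit_quadratic_cost(seg)
            for seg in segment_demonstration(states, num_segments)]

# Toy usage: a random-walk "demonstration" in a 2-D state space.
demo = np.cumsum(0.1 * np.random.randn(100, 2), axis=0)
for i, (mu, Q) in enumerate(windowed_costs(demo)):
    c = (demo[0] - mu) @ Q @ (demo[0] - mu)
    print(f"segment {i}: cost of the start state = {c:.2f}")
```

Contribution (4) can be pictured the same way: run the standard dynamic program for join ordering once, log the optimal cost of every intermediate subset of relations, and fit a model that predicts those costs so future queries can be planned greedily without the exponential search. Everything named below (dp_join_order, the one-hot featurize, the least-squares value model, the toy cardinality cost) is a hypothetical simplification for illustration, not the system described in the dissertation.

```python
import itertools
import numpy as np

def dp_join_order(rels, join_cost):
    """Selinger-style dynamic program over subsets of relations.

    Returns the optimal cost per subset and logs (subset, cost) pairs
    that later serve as training data for the value model."""
    best = {frozenset([r]): 0.0 for r in rels}
    examples = []
    for size in range(2, len(rels) + 1):
        for subset in map(frozenset, itertools.combinations(rels, size)):
            best[subset] = min(best[subset - {r}] + join_cost(subset - {r}, r)
                               for r in subset)
            examples.append((subset, best[subset]))
    return best, examples

def featurize(subset, rels):
    """One-hot membership features for an intermediate result."""
    return np.array([float(r in subset) for r in rels])

def train_value_model(examples, rels):
    """Least-squares fit predicting a subset's cost from its features."""
    X = np.stack([featurize(s, rels) for s, _ in examples])
    y = np.array([c for _, c in examples])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def greedy_plan(rels, join_cost, w):
    """Plan a new query greedily, ranking candidate joins by the learned
    cost model instead of re-running the exponential dynamic program."""
    state, order = frozenset([rels[0]]), [rels[0]]
    while len(state) < len(rels):
        nxt = min((r for r in rels if r not in state),
                  key=lambda r: featurize(state | {r}, rels) @ w)
        state, order = state | {nxt}, order + [nxt]
    return order

# Toy usage with a made-up cardinality-based cost model.
rels = ["A", "B", "C", "D"]
card = {"A": 10, "B": 100, "C": 1000, "D": 50}
join_cost = lambda left, right: card[right] * sum(card[r] for r in left) ** 0.5
_, examples = dp_join_order(rels, join_cost)
w = train_value_model(examples, rels)
print(greedy_plan(rels, join_cost, w))  # join order suggested by the model
```

The design point the sketch preserves is that the dynamic program's own intermediate results are free training data: every subset it solves yields one (features, cost) example for the value model.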
