Resilient Computing with Reinforcement Learning on a Dynamical System: Case Study in Sorting

This paper formulates general computation as a feedback-control problem, which allows the agent to autonomously overcome some limitations of standard procedural programming by gaining resilience to errors and to early program termination. Our formulation treats computation as trajectory generation in the program's variable space. Computing then becomes a sequential decision-making problem, solved with reinforcement learning (RL) and analyzed with Lyapunov stability theory to assess the agent's resilience and progression toward the goal. We demonstrate the approach through a case study of a quintessential computer science problem, array sorting. Evaluations show that our RL sorting agent makes steady progress toward an asymptotically stable goal, is resilient to faulty components, and performs fewer array manipulations than traditional Quicksort and Bubble Sort.
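For concreteness, the following is a minimal Python sketch of the idea, not the paper's implementation: the array is the state, adjacent swaps are the actions, and the inversion count serves as a Lyapunov-style progress measure V(s) that is zero exactly at the sorted goal. A greedy policy stands in for a learned RL policy, and the `fault_rate` parameter (a hypothetical fault model introduced here for illustration) corrupts candidate evaluations to mimic faulty components.

```python
import random

def inversions(arr):
    """Count out-of-order pairs: a Lyapunov-style candidate V(s) that
    is positive away from the goal and zero exactly when arr is sorted."""
    n = len(arr)
    return sum(arr[i] > arr[j] for i in range(n) for j in range(i + 1, n))

def rl_style_sort(arr, fault_rate=0.0, seed=0):
    """Greedy stand-in for a learned policy: at each step, apply the
    adjacent swap with the largest one-step decrease in V(s).
    `fault_rate` randomly corrupts candidate evaluations to mimic
    faulty components; the loop still converges because V is
    re-measured from the true state on every iteration."""
    rng = random.Random(seed)
    state = list(arr)
    while inversions(state) > 0:
        v_now = inversions(state)
        best_i, best_v = None, v_now
        for i in range(len(state) - 1):
            state[i], state[i + 1] = state[i + 1], state[i]  # try the swap
            v_next = inversions(state)
            state[i], state[i + 1] = state[i + 1], state[i]  # undo it
            if rng.random() < fault_rate:
                v_next = v_now  # faulty component reports no improvement
            if v_next < best_v:
                best_i, best_v = i, v_next
        if best_i is None:
            continue  # every reading was faulty this step; re-sense and retry
        state[best_i], state[best_i + 1] = state[best_i + 1], state[best_i]
    return state

print(rl_style_sort([3, 1, 4, 1, 5, 9, 2, 6], fault_rate=0.2))
```

In a full RL treatment the greedy argmin over swaps would be replaced by a trained policy; the point of the sketch is the control-theoretic structure, where the strict decrease of V(s) along the trajectory, re-checked against the true state at every step, is what makes progress robust to individual faulty evaluations.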
