Paper information - A Natural Lottery Ticket Winner: Reinforcement Learning with Ordinary Neural Circuits

We propose a neural information processing system obtained by repurposing the function of a biological neural circuit model to govern simulated and real-world control tasks. Inspired by the structure of the nervous system of the soil worm C. elegans, we introduce ordinary neural circuits (ONCs), defined as models of biological neural circuits reparameterized for the control of alternative tasks. We first demonstrate that ONCs realize networks with higher maximum flow than arbitrarily wired networks. We then learn instances of ONCs to control a series of robotic tasks, including the autonomous parking of a real-world rover robot. To reconfigure the purpose of the neural circuit, we adopt a search-based optimization algorithm. Ordinary neural circuits perform on par with, and in some cases significantly surpass, contemporary deep learning models. ONC networks are compact, 77% sparser than their counterpart neural controllers, and their neural dynamics are fully interpretable at the cell level.
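The maximum-flow comparison mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code or their circuits: the two four-node graphs, their unit capacities, and the names `max_flow`, `structured`, and `arbitrary` are hypothetical, and Edmonds-Karp is used only as one standard way to compute maximum flow. The sketch shows how two wirings with the same number of edges can carry different maximum flow from an input node to an output node.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow on a dict-of-dicts capacity graph."""
    # Build residual capacities, adding reverse edges with zero capacity.
    residual = {source: {}, sink: {}}
    for u, neighbors in capacity.items():
        residual.setdefault(u, {})
        for v, c in neighbors.items():
            residual[u][v] = residual[u].get(v, 0) + c
            residual.setdefault(v, {}).setdefault(u, 0)

    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path left

        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]

        # Push the bottleneck amount of flow along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Hypothetical toy wirings, each with four unit-capacity edges.
structured = {"s": {"a": 1, "b": 1}, "a": {"t": 1}, "b": {"t": 1}}
arbitrary = {"s": {"a": 1, "b": 1}, "a": {"b": 1}, "b": {"t": 1}}

flow_structured = max_flow(structured, "s", "t")
flow_arbitrary = max_flow(arbitrary, "s", "t")
```

In this toy case the first wiring offers two edge-disjoint paths from `s` to `t`, while the second funnels everything through the single edge into `t`, so the same edge budget yields less flow.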
