Beneficial Perturbation Network for Designing General Adaptive Artificial Intelligence Systems

The human brain is the gold standard of adaptive learning. It can not only learn and benefit from experience, but also adapt to new situations. In contrast, deep neural networks learn only one sophisticated but fixed mapping from inputs to outputs. This limits their applicability to more dynamic situations, where the input-to-output mapping may change with context. A salient example is continual learning: learning new, independent tasks sequentially without forgetting previous tasks. Continual learning of multiple tasks in artificial neural networks using gradient descent leads to catastrophic forgetting, whereby the previously learned mapping of an old task is erased when new mappings are learned for new tasks. Herein, we propose a new, biologically plausible type of deep neural network with extra, out-of-network, task-dependent biasing units to accommodate these dynamic situations. This allows, for the first time, a single network to learn potentially unlimited parallel input-to-output mappings and to switch between them on the fly at runtime. The biasing units are programmed by leveraging beneficial perturbations (the opposite of well-known adversarial perturbations) for each task. Beneficial perturbations for a given task bias the network toward that task, essentially switching the network into a different mode to process that task. This largely eliminates catastrophic interference between tasks. Our approach is memory- and parameter-efficient, can accommodate many tasks, and achieves state-of-the-art performance across different tasks and domains.
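To make the idea of task-dependent biasing units concrete, the following is a minimal PyTorch sketch of one plausible realization. The layer sizes, the choice of where the bias enters, and the training schedule are illustrative assumptions, not the authors' exact architecture or algorithm; the only point it demonstrates is that a per-task additive bias, trained toward (rather than against) the task objective while the shared weights stay fixed, can switch a single network between task-specific modes at runtime.

```python
# Minimal, illustrative sketch of out-of-network, task-dependent biasing units.
# All names, sizes, and the training loop below are assumptions for exposition.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BiasedMLP(nn.Module):
    """A small MLP whose hidden pre-activation receives an extra,
    task-dependent additive bias selected at runtime."""

    def __init__(self, in_dim=784, hidden=256, out_dim=10, num_tasks=5):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, out_dim)
        # One out-of-network bias vector per task, stored outside the
        # ordinary weight matrices and added to the hidden pre-activation.
        self.task_bias = nn.Parameter(torch.zeros(num_tasks, hidden))

    def forward(self, x, task_id):
        h = self.fc1(x) + self.task_bias[task_id]  # switch "mode" per task
        return self.fc2(F.relu(h))


def train_task(model, loader, task_id, epochs=1, lr=1e-3, freeze_shared=True):
    """Learn the bias vector for `task_id`.

    When `freeze_shared` is True, only the task's bias is updated: gradient
    descent pushes the frozen network toward correct outputs for this task,
    i.e. the opposite sign of an adversarial perturbation, which would push
    it toward incorrect outputs.
    """
    params = [model.task_bias] if freeze_shared else list(model.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            model.zero_grad()
            logits = model(x.view(x.size(0), -1), task_id)
            loss = F.cross_entropy(logits, y)
            loss.backward()
            opt.step()


# At test time, switching tasks is just a matter of selecting the bias set:
# logits_task0 = model(x, task_id=0)
# logits_task3 = model(x, task_id=3)
```

Because each task adds only one bias vector per layer rather than a full copy of the weights, this kind of scheme stays memory- and parameter-efficient as the number of tasks grows, which is the property the abstract highlights.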
