Toddler-Guidance Learning: Impacts of Critical Period on Multimodal AI Agents

Critical periods are phases during which a toddler’s brain develops in spurts. Proper guidance during this stage is critical for promoting children’s cognitive development. However, it is not clear whether such a critical period also exists for the training of AI agents. As with human toddlers, well-timed guidance and multimodal interactions might significantly enhance the training efficiency of AI agents. To test this hypothesis, we adapt the notion of critical periods to learning in AI agents and investigate the critical period in a virtual environment. We formalize the critical period and toddler-guidance learning in the reinforcement learning (RL) framework. We then build a toddler-like environment with the VECA toolkit to mimic human toddlers’ learning characteristics. We study three discrete levels of mutual interaction: weak mentor guidance (sparse reward), moderate mentor guidance (helper reward), and mentor demonstration (behavioral cloning). We also introduce the EAVE dataset, consisting of 30,000 real-world images, to fully reflect the toddler’s viewpoint. We evaluate the impact of critical periods on AI agents from two perspectives: how and when they are best guided in both uni- and multimodal learning. Our experimental results show that both uni- and multimodal agents with moderate mentor guidance and a critical period at 1 million and 2 million training steps show a noticeable improvement. We validate these results with transfer learning on the EAVE dataset and find a performance improvement with the same critical period and guidance.
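As a rough illustration of moderate mentor guidance applied only during a critical period, the sketch below adds a helper reward to the sparse task reward while the training step falls inside a window. It is a minimal sketch under stated assumptions, not the formalization used in the paper: the names `CriticalPeriod` and `guided_reward`, the additive combination of rewards, and the example window boundaries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CriticalPeriod:
    """Hypothetical window of training steps during which guidance is active."""
    start_step: int
    end_step: int  # exclusive upper bound

    def contains(self, step: int) -> bool:
        return self.start_step <= step < self.end_step


def guided_reward(step: int, sparse_reward: float, helper_reward: float,
                  period: CriticalPeriod) -> float:
    """Return the reward the agent sees at a given training step.

    Assumption: inside the critical period the mentor's helper reward is
    simply added to the sparse task reward; outside it, only the sparse
    reward is given (weak mentor guidance).
    """
    if period.contains(step):
        return sparse_reward + helper_reward
    return sparse_reward


# Illustrative window only; the boundaries are an assumption for this example.
period = CriticalPeriod(start_step=1_000_000, end_step=2_000_000)
print(guided_reward(500_000, 0.0, 0.3, period))    # before the window
print(guided_reward(1_500_000, 0.0, 0.3, period))  # inside the window
```

Keeping the window separate from the reward function makes it easy to sweep over when guidance is given (the "when" question) independently of which guidance signal is used (the "how" question).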
