Distinguishing rule- and exemplar-based generalization in learning systems

Despite the increasing scale of datasets in machine learning, generalization to unseen regions of the data distribution remains crucial. Such extrapolation is by definition underdetermined and is dictated by a learner’s inductive biases. Machine learning systems often do not share the same inductive biases as humans and, as a result, extrapolate in ways that are inconsistent with our expectations. We investigate two distinct inductive biases of this kind: feature-level bias (differences in which features are more readily learned) and exemplar-vs.-rule bias (differences in how these learned features are used for generalization). Exemplar- vs. rule-based generalization has been studied extensively in cognitive psychology, and, in this work, we present a protocol inspired by these experimental approaches for directly probing this trade-off in learning systems. The measures we propose characterize changes in extrapolation behavior when feature coverage is manipulated in a combinatorial setting. We present empirical results across a range of models and across both expository and real-world image and language domains. We demonstrate that measuring the exemplar-rule trade-off while controlling for feature-level bias provides a more complete picture of extrapolation behavior than existing formalisms. We find that most standard neural network models have a propensity towards exemplar-based extrapolation, and we discuss the implications of these findings for research on data augmentation, fairness, and systematic generalization.
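
The probe can be made concrete with a toy instance of the protocol. The sketch below is an illustrative assumption rather than the paper's exact setup: the label is fully determined by a single "rule" feature, several context features co-vary with it on the training distribution, and a held-out feature combination pits the rule against similarity to stored exemplars. A rule-based learner follows the rule feature on the probe item; an exemplar-based learner follows its nearest training exemplars. The feature names, dataset construction, and model choice here are all hypothetical.

```python
# Minimal sketch of a feature-coverage probe for rule- vs. exemplar-based
# extrapolation, under the toy assumptions stated above.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_context = 4          # label-irrelevant context features
n_per_class = 200      # training examples per class

# Training coverage: the rule feature (column 0) and the context features
# co-vary, so both perfectly predict the label on the training data.
X1 = np.hstack([np.ones((n_per_class, 1)), np.zeros((n_per_class, n_context))])
X0 = np.hstack([np.zeros((n_per_class, 1)), np.ones((n_per_class, n_context))])
X = np.vstack([X1, X0]) + rng.normal(0.0, 0.05, size=(2 * n_per_class, 1 + n_context))
y = np.array([1] * n_per_class + [0] * n_per_class)

model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
model.fit(X, y)

# Held-out combination: the rule feature says class 1, but the item is far
# closer (in Euclidean distance) to the class-0 exemplars, which differ in
# only one coordinate rather than n_context coordinates.
probe = np.concatenate([[1.0], np.ones(n_context)]).reshape(1, -1)
p = model.predict_proba(probe)[0, 1]
print(f"P(class 1 | probe) = {p:.2f} -> "
      f"{'rule-like' if p > 0.5 else 'exemplar-like'} extrapolation")
```

Varying n_context (how many features the rule must override) or the number of distinct training combinations is one way to manipulate feature coverage in the sense described above and trace out the resulting change in extrapolation behavior.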
