None of the Above: A Bayesian Account of the Detection of Novel Categories

Every time we encounter a new object, action, or event, there is some chance that we will need to assign it to a novel category. We describe and evaluate a class of probabilistic models that detect when an object belongs to a category that has not previously been encountered. The models incorporate a prior distribution that is influenced by the distribution of previous objects among categories, and we present two experiments demonstrating that people are also sensitive to this distributional information. Two additional experiments confirm that distributional information is combined with similarity when both sources of information are available. We compare our approach to previous models of unsupervised categorization and to several heuristic-based models, and find that a hierarchical Bayesian approach provides the best account of our data.
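The abstract does not state which prior the models use. As a minimal sketch, assuming a two-parameter Chinese restaurant process (Pitman-Yor) prior, the code below shows how a prior over "existing category j" versus "new category" can be computed from the counts of previous objects in each category. It then combines that prior with a similarity-based likelihood through Bayes' rule. The function names `pitman_yor_prior` and `posterior_new_category`, the parameters `alpha` and `d`, and the likelihood inputs are all hypothetical illustrations, not the authors' exact model.

```python
from typing import List, Sequence, Tuple


def pitman_yor_prior(counts: Sequence[int], alpha: float,
                     d: float = 0.0) -> Tuple[List[float], float]:
    """Predictive prior for the next item under a two-parameter
    (Pitman-Yor) Chinese restaurant process. Illustrative assumption only.

    counts -- number of earlier items in each existing category
    alpha  -- concentration parameter (must satisfy alpha > -d)
    d      -- discount parameter, 0 <= d < 1; d = 0 is the ordinary CRP

    Returns (prior for each existing category, prior for a new category).
    """
    if not 0.0 <= d < 1.0 or alpha <= -d:
        raise ValueError("need 0 <= d < 1 and alpha > -d")
    n, k = sum(counts), len(counts)
    if n == 0:
        # The very first item always starts a new category.
        return [], 1.0
    existing = [(c - d) / (n + alpha) for c in counts]
    new = (alpha + d * k) / (n + alpha)
    return existing, new


def posterior_new_category(counts: Sequence[int],
                           lik_existing: Sequence[float],
                           lik_new: float, alpha: float,
                           d: float = 0.0) -> float:
    """Posterior probability that an item belongs to a new category.

    lik_existing -- p(x | category j) for each existing category; this is
                    the (hypothetical) similarity-based term
    lik_new      -- p(x | new category), e.g. a broad prior predictive
    """
    prior_existing, prior_new = pitman_yor_prior(counts, alpha, d)
    if len(lik_existing) != len(prior_existing):
        raise ValueError("one likelihood per existing category is required")
    w_existing = sum(p * l for p, l in zip(prior_existing, lik_existing))
    w_new = prior_new * lik_new
    return w_new / (w_existing + w_new)


if __name__ == "__main__":
    # Ten previous objects, all in one category vs. each in its own category.
    for label, counts in [("one category", [10]), ("ten categories", [1] * 10)]:
        crp = pitman_yor_prior(counts, alpha=1.0)[1]
        py = pitman_yor_prior(counts, alpha=1.0, d=0.5)[1]
        print(label, round(crp, 3), round(py, 3))
    # one category 0.091 0.136
    # ten categories 0.091 0.545
```

Under this sketch, the ordinary CRP (d = 0) gives a new-category probability that depends only on the total number of previous objects. With d > 0, the probability also grows with the number of distinct categories seen so far. The posterior function then lets a strong similarity match to an existing category override that prior, or a poor match strengthen it.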
