A Bayesian generative model for learning semantic hierarchies

Building fine-grained visual recognition systems that are capable of recognizing tens of thousands of categories, has received much attention in recent years. The well known semantic hierarchical structure of categories and concepts, has been shown to provide a key prior which allows for optimal predictions. The hierarchical organization of various domains and concepts has been subject to extensive research, and led to the development of the WordNet domains hierarchy (Fellbaum, 1998), which was also used to organize the images in the ImageNet (Deng et al., 2009) dataset, in which the category count approaches the human capacity. Still, for the human visual system, the form of the hierarchy must be discovered with minimal use of supervision or innate knowledge. In this work, we propose a new Bayesian generative model for learning such domain hierarchies, based on semantic input. Our model is motivated by the super-subordinate organization of domain labels and concepts that characterizes WordNet, and accounts for several important challenges: maintaining context information when progressing deeper into the hierarchy, learning a coherent semantic concept for each node, and modeling uncertainty in the perception process.

[1]  Alexander C. Berg,et al.  Automatic Attribute Discovery and Characterization from Noisy Web Data , 2010, ECCV.

[2]  Charles Kemp,et al.  The discovery of structural form , 2008, Proceedings of the National Academy of Sciences.

[3]  Geoffrey E. Hinton,et al.  Semantic hashing , 2009, Int. J. Approx. Reason..

[4]  Silvio Savarese,et al.  Weakly Supervised Learning of Mid-Level Features with Beta-Bernoulli Process Restricted Boltzmann Machines , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Anil K. Jain,et al.  Algorithms for Clustering Data , 1988 .

[6]  Thomas L. Griffiths,et al.  Hierarchical Topic Models and the Nested Chinese Restaurant Process , 2003, NIPS.

[7]  Dileep George,et al.  Towards a Mathematical Theory of Cortical Micro-circuits , 2009, PLoS Comput. Biol..

[8]  Mark Steyvers,et al.  Finding scientific topics , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[9]  Honglak Lee,et al.  Unsupervised learning of hierarchical representations with convolutional deep belief networks , 2011, Commun. ACM.

[10]  J. Tenenbaum,et al.  Special issue on “Probabilistic models of cognition , 2022 .

[11]  Ali Farhadi,et al.  Describing objects by their attributes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  J. Tenenbaum,et al.  Optimal Predictions in Everyday Cognition , 2006, Psychological science.

[13]  Tai Sing Lee,et al.  Hierarchical Bayesian inference in the visual cortex. , 2003, Journal of the Optical Society of America. A, Optics, image science, and vision.

[14]  Arijit Biswas,et al.  Simultaneous Active Learning of Classifiers & Attributes via Relative Feedback , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Joshua B. Tenenbaum,et al.  Learning to Learn with Compound HD Models , 2011, NIPS.

[16]  Michael I. Jordan,et al.  Tree-Structured Stick Breaking for Hierarchical Data , 2010, NIPS.

[17]  H. B. Barlow,et al.  Unsupervised Learning , 1989, Neural Computation.

[18]  J. Tenenbaum,et al.  Opinion TRENDS in Cognitive Sciences Vol.10 No.7 July 2006 Special Issue: Probabilistic models of cognition Theory-based Bayesian models of inductive learning and reasoning , 2022 .

[19]  Roland Racko Hedging your bets , 1995 .

[20]  Kristen Grauman,et al.  Relative attributes , 2011, 2011 International Conference on Computer Vision.

[21]  Geoffrey E. Hinton,et al.  Deep Boltzmann Machines , 2009, AISTATS.

[22]  J. Tenenbaum,et al.  Probabilistic models of cognition: where next? , 2006, Trends in Cognitive Sciences.

[23]  Kristen Grauman,et al.  Interactively building a discriminative vocabulary of nameable attributes , 2011, CVPR 2011.

[24]  Cordelia Schmid,et al.  Semantic Hierarchies for Visual Object Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Lawrence Carin,et al.  Nonparametric factor analysis with beta process priors , 2009, ICML '09.

[26]  Fei-Fei Li,et al.  What Does Classifying More Than 10, 000 Image Categories Tell Us? , 2010, ECCV.

[27]  James L. McClelland,et al.  Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations , 1986 .

[28]  S. Edelman,et al.  Toward direct visualization of the internal shape representation space by fMRI , 1998, Psychobiology.

[29]  Terrence J. Sejnowski,et al.  Unsupervised Learning , 2018, Encyclopedia of GIS.

[30]  Vicente Ordonez,et al.  High level describable attributes for predicting aesthetics and interestingness , 2011, CVPR 2011.

[31]  Emanuele Pianta,et al.  Revising the Wordnet Domains Hierarchy: semantics, coverage and balancing , 2004 .

[32]  Kun Duan,et al.  Discovering localized attributes for fine-grained recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Fei-Fei Li,et al.  Building and using a semantivisual image hierarchy , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[34]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Pietro Perona,et al.  Learning and using taxonomies for fast visual categorization , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Andrew Zisserman,et al.  Learning Visual Attributes , 2007, NIPS.

[37]  Silvio Savarese,et al.  Hierarchical classification of images by sparse approximation , 2013, Image Vis. Comput..

[38]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[39]  Lancelot F. James,et al.  Gibbs Sampling Methods for Stick-Breaking Priors , 2001 .

[40]  M. Heller Hedging your bets , 1991 .

[41]  Daphne Koller,et al.  Discriminative learning of relaxed hierarchy for large-scale visual recognition , 2011, 2011 International Conference on Computer Vision.

[42]  Carlo Strapparava,et al.  The role of domain information in Word Sense Disambiguation , 2002, Natural Language Engineering.

[43]  Paul Smolensky,et al.  Information processing in dynamical systems: foundations of harmony theory , 1986 .

[44]  Lawrence D. Jackel,et al.  Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[45]  Pietro Perona,et al.  Unsupervised Organization of Image Collections: Taxonomies and Beyond , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[46]  Adriana Kovashka,et al.  Actively selecting annotations among objects and attributes , 2011, 2011 International Conference on Computer Vision.

[47]  P. Berkes,et al.  Statistically Optimal Perception and Learning: from Behavior to Neural Representations , 2022 .

[48]  Daphna Weinshall,et al.  Exploiting Object Hierarchy: Combining Models from Different Category Levels , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[49]  R. Goebel,et al.  Human Object-Similarity Judgments Reflect and Transcend the Primate-IT Object Representation , 2013, Front. Psychol..

[50]  T. Griffiths,et al.  Probabilistic inference in human semantic memory , 2006, Trends in Cognitive Sciences.

[51]  Motoaki Kawanabe,et al.  On Taxonomies for Multi-class Image Categorization , 2012, International Journal of Computer Vision.

[52]  Jonathan Krause,et al.  Hedging your bets: Optimizing accuracy-specificity trade-offs in large scale visual recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[53]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[54]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[55]  Christoph H. Lampert,et al.  Learning to detect unseen object classes by between-class attribute transfer , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.