Taxonomic Prediction with Tree-Structured Covariances

Taxonomies have been proposed numerous times in the literature to encode semantic relationships between classes. Such taxonomies have been used to improve classification results by increasing the statistical efficiency of learning, since similarities between classes can be used to increase the amount of relevant data during training. In this paper, we show how data-derived taxonomies may be used in a structured prediction framework, and compare the performance of learned and semantically constructed taxonomies. Structured prediction in this case is multi-class categorization under the assumption that categories are taxonomically related. We make three main contributions: (i) we prove the equivalence between tree-structured covariance matrices and taxonomies; (ii) we use this covariance representation to develop a highly computationally efficient optimization algorithm for structured prediction with taxonomies; (iii) we show that taxonomies learned from data using the Hilbert-Schmidt Independence Criterion (HSIC) often perform better than imputed semantic taxonomies. Source code of this implementation, as well as machine-readable learned taxonomies, are available for download from https://github.com/blaschko/tree-structured-covariance.
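To make the correspondence in contribution (i) concrete, the following is a minimal sketch and not the authors' implementation. It assumes the usual definition of a tree-structured covariance matrix, in which the entry for two leaf classes is the total length of the edges shared by their root-to-leaf paths. The function name `tree_covariance`, the toy taxonomy, and the edge lengths are all hypothetical and chosen only for illustration.

```python
import numpy as np

def tree_covariance(parent, edge_length, leaves):
    """Build a covariance matrix from a rooted taxonomy (illustrative sketch).

    parent:      dict mapping each node to its parent (the root maps to None)
    edge_length: dict mapping each non-root node to the nonnegative length
                 of the edge joining it to its parent
    leaves:      list of leaf nodes (the classes), fixing row/column order
    """
    def edges_to_root(node):
        # Each edge is identified by its child endpoint.
        edges = set()
        while parent[node] is not None:
            edges.add(node)
            node = parent[node]
        return edges

    paths = {leaf: edges_to_root(leaf) for leaf in leaves}
    n = len(leaves)
    C = np.zeros((n, n))
    for i, a in enumerate(leaves):
        for j, b in enumerate(leaves):
            # Covariance = total length of edges common to both paths.
            shared = paths[a] & paths[b]
            C[i, j] = sum(edge_length[e] for e in shared)
    return C

# Hypothetical toy taxonomy: root -> {animal, vehicle},
# animal -> {cat, dog}, vehicle -> {car}.
parent = {
    "root": None,
    "animal": "root", "vehicle": "root",
    "cat": "animal", "dog": "animal", "car": "vehicle",
}
edge_length = {"animal": 1.0, "vehicle": 1.0, "cat": 0.5, "dog": 0.5, "car": 0.5}
leaves = ["cat", "dog", "car"]

C = tree_covariance(parent, edge_length, leaves)
print(C)
```

In this toy example, "cat" and "dog" share the edge from the root to "animal" and so have positive covariance, while "cat" and "car" share no edge and have covariance zero. Classes that sit closer together in the taxonomy are therefore treated as more strongly related.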
