Learning Taxonomy Adaptation in Large-scale Classification

In this paper, we study flat and hierarchical classification strategies in the context of large-scale taxonomies. Addressing the problem from a learning-theoretic point of view, we first propose a multi-class, hierarchical, data-dependent bound on the generalization error of classifiers deployed in large-scale taxonomies. This bound explains several empirical results reported in the literature on the relative performance of flat and hierarchical classifiers. Based on this bound, we also propose a technique for pruning a given taxonomy so that the resulting taxonomy yields a lower value of the upper bound than the original one. We then present another hierarchy-pruning method based on the approximation error of a family of classifiers, from which we derive features used by a meta-classifier to decide which nodes to prune. Finally, we illustrate the theoretical developments through several experiments on two widely used taxonomies.
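The abstract does not spell out the pruning operation itself. As a minimal sketch, assuming that pruning a node means removing it and attaching its children directly to its parent (i.e., flattening that level of the taxonomy), the tree manipulation could look like the following. `Node`, `prune`, and the `should_prune` predicate are hypothetical names introduced for illustration; `should_prune` merely stands in for whatever decision rule is used (the bound-based criterion or the meta-classifier's prediction), which is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """A node in a category taxonomy; leaves are the target classes (hypothetical type)."""
    name: str
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def prune(root: Node, should_prune: Callable[[Node], bool]) -> Node:
    """Flatten every internal, non-root node for which should_prune(node) is True.

    Assumption: a pruned node is replaced by its children in its parent's
    child list. Subtrees are processed bottom-up, so a node's descendants are
    already pruned before the node itself is considered. The tree is modified
    in place and the root is returned for convenience.
    """
    new_children: List[Node] = []
    for child in root.children:
        child = prune(child, should_prune)
        if not child.is_leaf() and should_prune(child):
            # Lift the pruned node's children up one level.
            new_children.extend(child.children)
        else:
            new_children.append(child)
    root.children = new_children
    return root
```

Because leaves are never removed, the set of target classes stays the same; pruning only shortens the root-to-leaf paths that a top-down hierarchical classifier has to traverse.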
