Hierarchical Classification via Orthogonal Transfer

We consider multiclass classification problems where the set of labels are organized hierarchically as a category tree. We associate each node in the tree with a classifier and classify the examples recursively from the root to the leaves. We propose a hierarchical Support Vector Machine (SVM) that encourages the classifier at each node to be different from the classifiers at its ancestors. More specifically, we introduce regularizations that force the normal vector of the classifying hyperplane at each node to be orthogonal to those at its ancestors as much as possible. We establish conditions under which training such a hierarchical SVM is a convex optimization problem, and develop an efficient dual-averaging method for solving it.

[1]  Alberto Seeger,et al.  A Variational Approach to Copositive Matrices , 2010, SIAM Rev..

[2]  Charles A. Micchelli,et al.  Learning Multiple Tasks with Kernel Methods , 2005, J. Mach. Learn. Res..

[3]  Charles R. Johnson,et al.  Topics in Matrix Analysis , 1991 .

[4]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[5]  Tom M. Mitchell,et al.  Improving Text Classification by Shrinkage in a Hierarchy of Classes , 1998, ICML.

[6]  Robert J. Plemmons,et al.  Nonnegative Matrices in the Mathematical Sciences , 1979, Classics in Applied Mathematics.

[7]  Yoram Singer,et al.  Large margin hierarchical classification , 2004, ICML.

[8]  A. Atiya,et al.  Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2005, IEEE Transactions on Neural Networks.

[9]  Susan T. Dumais,et al.  Hierarchical classification of Web content , 2000, SIGIR '00.

[10]  Yurii Nesterov,et al.  Primal-dual subgradient methods for convex problems , 2005, Math. Program..

[11]  Kilian Q. Weinberger,et al.  Large margin taxonomy embedding with an application to document categorization , 2008, NIPS 2008.

[12]  Claudio Gentile,et al.  Incremental Algorithms for Hierarchical Classification , 2004, J. Mach. Learn. Res..

[13]  Thorsten Joachims,et al.  Text Categorization with Support Vector Machines: Learning with Many Relevant Features , 1998, ECML.

[14]  Thomas Hofmann,et al.  Hierarchical document categorization with support vector machines , 2004, CIKM '04.

[15]  Koby Crammer,et al.  On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines , 2002, J. Mach. Learn. Res..

[16]  Yurii Nesterov,et al.  Lexicographic differentiation of nonsmooth functions , 2005, Math. Program..

[17]  Yurii Nesterov,et al.  Introductory Lectures on Convex Optimization - A Basic Course , 2014, Applied Optimization.

[18]  Lin Xiao,et al.  Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization , 2009, J. Mach. Learn. Res..

[19]  Andreas S. Weigend,et al.  Exploiting Hierarchy in Text Categorization , 1999, Information Retrieval.

[20]  Yoram Singer,et al.  Pegasos: primal estimated sub-gradient solver for SVM , 2011, Math. Program..

[21]  Jason Weston,et al.  Support vector machines for multi-class pattern recognition , 1999, ESANN.

[22]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[23]  Yiming Yang,et al.  RCV1: A New Benchmark Collection for Text Categorization Research , 2004, J. Mach. Learn. Res..

[24]  Daphne Koller,et al.  Hierarchically Classifying Documents Using Very Few Words , 1997, ICML.

[25]  Yoram Singer,et al.  Pegasos: primal estimated sub-gradient solver for SVM , 2007, ICML '07.