Kernel-Based Learning of Hierarchical Multilabel Classification Models

We present a kernel-based algorithm for hierarchical text classification where the documents are allowed to belong to more than one category at a time. The classification model is a variant of the Maximum Margin Markov Network framework, where the classification hierarchy is represented as a Markov tree equipped with an exponential family defined on the edges. We present an efficient optimization algorithm based on incremental conditional gradient ascent in single-example subspaces spanned by the marginal dual variables. The optimization is facilitated with a dynamic programming based algorithm that computes best update directions in the feasible set. Experiments show that the algorithm can feasibly optimize training sets of thousands of examples and classification hierarchies consisting of hundreds of nodes. Training of the full hierarchical model is as efficient as training independent SVM-light classifiers for each node. The algorithm's predictive accuracy was found to be competitive with other recently introduced hierarchical multi-category or multilabel classification learning algorithms.

[1]  Gerald Salton,et al.  Automatic text processing , 1988 .

[2]  Dimitri P. Bertsekas,et al.  Nonlinear Programming , 1997 .

[3]  Daphne Koller,et al.  Hierarchically Classifying Documents Using Very Few Words , 1997, ICML.

[4]  Thorsten Joachims,et al.  Text Categorization with Support Vector Machines: Learning with Many Relevant Features , 1998, ECML.

[5]  Tom M. Mitchell,et al.  Improving Text Classification by Shrinkage in a Hierarchy of Classes , 1998, ICML.

[6]  Robert J. McEliece,et al.  The generalized distributive law , 2000, IEEE Trans. Inf. Theory.

[7]  Nello Cristianini,et al.  An Introduction to Support Vector Machines and Other Kernel-based Learning Methods , 2000 .

[8]  Susan T. Dumais,et al.  Hierarchical classification of Web content , 2000, SIGIR '00.

[9]  Nello Cristianini,et al.  Classification using String Kernels , 2000 .

[10]  Brendan J. Frey,et al.  Factor graphs and the sum-product algorithm , 2001, IEEE Trans. Inf. Theory.

[11]  John Shawe-Taylor,et al.  Syllables and other String Kernel Extensions , 2002, ICML.

[12]  Eleazar Eskin,et al.  The Spectrum Kernel: A String Kernel for SVM Protein Classification , 2001, Pacific Symposium on Biocomputing.

[13]  X. Jin Factor graphs and the Sum-Product Algorithm , 2002 .

[14]  Dustin Boswell,et al.  Introduction to Support Vector Machines , 2002 .

[15]  Ben Taskar,et al.  Max-Margin Markov Networks , 2003, NIPS.

[16]  Thomas Gärtner,et al.  A survey of kernels for structured data , 2003, SKDD.

[17]  Thomas Hofmann,et al.  Hidden Markov Support Vector Machines , 2003, ICML.

[18]  Nello Cristianini,et al.  Kernel Methods for Pattern Analysis , 2003, ICTAI.

[19]  Jean-Michel Renders,et al.  Word-Sequence Kernels , 2003, J. Mach. Learn. Res..

[20]  Thomas Hofmann,et al.  Learning with Taxonomies: Classifying Documents and Words , 2003 .

[21]  Martin J. Wainwright,et al.  Tree-based reparameterization framework for analysis of sum-product and related algorithms , 2003, IEEE Trans. Inf. Theory.

[22]  Yoram Singer,et al.  Large margin hierarchical classification , 2004, ICML.

[23]  Xiaojin Zhu,et al.  Kernel conditional random fields: representation and clique selection , 2004, ICML.

[24]  Yiming Yang,et al.  RCV1: A New Benchmark Collection for Text Categorization Research , 2004, J. Mach. Learn. Res..

[25]  Claudio Gentile,et al.  Incremental Algorithms for Hierarchical Classification , 2004, J. Mach. Learn. Res..

[26]  Ben Taskar,et al.  Learning associative Markov networks , 2004, ICML.

[27]  Thomas Hofmann,et al.  Support vector machine learning for interdependent and structured output spaces , 2004, ICML.

[28]  Thomas Hofmann,et al.  Hierarchical document categorization with support vector machines , 2004, CIKM '04.

[29]  Juho Rousu,et al.  Efficient Computation of Gapped Substring Kernels on Large Alphabets , 2005, J. Mach. Learn. Res..

[30]  Michael I. Jordan,et al.  Graphical Models, Exponential Families, and Variational Inference , 2008, Found. Trends Mach. Learn..