Towards Optimal Discriminating Order for Multiclass Classification

In this paper, we investigate how to design an optimized discriminating order for boosting multiclass classification. The main idea is to optimize a binary tree architecture, referred to as Sequential Discriminating Tree (SDT), that performs the multiclass classification through a hierarchical sequence of coarse-to-fine binary classifiers. To infer such a tree architecture, we employ the constrained large margin clustering procedure which enforces samples belonging to the same class to locate at the same side of the hyper plane while maximizing the margin between these two partitioned class subsets. The proposed SDT algorithm has a theoretic error bound which is shown experimentally to effectively guarantee the generalization performance. Experiment results indicate that SDT clearly beats the state-of-the-art multiclass classification algorithms.

[1]  Patrick Haffner,et al.  Support vector machines for histogram-based image classification , 1999, IEEE Trans. Neural Networks.

[2]  Shuicheng Yan,et al.  Weakly-supervised hashing in kernel space , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[3]  Yi Lin Multicategory Support Vector Machines, Theory, and Application to the Classification of . . . , 2003 .

[4]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[5]  E. Forgy,et al.  Cluster analysis of multivariate data : efficiency versus interpretability of classifications , 1965 .

[6]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .

[7]  Sergio Escalera,et al.  Subclass Problem-Dependent Design for Error-Correcting Output Codes , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Jason Weston,et al.  Support vector machines for multi-class pattern recognition , 1999, ESANN.

[9]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[10]  Chih-Jen Lin,et al.  Probability Estimates for Multi-class Classification by Pairwise Coupling , 2003, J. Mach. Learn. Res..

[11]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[12]  Jennifer G. Dy,et al.  A hierarchical method for multi-class support vector machines , 2004, ICML.

[13]  Stefan Kramer,et al.  Ensembles of nested dichotomies for multi-class problems , 2004, ICML.

[14]  Sebastian Thrun,et al.  Text Classification from Labeled and Unlabeled Documents using EM , 2000, Machine Learning.

[15]  Alan L. Yuille,et al.  The Concave-Convex Procedure , 2003, Neural Computation.

[16]  Nenghai Yu,et al.  Maximum Margin Clustering with Pairwise Constraints , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[17]  Ulrich H.-G. Kreßel,et al.  Pairwise classification and support vector machines , 1999 .

[18]  Ken Lang,et al.  NewsWeeder: Learning to Filter Netnews , 1995, ICML.

[19]  Thorsten Joachims,et al.  Training linear SVMs in linear time , 2006, KDD '06.

[20]  Jordi Vitrià,et al.  Discriminant ECOC: a heuristic method for application dependent design of error correcting output codes , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Thomas Hofmann,et al.  Large Margin Methods for Structured and Interdependent Output Variables , 2005, J. Mach. Learn. Res..

[22]  Nello Cristianini,et al.  Large Margin DAGs for Multiclass Classification , 1999, NIPS.

[23]  Christopher J. Merz,et al.  UCI Repository of Machine Learning Databases , 1996 .

[24]  Gérard Dreyfus,et al.  Single-layer learning revisited: a stepwise procedure for building and training a neural network , 1989, NATO Neurocomputing.

[25]  Thierry Pun,et al.  The Truth about Corel - Evaluation in Image Retrieval , 2002, CIVR.

[26]  Anthony Widjaja,et al.  Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond , 2003, IEEE Transactions on Neural Networks.

[28]  Chi-Jen Lu,et al.  Tree Decomposition for Large-Scale SVM Problems , 2010, 2010 International Conference on Technologies and Applications of Artificial Intelligence.

[29]  Nello Cristianini,et al.  Enlarging the Margins in Perceptron Decision Trees , 2000, Machine Learning.

[30]  Stefan Kramer,et al.  Ensembles of Balanced Nested Dichotomies for Multi-class Problems , 2005, PKDD.

[31]  Thomas Hofmann,et al.  Kernel Methods for Missing Variables , 2005, AISTATS.

[32]  Bernhard Schölkopf,et al.  A Generalized Representer Theorem , 2001, COLT/EuroCOLT.

[33]  Chris H. Q. Ding,et al.  Multi-class protein fold recognition using support vector machines and neural networks , 2001, Bioinform..

[34]  Koby Crammer,et al.  On the Learnability and Design of Output Codes for Multiclass Problems , 2002, Machine Learning.

[35]  Joydeep Ghosh,et al.  Hierarchical Fusion of Multiple Classifiers for Hyperspectral Data Analysis , 2002, Pattern Analysis & Applications.

[36]  D. J. Newman,et al.  UCI Repository of Machine Learning Database , 1998 .