Using discriminant analysis for multi-class classification: an experimental investigation

Many supervised machine learning tasks can be cast as multi-class classification problems. Support vector machines (SVMs) excel at binary classification, but the elegant theory behind large-margin hyperplanes cannot be easily extended to the multi-class case. On the other hand, it has been shown that the decision hyperplanes obtained by SVMs for binary classification are equivalent to the solutions obtained by Fisher's linear discriminant on the set of support vectors. Discriminant analysis approaches are well known in the statistical pattern recognition literature for learning discriminative feature transformations, and they extend easily to multi-class cases. The use of discriminant analysis, however, has not been fully explored experimentally in the data mining literature. In this paper, we explore the use of discriminant analysis for multi-class classification problems. We evaluate the performance of discriminant analysis on a large collection of benchmark datasets and investigate its usage in text categorization. Our experiments suggest that discriminant analysis provides a fast, efficient, yet accurate alternative for general multi-class classification problems.
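To make concrete how discriminant analysis handles more than two classes directly, the following is a minimal sketch of the classical linear discriminant rule with a pooled within-class covariance. It is an illustration only and is not taken from the paper: the function names `fit_lda` and `predict_lda`, the ridge term `reg`, and the synthetic three-class data are all assumptions, and the authors' actual formulation may differ.

```python
import numpy as np

def fit_lda(X, y, reg=1e-6):
    """Fit a multi-class linear discriminant (hypothetical helper).

    Returns (classes, W, b) so that class scores are X @ W + b.
    `reg` is an assumed small ridge term that keeps the scatter invertible.
    """
    classes = np.unique(y)
    n, d = X.shape
    means = np.stack([X[y == c].mean(axis=0) for c in classes])  # K x d
    priors = np.array([(y == c).mean() for c in classes])        # K
    # Pooled within-class scatter, shared by all classes.
    Sw = np.zeros((d, d))
    for c, m in zip(classes, means):
        Z = X[y == c] - m
        Sw += Z.T @ Z
    Sw = Sw / (n - len(classes)) + reg * np.eye(d)
    # Column k of W is Sw^{-1} mu_k; one linear score per class.
    W = np.linalg.solve(Sw, means.T)                              # d x K
    b = -0.5 * np.sum(means.T * W, axis=0) + np.log(priors)
    return classes, W, b

def predict_lda(model, X):
    """Assign each row of X to the class with the largest linear score."""
    classes, W, b = model
    return classes[np.argmax(X @ W + b, axis=1)]

# Synthetic three-class example (made-up data, for illustration only).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([rng.normal(c, 1.0, size=(100, 2)) for c in centers])
y = np.repeat(np.arange(3), 100)

model = fit_lda(X, y)
pred = predict_lda(model, X)
accuracy = float((pred == y).mean())
print(f"training accuracy: {accuracy:.3f}")
```

Training requires only class means and one linear solve, and a single argmax over K linear scores handles all classes at once, with no need to combine several binary classifiers.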
