Efficient multi-way text categorization via generalized discriminant analysis

Text categorization is an important research area and has been receiving much attention due to the growth of the on-line information and of Internet. Automated text categorization is generally cast as a multi-class classification problem. Much of previous work focused on binary document classification problems. Support vector machines (SVMs) excel in binary classification, but the elegant theory behind large-margin hyperplane cannot be easily extended to multi-class text classification. In addition, the training time and scaling are also important concerns. On the other hand, other techniques naturally extensible to handle multi-class classification are generally not as accurate as SVM. This paper presents a simple and efficient solution to multi-class text categorization. Classification problems are first formulated as optimization via discriminant analysis. Text categorization is then cast as the problem of finding coordinate transformations that reflects the inherent similarity from the data. While most of the previous approaches decompose a multi-class classification problem into multiple independent binary classification tasks, the proposed approach enables direct multi-class classification. By using Generalized Singular Value Decomposition (GSVD), a coordinate transformation that reflects the inherent class structure indicated by the generalized singular values is identified. Extensive experiments demonstrate the efficiency and effectiveness of the proposed approach.

[1]  C. Loan Generalizing the Singular Value Decomposition , 1976 .

[2]  Sunita Sarawagi,et al.  Scaling multi-class support vector machines using inter-class confusion , 2002, KDD.

[3]  Alexander J. Smola,et al.  Learning with kernels , 1998 .

[4]  Keinosuke Fukunaga,et al.  Introduction to Statistical Pattern Recognition , 1972 .

[5]  Susan T. Dumais,et al.  Inductive learning algorithms and representations for text categorization , 1998, CIKM '98.

[6]  Spiridon D. Likothanassis,et al.  Integrating feature and instance selection for text classification , 2002, KDD.

[7]  Yiming Yang,et al.  A Comparative Study on Feature Selection in Text Categorization , 1997, ICML.

[8]  R. Fisher THE USE OF MULTIPLE MEASUREMENTS IN TAXONOMIC PROBLEMS , 1936 .

[9]  David L. Waltz,et al.  Classifying news stories using memory based reasoning , 1992, SIGIR '92.

[10]  T. Landauer,et al.  Indexing by Latent Semantic Analysis , 1990 .

[11]  Thorsten Joachims,et al.  Making large-scale support vector machine learning practical , 1999 .

[12]  Thomas G. Dietterich,et al.  Solving Multiclass Learning Problems via Error-Correcting Output Codes , 1994, J. Artif. Intell. Res..

[13]  David D. Lewis,et al.  Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval , 1998, ECML.

[14]  Vipin Kumar,et al.  WebACE: a Web agent for document categorization and exploration , 1998, AGENTS '98.

[15]  Samy Bengio,et al.  SVMTorch: Support Vector Machines for Large-Scale Regression Problems , 2001, J. Mach. Learn. Res..

[16]  Yoram Singer,et al.  Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers , 2000, J. Mach. Learn. Res..

[17]  James Demmel,et al.  Jacobi's Method is More Accurate than QR , 1989, SIAM J. Matrix Anal. Appl..

[18]  Santosh S. Vempala,et al.  Latent semantic indexing: a probabilistic analysis , 1998, PODS '98.

[19]  Kostas Tzeras,et al.  Automatic indexing based on Bayesian inference networks , 1993, SIGIR.

[20]  Rayid Ghani,et al.  Using Error-Correcting Codes for Text Classification , 2000, ICML.

[21]  Wai Lam,et al.  Using a generalized instance set for automatic text categorization , 1998, SIGIR '98.

[22]  Hwee Tou Ng,et al.  Feature selection, perceptron learning, and a usability case study for text categorization , 1997, SIGIR '97.

[23]  Keinosuke Fukunaga,et al.  Introduction to statistical pattern recognition (2nd ed.) , 1990 .

[24]  Andrew McCallum,et al.  Using Maximum Entropy for Text Classification , 1999 .

[25]  Thorsten Joachims,et al.  A statistical learning learning model of text classification for support vector machines , 2001, SIGIR '01.

[26]  Yoram Singer,et al.  Context-sensitive learning methods for text categorization , 1996, SIGIR '96.

[27]  Andreas S. Weigend,et al.  A neural network approach to topic spotting , 1995 .

[28]  Robert Tibshirani,et al.  Classification by Pairwise Coupling , 1997, NIPS.

[29]  Yoram Singer,et al.  BoosTexter: A Boosting-based System for Text Categorization , 2000, Machine Learning.

[30]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[31]  B. Scholkopf,et al.  Fisher discriminant analysis with kernels , 1999, Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop (Cat. No.98TH8468).

[32]  Yiming Yang,et al.  A re-examination of text categorization methods , 1999, SIGIR '99.