Linear Discriminant Analysis in Document Classification

Document representation using the bag-of-words approach often requires reducing the dimensionality of the representation before statistical classification methods can be applied effectively. Latent Semantic Indexing (LSI) is one such method, based on eigendecomposition of the covariance of the document-term matrix. Another common approach is to select a small number of the most important features from the whole set according to some relevant criterion. This paper points out that LSI concentrates on representation and ignores discrimination. Furthermore, selection methods fail to produce a feature set that jointly optimizes class discrimination. As a remedy, we suggest supervised linear discriminative transforms, and report good classification results from applying these to the Reuters-21578 database.
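The contrast between a representation-oriented transform and a discriminative one can be illustrated with a minimal sketch. It is not the implementation used in the paper: it assumes classical Fisher linear discriminant analysis as the supervised transform, a PCA-style projection (leading eigenvectors of the covariance) as a stand-in for LSI, and synthetic Poisson term counts in place of Reuters-21578. All function names, the regularization constant `reg`, and the toy data are hypothetical.

```python
import numpy as np

def scatter_matrices(X, y, reg=1e-3):
    """Within-class (Sw) and between-class (Sb) scatter of rows of X.
    A small ridge `reg` (an assumption) keeps Sw positive definite."""
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = reg * np.eye(d)
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    return Sw, Sb

def lda_projection(Sw, Sb, k):
    """Top-k generalized eigenvectors of (Sb, Sw), via Cholesky whitening."""
    L = np.linalg.cholesky(Sw)          # Sw = L L^T
    A = np.linalg.solve(L, Sb)          # L^{-1} Sb
    M = np.linalg.solve(L, A.T)         # L^{-1} Sb L^{-T}, symmetric
    M = (M + M.T) / 2                   # remove round-off asymmetry
    _, V = np.linalg.eigh(M)            # eigenvalues in ascending order
    top = V[:, ::-1][:, :k]             # k leading eigenvectors
    return np.linalg.solve(L.T, top)    # map back: w = L^{-T} v

def lsi_projection(X, k):
    """Unsupervised baseline: k leading principal directions of X."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T

def fisher_criterion(W, Sw, Sb):
    """Class separability tr((W^T Sw W)^{-1} W^T Sb W) of a projection W."""
    return float(np.trace(np.linalg.solve(W.T @ Sw @ W, W.T @ Sb @ W)))

# Toy corpus: 3 classes, 12 terms. Terms 0-2 each favor one class;
# terms 3-11 are frequent, high-variance, and carry no class information.
rng = np.random.default_rng(0)
n_per_class, n_terms, n_classes = 50, 12, 3
rates = np.full((n_classes, n_terms), 5.0)
for c in range(n_classes):
    rates[c, c] += 3.0
rates[:, 3:] = 20.0
y = np.repeat(np.arange(n_classes), n_per_class)
X = rng.poisson(rates[y]).astype(float)

k = n_classes - 1                       # Sb has rank at most n_classes - 1
Sw, Sb = scatter_matrices(X, y)
W_lda = lda_projection(Sw, Sb, k)
W_lsi = lsi_projection(X, k)
J_lda = fisher_criterion(W_lda, Sw, Sb)
J_lsi = fisher_criterion(W_lsi, Sw, Sb)
print(f"Fisher criterion  LDA: {J_lda:.3f}  PCA/LSI-style: {J_lsi:.3f}")
```

Because the discriminant directions maximize this criterion over all projections of the same rank, the supervised transform can never score below the unsupervised one here. How large the gap is depends on how much of the data variance is unrelated to the class labels.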
