Learning Locality Discriminating Indexing for Text Categorization

This paper introduces a locality discriminating indexing (LDI) algorithm for text categorization. LDI approaches discriminant analysis from a manifold-learning perspective. Based on the hypothesis that samples from different classes reside on class-specific manifolds, the algorithm models these manifold structures with two graphs: a nearest-native graph and an invader graph. A new locality discriminant criterion is then proposed that best preserves the within-class local structure while suppressing between-class overlap. Using the Laplacians of these graphs, LDI finds the optimal linear transformation by solving a generalized eigenvalue problem. The feasibility of LDI has been successfully tested on text categorization using the 20NG and Reuters-21578 datasets. Experimental results show that LDI is an effective technique for modeling and representing documents for classification.
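The abstract does not spell out the exact criterion, so the following is only a minimal sketch of the general pipeline it describes (two neighbourhood graphs, their Laplacians, a generalized eigenvalue problem), written in NumPy under stated assumptions. It assumes 0/1 edge weights, k-nearest-neighbour graphs with hypothetical parameters `k_native` and `k_invader`, and a ratio criterion that maximizes invader-graph scatter relative to native-graph scatter, i.e. it solves S_b a = λ S_w a with S_b = Xᵀ L_invader X and S_w = Xᵀ L_native X + reg·I. The function names `knn_graphs`, `laplacian`, and `ldi_like_projection` and the ridge term `reg` are illustrative, not the authors' method.

```python
import numpy as np


def knn_graphs(X, y, k_native=3, k_invader=3):
    """Hypothetical nearest-native (same-class) and invader (other-class) graphs.

    X: (n_samples, n_features) data, one sample per row; y: (n_samples,) labels.
    Edges get weight 1 (an assumption); both graphs are symmetrized.
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # pairwise squared distances
    np.fill_diagonal(D, np.inf)  # never link a sample to itself
    W_native = np.zeros((n, n))
    W_invader = np.zeros((n, n))
    for i in range(n):
        same = np.where(y == y[i])[0]
        same = same[same != i]
        other = np.where(y != y[i])[0]
        # k nearest neighbours from the same class ("natives")
        for j in same[np.argsort(D[i, same])[:k_native]]:
            W_native[i, j] = W_native[j, i] = 1.0
        # k nearest neighbours from other classes ("invaders")
        for j in other[np.argsort(D[i, other])[:k_invader]]:
            W_invader[i, j] = W_invader[j, i] = 1.0
    return W_native, W_invader


def laplacian(W):
    """Unnormalized graph Laplacian L = D - W."""
    return np.diag(W.sum(axis=1)) - W


def ldi_like_projection(X, y, n_components=2, k_native=3, k_invader=3, reg=1e-6):
    """Assumed criterion: maximize invader scatter over native scatter.

    Solves S_b a = lambda S_w a and returns the top n_components directions.
    """
    W_n, W_i = knn_graphs(X, y, k_native, k_invader)
    S_b = X.T @ laplacian(W_i) @ X  # spread of between-class (invader) edges
    S_w = X.T @ laplacian(W_n) @ X + reg * np.eye(X.shape[1])  # ridge keeps S_w invertible
    # Reduce to a standard symmetric problem via Cholesky: S_w = C C^T.
    C_inv = np.linalg.inv(np.linalg.cholesky(S_w))
    M = C_inv @ S_b @ C_inv.T
    M = (M + M.T) / 2.0  # remove round-off asymmetry
    evals, evecs = np.linalg.eigh(M)
    order = np.argsort(evals)[::-1][:n_components]
    A = C_inv.T @ evecs[:, order]  # map back: a = C^{-T} z
    return A, evals[order]
```

Documents would then be projected with `Z = X @ A` and classified in the reduced space. For high-dimensional term vectors such as tf-idf, one would typically reduce dimensionality first (for example with a truncated SVD) so that S_w stays well conditioned; the small ridge `reg` is a simpler stand-in for that step.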