This paper introduces a locality discriminating indexing (LDI) algorithm for text categorization. LDI takes a manifold-based approach to discriminant analysis. Based on the hypothesis that samples from different classes reside in class-specific manifold structures, the algorithm models these structures with a nearest-native graph and an invader graph. A new locality discriminant criterion is proposed that best preserves the within-class local structures while suppressing the between-class overlap. Using the Laplacians of the two graphs, LDI finds the optimal linear transformation by solving a generalized eigenvalue problem. The algorithm has been tested on text categorization using the 20NG and Reuters-21578 data sets. Experimental results show that LDI is an effective technique for document modeling and representation for classification.
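The pipeline outlined above (two graphs, their Laplacians, a generalized eigenvalue problem) can be illustrated with a minimal sketch. The abstract does not state the exact criterion, so everything here is an assumption for illustration: the nearest-native graph is taken to link each sample to its k nearest same-class neighbors, the invader graph to link each sample to its k nearest other-class neighbors, and the objective is written as a ratio of the two Laplacian quadratic forms. The names `knn_graph`, `laplacian`, `ldi_sketch` and the parameters `k`, `dim`, `reg` are hypothetical and not taken from the paper.

```python
import numpy as np
from scipy.linalg import eigh


def knn_graph(X, labels, k, same_class):
    """Symmetric 0/1 adjacency linking each row of X to its k nearest
    neighbors drawn from the same class (same_class=True) or from the
    other classes (same_class=False). Assumed graph construction."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared Euclidean
    W = np.zeros((n, n))
    for i in range(n):
        mask = (labels == labels[i]) if same_class else (labels != labels[i])
        mask[i] = False  # never link a sample to itself
        candidates = np.flatnonzero(mask)
        nearest = candidates[np.argsort(dist[i, candidates])[:k]]
        W[i, nearest] = 1.0
    return np.maximum(W, W.T)  # symmetrize


def laplacian(W):
    """Unnormalized graph Laplacian L = D - W."""
    return np.diag(W.sum(axis=1)) - W


def ldi_sketch(X, labels, k=3, dim=2, reg=1e-6):
    """Assumed ratio-form criterion: find directions a that maximize
    (a^T X^T L_b X a) / (a^T X^T L_w X a), where L_w comes from the
    same-class graph and L_b from the other-class graph."""
    L_w = laplacian(knn_graph(X, labels, k, same_class=True))
    L_b = laplacian(knn_graph(X, labels, k, same_class=False))
    S_w = X.T @ L_w @ X + reg * np.eye(X.shape[1])  # regularized to be positive definite
    S_b = X.T @ L_b @ X
    # Generalized symmetric eigenproblem S_b a = lambda S_w a (eigenvalues ascending).
    _, vecs = eigh(S_b, S_w)
    return vecs[:, ::-1][:, :dim]  # eigenvectors of the largest eigenvalues
```

With documents stored as rows of `X` (for example term-weight vectors), the reduced representation under this sketch would be `X @ ldi_sketch(X, labels)`. The small ridge term `reg` is a design choice of the sketch, added only so the right-hand matrix stays positive definite when there are more terms than documents.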