Performing Text Categorization on Manifold

Text categorization has become a key technology for organizing and processing large amounts of text information. It normally involves an extremely high-dimensional feature space, in which most existing approaches produce highly biased estimates and therefore lose classification accuracy. These approaches do not consider that text documents may intrinsically lie on a low-dimensional manifold. This paper presents an approach that performs text categorization on the text manifold while respecting its intrinsic global structure, for example by using geodesic distance to measure the distance between two texts. The approach has been applied to improve the k-nearest neighbor (KNN) classifier for text categorization, and the improvement is validated empirically by experiments.
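The abstract does not spell out how the geodesic distance is computed or how it is plugged into KNN, so the following is only a minimal sketch under stated assumptions rather than the authors' procedure. It assumes that documents are already represented as numeric feature vectors, that geodesic distance is approximated by shortest-path length over a nearest-neighbor graph (a common approximation, used for example by Isomap), and that the query is simply added to the graph before distances are computed. All function and parameter names (`neighborhood_graph`, `geodesic_from`, `geodesic_knn_predict`, `n_neighbors`, `k`) are hypothetical.

```python
import heapq
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def neighborhood_graph(points, n_neighbors):
    """Symmetric graph linking each point to its n_neighbors nearest points.

    Assumption: local Euclidean distances are a reasonable stand-in for
    distances along the manifold between nearby documents.
    """
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        nearest = sorted(
            (euclidean(p, q), j) for j, q in enumerate(points) if j != i
        )
        for d, j in nearest[:n_neighbors]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def geodesic_from(graph, source):
    """Dijkstra shortest-path lengths from source; unreachable nodes stay inf."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def geodesic_knn_predict(train_points, train_labels, query, n_neighbors=2, k=3):
    """Majority vote among the k training points closest in graph distance."""
    points = list(train_points) + [query]
    query_index = len(points) - 1
    dist = geodesic_from(neighborhood_graph(points, n_neighbors), query_index)
    # Ignore training points that cannot be reached from the query.
    reachable = [i for i in range(len(train_points)) if dist[i] < math.inf]
    ranked = sorted(reachable, key=lambda i: dist[i])
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy two-dimensional "documents"; real text vectors would be far larger.
    train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
             (10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
    labels = ["sports"] * 3 + ["finance"] * 3
    print(geodesic_knn_predict(train, labels, (0.5, 0.5)))
```

Rebuilding the graph for every query keeps the sketch short but is quadratic in the number of documents; a practical system would more likely build the training graph once and attach each query to it.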
