Dimensionality Reduction with Category Information Fusion and Non-negative Matrix Factorization for Text Categorization

Dimensionality reduction can substantially improve the computational efficiency of classifiers in text categorization, and non-negative matrix factorization (NMF) can readily map the high-dimensional term space into a low-dimensional semantic subspace. Moreover, the non-negativity of the basis vectors gives the semantic subspace a meaningful interpretation. However, as a linear reconstruction method, NMF is sensitive to noise, missing data, and outliers, so it usually does not achieve satisfactory classification performance. This paper proposes a novel approach in which the training texts and their category information are fused, and a transformation matrix that maps the term space into a semantic subspace is obtained by basis-orthogonality non-negative matrix factorization followed by truncation. With this transformation, the dimensionality can be reduced aggressively. Experimental results show that the proposed approach maintains good classification performance even at very low dimensionality.
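The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it uses plain multiplicative-update NMF and omits the basis-orthogonality constraint, and it assumes that fusion means stacking a one-hot category matrix under the term-document matrix and that truncation means discarding the category rows of the learned basis. The names `nmf`, `fit_transformation`, and the weight `alpha` are hypothetical.

```python
import numpy as np

def nmf(V, k, n_iter=200, eps=1e-9, seed=0):
    """Plain multiplicative-update NMF: V ~ W @ H with W, H >= 0.
    Note: no orthogonality constraint on the basis W (a simplification)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + eps
    H = rng.random((k, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def fit_transformation(X, labels, n_classes, k, alpha=1.0):
    """X: (n_terms, n_docs) non-negative term weights; labels: (n_docs,) class ids.
    Returns a (k, n_terms) matrix mapping term vectors into the k-dim subspace."""
    n_terms, n_docs = X.shape
    # Assumed fusion: append one-hot category rows, scaled by alpha.
    Y = np.zeros((n_classes, n_docs))
    Y[labels, np.arange(n_docs)] = 1.0
    V = np.vstack([X, alpha * Y])
    W, _ = nmf(V, k)
    # Assumed truncation: keep only the term rows of the basis.
    W_terms = W[:n_terms]
    # Projecting with the transpose is reasonable only for a near-orthogonal basis.
    return W_terms.T

# Toy data: 4 terms, 4 documents, 2 categories.
X = np.array([[2, 3, 0, 0],
              [1, 2, 0, 0],
              [0, 0, 3, 1],
              [0, 0, 2, 2]], dtype=float)
labels = np.array([0, 0, 1, 1])

T = fit_transformation(X, labels, n_classes=2, k=2)
Z = T @ X  # reduced representation, shape (k, n_docs)
```

Unseen documents carry no labels, so only the term part of the basis is needed at test time: a new term vector `x` is reduced with `T @ x` and then passed to any classifier.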
