Text categorization based on combination of modified back propagation neural network and latent semantic analysis

This paper proposes a new text categorization model that combines a modified back propagation neural network (MBPNN) with latent semantic analysis (LSA). The traditional back propagation neural network (BPNN) trains slowly and is easily trapped in a local minimum, which leads to poor performance and efficiency. We propose the MBPNN to accelerate the training of the BPNN and improve categorization accuracy. LSA overcomes the problems of matching on individual words by using statistically derived conceptual indices instead. It constructs a conceptual vector space in which each term or document is represented as a vector. This not only greatly reduces the dimensionality but also uncovers important associative relationships between terms. We test our categorization model on the 20 Newsgroups corpus and the Reuters-21578 corpus. Experimental results show that the MBPNN trains much faster than the traditional BPNN and also improves on its categorization performance. Applying LSA in our system yields a dramatic reduction in dimensionality while still achieving good classification results.
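The LSA step described in the abstract can be illustrated with a minimal sketch based on a truncated singular value decomposition. This is not the authors' implementation: the function name `lsa_project`, the toy term-document matrix, the use of raw counts rather than a weighting scheme, and the choice of k = 2 are all assumptions made for illustration. The MBPNN itself is not sketched here because the abstract does not specify its modifications.

```python
import numpy as np


def lsa_project(term_doc, k):
    """Map terms and documents into a k-dimensional latent space.

    term_doc: 2-D array, one row per term and one column per document.
    Returns (term_vecs, doc_vecs), each with k columns.
    """
    # Thin SVD: term_doc = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep only the k largest singular values and their vectors.
    term_vecs = U[:, :k] * s[:k]      # one row per term
    doc_vecs = Vt[:k, :].T * s[:k]    # one row per document
    return term_vecs, doc_vecs


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical toy corpus: "car" and "automobile" never share a document,
# but both co-occur with "engine".
terms = ["car", "automobile", "engine", "fruit", "apple"]
X = np.array([
    [1, 0, 0, 0],   # car
    [0, 1, 0, 0],   # automobile
    [1, 1, 0, 0],   # engine
    [0, 0, 1, 1],   # fruit
    [0, 0, 1, 1],   # apple
], dtype=float)

term_vecs, doc_vecs = lsa_project(X, k=2)

raw_sim = cosine(X[0], X[1])                     # similarity on raw counts
latent_sim = cosine(term_vecs[0], term_vecs[1])  # similarity in the latent space
print(f"car vs automobile, raw counts:   {raw_sim:.3f}")
print(f"car vs automobile, latent space: {latent_sim:.3f}")
```

In this toy example the two terms have zero similarity on raw counts because they never appear in the same document, yet they end up pointing in the same direction in the two-dimensional latent space through their shared association with "engine". The five-dimensional term space is also reduced to two dimensions, which is the kind of reduced representation a classifier such as a neural network could take as input.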
