A PSO-Based Web Document Classification Algorithm

Owing to the exponential growth of documents on the Internet and the urgent need to organize them, automatic document classification has received increasing attention in recent years. The particle swarm optimization (PSO) algorithm, new to the document classification community, is a robust stochastic evolutionary algorithm based on the movement and intelligence of swarms. In this paper, we present a PSO-based algorithm for document classification. We compare our method with conventional document classification algorithms on the Reuters and TREC corpora. The experimental results indicate that the proposed algorithm performs substantially better than the conventional algorithms.
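To make the swarm-based search concrete, the following is a minimal sketch of a generic PSO loop, not the classification algorithm proposed in this paper. It assumes the commonly used inertia-weight form of the velocity update; the function name `pso_minimize`, the parameter values (`inertia`, `c1`, `c2`, swarm size, bounds), and the toy `sphere` objective are all illustrative assumptions. In a classification setting the objective might be, for example, the training error of a classifier whose parameters are encoded in a particle's position, but that mapping is likewise an assumption here.

```python
import random


def pso_minimize(objective, dim, n_particles=20, n_iters=100,
                 inertia=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0), seed=0):
    """Generic PSO sketch (assumed parameters, not the paper's settings)."""
    rng = random.Random(seed)
    lo, hi = bounds

    # Random initial positions, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]

    # Each particle remembers its own best position; the swarm shares one global best.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity = inertia term + pull toward personal best + pull toward global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))

            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val

    return gbest, gbest_val


def sphere(x):
    """Toy objective (hypothetical): sum of squares, minimized at the origin."""
    return sum(v * v for v in x)


best, best_val = pso_minimize(sphere, dim=3)
```

Each particle is drawn toward both its own best position and the best position found by the swarm, with random weights on the two pulls; this is the sense in which the search is stochastic and swarm-based.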
