Particle swarm optimization based semi-supervised learning on Chinese text categorization

For many large-scale learning problems, acquiring a large amount of labeled training data is expensive and time-consuming. Semi-supervised learning is a machine learning paradigm that uses unlabeled data to build better classifiers. However, unlabeled data with wrong predicted labels can mislead the classifier. In this paper, we propose a particle swarm optimization based semi-supervised learning classifier for the Chinese text categorization problem. The classifier uses an iterative strategy in which the predicted class of a document is determined by the document's previous prediction and by information from its neighbors. The new classifier is tested on a Chinese text corpus and compared with the k nearest neighbor method, the weighted k nearest neighbor method, and the self-learning classifier.
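The abstract gives no update equations, so the following Python sketch illustrates only the general idea of an iterative update that mixes a document's previous prediction with its neighbors' information. It is not the authors' algorithm and it is not a particle swarm implementation. The function name `iterative_neighbor_update`, the `inertia` weight, the cosine-similarity neighborhood, and the clamping of labeled documents are all assumptions made for illustration.

```python
import numpy as np

def iterative_neighbor_update(X, y, k=3, inertia=0.5, n_iter=10):
    """Hypothetical sketch, not the paper's method.

    X: (n, d) array of document vectors (e.g. term weights).
    y: (n,) integer labels, with -1 marking unlabeled documents.
    Returns a predicted class index for every document.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = X.shape[0]
    n_classes = int(y.max()) + 1
    labeled = y >= 0

    # Cosine similarity between all pairs of documents.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.where(norms == 0, 1.0, norms)
    sim = Xn @ Xn.T
    np.fill_diagonal(sim, -np.inf)  # a document is not its own neighbor
    neighbors = np.argsort(-sim, axis=1)[:, :k]

    # Class scores: one-hot for labeled documents, uniform otherwise.
    scores = np.full((n, n_classes), 1.0 / n_classes)
    scores[labeled] = np.eye(n_classes)[y[labeled]]

    for _ in range(n_iter):
        # Average of the neighbors' current class scores.
        neighbor_mean = scores[neighbors].mean(axis=1)
        # Assumed rule: blend the previous prediction with neighbor information.
        updated = inertia * scores + (1.0 - inertia) * neighbor_mean
        # Assumed design choice: known labels are never overwritten.
        updated[labeled] = scores[labeled]
        scores = updated

    return scores.argmax(axis=1)
```

In this sketch, `inertia` controls how strongly a document keeps its previous prediction, which limits the damage a single wrongly predicted neighbor can do in one iteration.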
