Web Text Feature Extraction with Particle Swarm Optimization

Summary: The Internet continues to grow at a phenomenal rate, and the amount of information on the web is overwhelming. The web is a vast information resource, but because it is widely distributed, open and highly dynamic, its resources are scattered and lack unified management and structure. This greatly reduces the efficiency with which web information can be used. Web text feature extraction is considered a key problem in text mining. We use the Vector Space Model (VSM) to represent web text and present a novel feature extraction algorithm based on an improved particle swarm optimization with reverse thinking particles (PSORTP). The algorithm can greatly improve the efficiency of web text processing.
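The summary does not give the PSORTP update rules, so the sketch below covers only the two generic ingredients it names. The first is a VSM representation of documents, using TF-IDF weights as an assumption. The second is a plain global-best binary PSO with an inertia weight, which stands in for PSORTP (the reverse-thinking particles are not modelled) and searches for a subset of terms to keep. The objective `make_fitness` (retained TF-IDF weight minus a per-term penalty), the toy documents and all parameter values are hypothetical illustrations, not the paper's.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Represent each tokenized document as a TF-IDF weighted vector (Vector Space Model)."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    vectors = []
    for d in docs:
        tf = Counter(d)
        vectors.append([tf[t] * math.log(n / df[t]) for t in vocab])
    return vocab, vectors


def make_fitness(vectors, penalty):
    """Hypothetical objective (not the paper's): TF-IDF weight kept minus a cost per selected term."""
    totals = [sum(v[j] for v in vectors) for j in range(len(vectors[0]))]

    def fitness(mask):
        return sum(t for t, m in zip(totals, mask) if m) - penalty * sum(mask)

    return fitness


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def binary_pso(fitness, dim, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5, vmax=4.0, seed=0):
    """Standard binary PSO; PSORTP's reverse-thinking particles are NOT modelled here."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update pulled toward the personal and global bests.
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-vmax, min(vmax, v))
                # Each bit is set to 1 with probability sigmoid(velocity).
                pos[i][d] = 1 if rng.random() < sigmoid(vel[i][d]) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Toy usage with made-up, already tokenized documents.
docs = [["particle", "swarm", "web"], ["web", "text", "mining"], ["swarm", "text", "feature"]]
vocab, vectors = tfidf_vectors(docs)
fitness = make_fitness(vectors, penalty=1.0)
best_mask, best_fit = binary_pso(fitness, len(vocab))
selected = [t for t, m in zip(vocab, best_mask) if m]
print(selected, round(best_fit, 4))
```

Each bit of a particle's position marks whether a vocabulary term is kept as a feature, so the search runs directly over feature subsets of the VSM vocabulary. A real objective would more likely score the subset by downstream clustering or classification quality.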
