Link based BPSO for feature selection in big data text clustering

Abstract: Feature selection is a significant task in data mining and machine learning applications; it eliminates irrelevant and redundant features and improves learning performance. This paper proposes a new feature selection method for unsupervised text clustering named link based particle swarm optimization (LBPSO). The method introduces a new neighbour selection strategy into binary particle swarm optimization (BPSO) to select prominent features. The performance of traditional particle swarm optimization (PSO) for feature selection is limited by its global best updating mechanism. Instead of using the global best, LBPSO updates each particle based on the neighbour best position to enhance exploitation and exploration capability. The selected features are then tested using the k-means clustering algorithm to improve the performance and reduce the computational time of the proposed algorithm. The performance of LBPSO is investigated on three published datasets, namely Reuters-21578, TDT2 and Tr11. Our results, based on evaluation measures, show that LBPSO is superior to other PSO based algorithms.
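The neighbour-best update mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the link based neighbourhood is built, so the code below assumes a simple ring neighbourhood as a stand-in, and uses the standard sigmoid bit-sampling rule of binary PSO. The function names and the parameters `w`, `c1`, `c2`, `vmax` and `k` are illustrative assumptions.

```python
import numpy as np


def sigmoid(v):
    """Map a velocity to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-v))


def neighbour_best(pbest, pbest_fit, i, k=1):
    """Best personal-best position among particle i and its ring neighbours.

    ASSUMPTION: a ring of radius k stands in for the paper's link based
    neighbour selection, which is not described in the abstract.
    Higher fitness is treated as better.
    """
    n = len(pbest)
    idx = [(i + d) % n for d in range(-k, k + 1)]
    best = max(idx, key=lambda j: pbest_fit[j])
    return pbest[best]


def bpso_step(x, v, pbest, pbest_fit, rng, w=0.7, c1=1.5, c2=1.5, vmax=4.0):
    """One binary PSO iteration using the neighbour best instead of the global best.

    x         : (n_particles, n_features) array of 0/1 feature masks
    v         : (n_particles, n_features) float velocities
    pbest     : personal-best masks, same shape as x
    pbest_fit : fitness of each personal best
    rng       : numpy.random.Generator
    """
    n, d = x.shape
    new_x = np.empty_like(x)
    new_v = np.empty_like(v)
    for i in range(n):
        nbest = neighbour_best(pbest, pbest_fit, i)
        r1, r2 = rng.random(d), rng.random(d)
        vi = w * v[i] + c1 * r1 * (pbest[i] - x[i]) + c2 * r2 * (nbest - x[i])
        vi = np.clip(vi, -vmax, vmax)  # keep the sigmoid away from saturation
        new_v[i] = vi
        # A feature is selected (bit = 1) with probability sigmoid(velocity).
        new_x[i] = (rng.random(d) < sigmoid(vi)).astype(x.dtype)
    return new_x, new_v
```

Each row of `x` is a candidate feature subset; in a setting like the one described above, the fitness of a row would come from a clustering quality measure computed on the selected features, which is left out of this sketch.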
