A new discrete particle swarm algorithm applied to attribute selection in a bioinformatics data set

Many data mining applications involve the task of building a model for predictive classification. The goal of such a model is to classify examples (records or data instances) into classes or categories of the same type. Using variables (attributes) that are unrelated to the classes can reduce the accuracy and reliability of a classification or prediction model. Superfluous variables can also increase the cost of building a model, particularly on large data sets. We propose a discrete Particle Swarm Optimization (PSO) algorithm designed for attribute selection. The proposed algorithm deals with discrete variables, and its population of candidate solutions contains particles of different sizes. The performance of this algorithm is compared with that of a standard binary PSO algorithm on the task of selecting attributes in a bioinformatics data set. The criteria used for comparison are: (1) maximizing predictive accuracy; and (2) finding the smallest subset of attributes.
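For orientation, the standard binary PSO used as the baseline can be sketched roughly as follows. This is a minimal illustration of the usual sigmoid-based bit update applied to attribute selection, not the discrete algorithm proposed here, whose details the abstract does not give. The function names, the parameter defaults (including the inertia weight `w`, a common later addition to PSO), and the toy fitness function are all assumptions made for the example; in practice the fitness would come from a classifier evaluated on the selected attributes. The two comparison criteria are encoded as a tuple so that accuracy is compared first and subset size second.

```python
import math
import random


def sigmoid(v):
    """Map a velocity to the probability that the corresponding bit is 1."""
    return 1.0 / (1.0 + math.exp(-v))


def toy_fitness(mask, relevant):
    """Hypothetical stand-in for classifier accuracy (an assumption).

    Returns (accuracy_proxy, -subset_size) so that tuple comparison
    prefers higher accuracy first and fewer selected attributes second.
    """
    hits = sum(1 for i in relevant if mask[i])
    return (hits / len(relevant), -sum(mask))


def binary_pso(n_attrs, fitness, n_particles=20, n_iters=50,
               w=0.8, c1=2.0, c2=2.0, v_max=4.0, seed=0):
    """Sketch of a binary PSO; each particle is a 0/1 attribute mask."""
    rng = random.Random(seed)
    xs = [[rng.randint(0, 1) for _ in range(n_attrs)]
          for _ in range(n_particles)]
    vs = [[0.0] * n_attrs for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_fit = [fitness(x) for x in xs]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for p in range(n_particles):
            for d in range(n_attrs):
                r1, r2 = rng.random(), rng.random()
                v = (w * vs[p][d]
                     + c1 * r1 * (pbest[p][d] - xs[p][d])
                     + c2 * r2 * (gbest[d] - xs[p][d]))
                # Clamp the velocity so the bit probability never saturates.
                vs[p][d] = max(-v_max, min(v_max, v))
                # Sample the new bit from the sigmoid of the velocity.
                xs[p][d] = 1 if rng.random() < sigmoid(vs[p][d]) else 0
            fit = fitness(xs[p])
            if fit > pbest_fit[p]:
                pbest[p], pbest_fit[p] = xs[p][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = xs[p][:], fit
    return gbest, gbest_fit


# Usage: 10 attributes, of which indices 1, 4 and 7 are (hypothetically) relevant.
relevant = {1, 4, 7}
best_mask, best_fit = binary_pso(10, lambda m: toy_fitness(m, relevant))
print(best_mask, best_fit)
```

Note that every particle in this baseline has the same fixed length (one bit per attribute), in contrast to the variable-size particles of the proposed algorithm.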
