Survey on data science with population-based algorithms

This paper discusses the relationship between data science and population-based algorithms, which include swarm intelligence and evolutionary algorithms. We review two categories of literature: population-based algorithms applied to data analysis problems, and data analysis methods used within population-based algorithms. With the exponential growth of data, data science, and more specifically big data analytics, has gained increasing attention among researchers. New and more efficient algorithms are needed to handle such massive data. Combining population-based algorithms with data mining techniques can provide better insight into data analytics and support the design of more efficient algorithms for real-world big data analytics problems. In addition, the strengths and weaknesses of population-based algorithms could be analyzed by applying data analytics to the optimization process, a crucial component of population-based algorithms.
