Feature Reduction Based on Hybrid Efficient Weighted Gene Genetic Algorithms with Artificial Neural Network for Machine Learning Problems in the Big Data

A large amount of data is being generated from different sources, and analyzing and extracting useful information from these data has become a very complex task. The difficulty of dealing with big data arises from many factors, such as the high number of features, the presence of missing data, and the variety of data. One of the most effective ways to cope with the sheer volume of big data is feature reduction. In this paper, a set of hybrid and efficient algorithms is proposed to classify datasets with a large number of features by combining genetic algorithms with artificial neural networks. The genetic algorithms are used as a preliminary step to significantly reduce the number of features in the analyzed data before that data is handled by machine learning techniques. Reducing the number of features simplifies the classification task and improves the performance of the machine learning algorithms used to extract valuable information from big data. The proposed algorithms use a new gene-weight mechanism that can significantly improve performance and decrease the required search time. The proposed algorithms are applied to different datasets to pick the most relevant and important features before applying the artificial neural network algorithm, and the results show that the proposed algorithms can effectively improve classification performance on the tested datasets.
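The general idea of using a genetic algorithm as a feature-reduction step ahead of a classifier can be illustrated with a minimal sketch. The abstract does not specify the gene-weight mechanism, so everything below is an assumption made for illustration: the per-gene `weights` vector (here, a running estimate of how often each feature appears among the fittest individuals), the function names `fitness` and `ga_select`, and all parameter values are hypothetical. To keep the example self-contained, a nearest-centroid classifier stands in for the artificial neural network when scoring a feature subset; it is not the authors' method.

```python
import numpy as np

def fitness(mask, X, y):
    """Score a feature subset: nearest-centroid training accuracy
    minus a small penalty on the fraction of features kept.
    (Stand-in for an ANN-based score; an assumption of this sketch.)"""
    if not mask.any():
        return 0.0
    Xs = X[:, mask]
    classes = np.unique(y)
    centroids = np.stack([Xs[y == c].mean(axis=0) for c in classes])
    dists = ((Xs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[dists.argmin(axis=1)]
    return (pred == y).mean() - 0.01 * mask.mean()

def ga_select(X, y, pop_size=20, generations=30, seed=0):
    """Genetic search over boolean feature masks with hypothetical gene weights."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    # Hypothetical gene weights: probability that a feature is switched on
    # whenever a gene is (re)sampled.
    weights = np.full(n, 0.5)
    pop = rng.random((pop_size, n)) < weights
    best_mask, best_fit = None, -np.inf
    for _ in range(generations):
        fits = np.array([fitness(ind, X, y) for ind in pop])
        order = np.argsort(fits)[::-1]
        if fits[order[0]] > best_fit:
            best_fit, best_mask = fits[order[0]], pop[order[0]].copy()
        elite = pop[order[: pop_size // 2]]
        # Move each gene weight toward that feature's frequency in the elite.
        weights = 0.8 * weights + 0.2 * elite.mean(axis=0)
        # Uniform crossover between randomly paired elite parents.
        pa = elite[rng.integers(len(elite), size=pop_size)]
        pb = elite[rng.integers(len(elite), size=pop_size)]
        children = np.where(rng.random((pop_size, n)) < 0.5, pa, pb)
        # Weighted mutation: a few genes are resampled from the gene weights
        # instead of being flipped uniformly at random.
        mutate = rng.random((pop_size, n)) < 0.05
        resampled = rng.random((pop_size, n)) < weights
        pop = np.where(mutate, resampled, children)
        pop[0] = best_mask  # elitism: keep the best mask found so far
    return best_mask, best_fit

# Synthetic data: 20 features, of which only the first two carry class signal.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 20))
X[:, 0] += 4 * y
X[:, 1] -= 4 * y

mask, score = ga_select(X, y)
print("features kept:", int(mask.sum()), "fitness:", round(float(score), 3))
```

In a pipeline of the kind the abstract describes, the reduced matrix `X[:, mask]` would then be passed to the neural network classifier. Biasing mutation with per-gene weights is one plausible way a weighting scheme could shorten the search, since features that rarely appear in fit individuals are resampled as "off" more often.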
