Can Genetic Programming Do Manifold Learning Too?

Exploratory data analysis is a fundamental aspect of knowledge discovery that aims to find the main characteristics of a dataset. Dimensionality reduction techniques, such as manifold learning, are often used to reduce the number of features in a dataset to a level that humans can interpret. However, most manifold learning techniques explain neither how the original features contribute to the embedding nor the true characteristics of a dataset. In this paper, we propose a genetic programming approach to manifold learning called GP-MaL, which evolves functional mappings from a high-dimensional space to a lower-dimensional space through the use of interpretable trees. We show that GP-MaL is competitive with existing manifold learning algorithms, while producing models that can be interpreted and re-used on unseen data. We also identify a number of promising directions for future research.
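To make the idea of a tree-based functional mapping concrete, the following is a minimal sketch and not the GP-MaL implementation. It assumes that each output dimension is produced by one expression tree over the original features. The function set, the tuple encoding of trees, and the names `FUNCTIONS`, `evaluate`, and `embed` are all hypothetical, and the trees are written by hand rather than evolved.

```python
import operator

# Hypothetical function set; the abstract does not list GP-MaL's operators.
FUNCTIONS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "max": max,
}


def evaluate(tree, row):
    """Evaluate one expression tree on a single instance.

    A tree is either an int (index of an original feature) or a tuple
    (op_name, left_subtree, right_subtree).
    """
    if isinstance(tree, int):
        return row[tree]
    op, left, right = tree
    return FUNCTIONS[op](evaluate(left, row), evaluate(right, row))


def embed(trees, rows):
    """Map each instance to len(trees) dimensions, one tree per output."""
    return [[evaluate(tree, row) for tree in trees] for row in rows]


# Two hand-written trees mapping 4 features to 2 dimensions:
#   z0 = x0 + x1 * x2
#   z1 = max(x3, x0 - x1)
trees = [
    ("add", 0, ("mul", 1, 2)),
    ("max", 3, ("sub", 0, 1)),
]

X = [
    [1.0, 2.0, 3.0, 4.0],
    [0.5, -1.0, 2.0, 0.0],
]
Z = embed(trees, X)
```

Because each tree is an explicit formula over named features, it can be read directly, and calling `embed` with the same trees on new instances reuses the mapping without refitting. In an evolutionary setting, a search procedure would generate and score such trees; that part is omitted here.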
