Benchmarking Manifold Learning Methods on a Large Collection of Datasets

Manifold learning, a non-linear approach to dimensionality reduction, assumes that the dimensionality of many datasets is artificially high and that a reduced number of dimensions is sufficient to retain the information in the data. In this paper, a large-scale comparison of manifold learning techniques is performed for the task of classification. We show the current standing of genetic programming (GP) for this task by comparing the classification results of two GP-based manifold learning methods: GP-Mal and ManiGP, an experimental manifold learning technique proposed in this paper. We show that GP-based methods can learn a manifold more effectively across a set of 155 different problems and deliver more separable embeddings than many established methods.
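The abstract describes the evaluation only at a high level: reduce each dataset with a manifold learner, then judge how separable the resulting embedding is for classification. The following is a minimal sketch of that idea under our own assumptions, not the paper's protocol and not GP-Mal or ManiGP. It implements a basic Isomap in NumPy (k-nearest-neighbour graph, Floyd-Warshall shortest paths as geodesic distances, classical MDS), then scores the embedding with leave-one-out 1-nearest-neighbour balanced accuracy. The helper names (`pairwise_distances`, `isomap`, `loo_1nn_predictions`, `balanced_accuracy`), the choice of Isomap, the 1-NN scorer, and the half-circle toy data are all hypothetical illustrations.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix via the ||a||^2 + ||b||^2 - 2ab identity."""
    sq = np.sum(X ** 2, axis=1)
    D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))
    np.fill_diagonal(D, 0.0)
    return D


def isomap(X, n_neighbors=5, n_components=2):
    """Minimal Isomap: kNN graph -> geodesic distances -> classical MDS."""
    n = X.shape[0]
    D = pairwise_distances(X)
    # Keep only edges to each point's k nearest neighbours (column 0 is the point itself).
    idx = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    G = np.full((n, n), np.inf)
    G[rows, idx.ravel()] = D[rows, idx.ravel()]
    G = np.minimum(G, G.T)  # make the neighbourhood graph undirected
    np.fill_diagonal(G, 0.0)
    # Floyd-Warshall: shortest graph paths approximate geodesic distances.
    for k in range(n):
        G = np.minimum(G, G[:, k:k + 1] + G[k:k + 1, :])
    if np.isinf(G).any():
        raise ValueError("kNN graph is disconnected; increase n_neighbors")
    # Classical MDS: double-centre the squared distances, keep the top eigenvectors.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:n_components]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))


def loo_1nn_predictions(Z, y):
    """Leave-one-out 1-nearest-neighbour predictions in the embedding Z."""
    D = pairwise_distances(Z)
    np.fill_diagonal(D, np.inf)  # a point may not vote for itself
    return np.asarray(y)[np.argmin(D, axis=1)]


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, so every class counts equally regardless of size."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))


if __name__ == "__main__":
    # Toy curved 1-D manifold: points on a half circle, labelled by angle.
    theta = np.linspace(0.0, np.pi, 30)
    X = np.column_stack([np.cos(theta), np.sin(theta)])
    y = (theta >= np.pi / 2).astype(int)
    Z = isomap(X, n_neighbors=2, n_components=1)
    print("balanced accuracy:", balanced_accuracy(y, loo_1nn_predictions(Z, y)))
```

Because the embedding here is fit on all points before scoring, this is a transductive check of separability; a held-out evaluation would need an out-of-sample mapping for unseen points. Balanced accuracy is used so that imbalanced classes do not inflate the score.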
