Multiobjective evolutionary optimization for visual data mining with virtual reality spaces: application to Alzheimer gene expressions

This paper introduces a multi-objective optimization approach to the problem of computing virtual reality spaces for the visual representation of relational structures (e.g. databases), symbolic knowledge and other structures, in the context of visual data mining and knowledge discovery. Procedures based on evolutionary computation are discussed. In particular, the NSGA-II algorithm is used as the framework for one instance of this methodology, simultaneously minimizing Sammon's error for dissimilarity measures and the mean cross-validation error of a k-nn pattern classifier. The proposed approach is illustrated with an example from genomics (Alzheimer's disease gene expressions) by constructing virtual reality spaces through multi-objective optimization. Selected solutions along the approximated Pareto front are used as nonlinearly transformed features for new spaces that simultaneously balance similarity structure preservation (an unsupervised perspective) and class separability (a supervised pattern recognition perspective). Being able to span a range of solutions between these two goals benefits the knowledge discovery and data understanding process. From the point of view of visual data mining, the quality of the set of discovered solutions is superior to that of the solutions obtained by optimizing each objective separately.
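The two objectives named above can be made concrete with a short sketch. It assumes Euclidean distances in both the original and the projected space (the paper speaks more generally of dissimilarity measures) and uses leave-one-out as the cross-validation scheme for the k-nn classifier. It does not implement NSGA-II itself: crowding distance, selection and the variation operators are omitted, and only the Pareto-dominance test on which NSGA-II's non-dominated sorting rests is shown. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance (assumed dissimilarity in both spaces)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sammon_error(original: List[Point], projected: List[Point]) -> float:
    """Sammon's error: sum over pairs of (d*_ij - d_ij)^2 / d*_ij, normalised by
    the sum of d*_ij, where d* is the original-space distance and d the distance
    in the projected (e.g. 3-D virtual reality) space."""
    numerator = 0.0
    denominator = 0.0
    n = len(original)
    for i in range(n):
        for j in range(i + 1, n):
            d_star = euclidean(original[i], original[j])
            if d_star == 0.0:
                continue  # coincident objects are skipped to avoid division by zero
            d = euclidean(projected[i], projected[j])
            numerator += (d_star - d) ** 2 / d_star
            denominator += d_star
    return numerator / denominator if denominator > 0.0 else 0.0


def knn_loo_error(points: List[Point], labels: List[str], k: int = 3) -> float:
    """Leave-one-out error rate of a majority-vote k-nn classifier."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two samples")
    errors = 0
    for i in range(n):
        # All other samples ordered by distance to sample i (stable sort).
        neighbours = sorted(
            ((euclidean(points[i], points[j]), labels[j]) for j in range(n) if j != i),
            key=lambda pair: pair[0],
        )
        votes = Counter(label for _, label in neighbours[:k])
        # On a tie, the label that appears first (i.e. nearest) wins.
        predicted = votes.most_common(1)[0][0]
        if predicted != labels[i]:
            errors += 1
    return errors / n


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one (minimisation)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(objectives: List[Tuple[float, ...]]) -> List[int]:
    """Indices of the solutions in the first non-dominated (Pareto) front."""
    return [
        i for i, a in enumerate(objectives)
        if not any(dominates(b, a) for j, b in enumerate(objectives) if j != i)
    ]
```

Evaluating a candidate configuration with both functions yields one point (Sammon error, k-nn error) in objective space, and `non_dominated` keeps the trade-off solutions among such points. As a sanity check, a configuration that reproduces the original distances exactly has a Sammon error of 0, while collapsing all objects onto a single location gives an error of 1.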
