Hybrid Unsupervised/Supervised Virtual Reality Spaces for Visualizing Cancer Databases: An Evolutionary Computation Approach

This paper introduces a multi-objective optimization approach to the problem of computing virtual reality spaces for the visual representation of relational structures (e.g., databases), symbolic knowledge, and other information, in the context of visual data mining and knowledge discovery. Procedures based on evolutionary computation are discussed. In particular, the NSGA-II algorithm is used as a framework for an instance of this methodology that simultaneously minimizes Sammon's error for dissimilarity measures and the mean cross-validation error of a k-NN pattern classifier. The proposed approach is illustrated with an example from cancer genomics data (e.g., lung cancer) by constructing virtual reality spaces resulting from multi-objective optimization. Selected solutions along the Pareto front approximation are used as nonlinearly transformed features for new spaces that trade off similarity structure preservation (an unsupervised perspective) against class separability (a supervised pattern recognition perspective). The ability to span a range of solutions between these two goals benefits the knowledge discovery and data understanding process. From the point of view of visual data mining, the quality of the set of discovered solutions is superior to that of the solutions obtained separately for each objective.
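The two objectives named in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the Euclidean distance, the values of k and the number of folds, and the integer class labels are all hypothetical choices, and the sketch only evaluates candidate low-dimensional embeddings and filters the non-dominated ones. It does not implement NSGA-II itself.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X (assumed metric)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def sammon_error(D_orig, Y):
    """Sammon's error between original dissimilarities and embedding Y.

    Assumes all off-diagonal entries of D_orig are strictly positive.
    """
    d_new = pairwise_distances(Y)
    iu = np.triu_indices(len(Y), k=1)          # each pair counted once
    delta, d = D_orig[iu], d_new[iu]
    return float(((delta - d) ** 2 / delta).sum() / delta.sum())

def knn_cv_error(Y, labels, k=3, folds=5, seed=0):
    """Mean cross-validated k-NN error in the embedding Y.

    labels must be non-negative integers; k and folds are illustrative.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(Y))
    errors = []
    for test in np.array_split(idx, folds):
        train = np.setdiff1d(idx, test)
        d = np.sqrt(((Y[test][:, None, :] - Y[train][None, :, :]) ** 2).sum(-1))
        nearest = train[np.argsort(d, axis=1)[:, :k]]   # k nearest training points
        pred = np.array([np.bincount(v).argmax() for v in labels[nearest]])
        errors.append(np.mean(pred != labels[test]))
    return float(np.mean(errors))

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (both minimized)."""
    a, b = np.asarray(a), np.asarray(b)
    return bool(np.all(a <= b) and np.any(a < b))

def pareto_front(objs):
    """Indices of the non-dominated objective vectors in objs."""
    return [i for i, a in enumerate(objs)
            if not any(dominates(b, a) for j, b in enumerate(objs) if j != i)]
```

In such a setup, each candidate embedding would be scored with the pair `(sammon_error(D, Y), knn_cv_error(Y, labels))`, and the multi-objective search would keep the candidates whose pairs are non-dominated.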
