Elastic Maps and Nets for Approximating Principal Manifolds and Their Application to Microarray Data Visualization

Principal manifolds are defined as lines or surfaces passing through "the middle" of the data distribution. Linear principal manifolds (Principal Component Analysis) are routinely used for dimension reduction, noise filtering and data visualization. Recently, methods for constructing non-linear principal manifolds have been proposed, including our elastic maps approach, which is based on a physical analogy with elastic membranes. We have developed a general geometric framework for constructing "principal objects" of various dimensions and topologies with the simplest quadratic form of the smoothness penalty, which allows very effective parallel implementations. Our approach is implemented in three programming languages (C++, Java and Delphi) with two graphical user interfaces (the VidaExpert and ViMiDa applications). In this paper we give an overview of the method of elastic maps and present in detail one of its major applications: the visualization of microarray data in bioinformatics. We show that the method of elastic maps outperforms linear PCA in terms of data approximation, representation of between-point distance structure, preservation of local point neighborhoods and representation of point classes in low-dimensional spaces.
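
To make the idea of a quadratic smoothness penalty concrete, the following is a minimal Python sketch of a one-dimensional elastic curve (a chain of nodes) fitted to data. It is an illustration under stated assumptions, not the C++, Java or Delphi implementation mentioned above. The abstract does not spell out the energy, so the three-term form used here (mean squared distance to the nearest node, plus a stretching term weighted by `lam`, plus a bending term weighted by `mu`) and the alternating assign-then-solve scheme are assumptions, and all function and parameter names are hypothetical.

```python
import numpy as np


def penalty_matrix(n_nodes, lam, mu):
    """Matrix P with trace(Y.T @ P @ Y) equal to
    lam * sum ||y_i - y_{i+1}||^2 + mu * sum ||y_{i-1} - 2 y_i + y_{i+1}||^2.
    (Assumed form of the quadratic smoothness penalty; lam must be > 0.)"""
    idx = np.arange(n_nodes)
    E = np.zeros((n_nodes - 1, n_nodes))   # first differences (edges)
    E[idx[:-1], idx[:-1]] = -1.0
    E[idx[:-1], idx[:-1] + 1] = 1.0
    R = np.zeros((n_nodes - 2, n_nodes))   # second differences (ribs)
    R[idx[:-2], idx[:-2]] = 1.0
    R[idx[:-2], idx[:-2] + 1] = -2.0
    R[idx[:-2], idx[:-2] + 2] = 1.0
    return lam * (E.T @ E) + mu * (R.T @ R)


def nearest_node(X, Y):
    """Index of, and squared distance to, the closest node for each point."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), d2.min(axis=1)


def elastic_energy(X, Y, P):
    """Approximation error plus quadratic smoothness penalty."""
    return nearest_node(X, Y)[1].mean() + np.trace(Y.T @ P @ Y)


def init_nodes_pca(X, n_nodes):
    """Place nodes evenly along the first principal component."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    t = np.linspace(-1.0, 1.0, n_nodes)[:, None]
    return mean + t * (s[0] / np.sqrt(len(X))) * vt[0]


def fit_elastic_curve(X, n_nodes=15, lam=0.01, mu=0.1, n_iter=30):
    """Alternate between assigning points to nodes and solving the
    linear system that minimizes the quadratic energy for that assignment."""
    X = np.asarray(X, dtype=float)
    n, dim = X.shape
    Y = init_nodes_pca(X, n_nodes)
    P = penalty_matrix(n_nodes, lam, mu)
    for _ in range(n_iter):
        k = nearest_node(X, Y)[0]
        counts = np.bincount(k, minlength=n_nodes)
        sums = np.zeros((n_nodes, dim))
        np.add.at(sums, k, X)              # per-node sum of assigned points
        # Stationarity condition: (diag(counts)/n + P) Y = sums / n
        Y = np.linalg.solve(np.diag(counts / n) + P, sums / n)
    return Y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = rng.uniform(0.0, np.pi, 300)
    X = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.normal(size=(300, 2))
    P = penalty_matrix(15, 0.01, 0.1)
    print("energy at PCA initialization:", elastic_energy(X, init_nodes_pca(X, 15), P))
    print("energy after fitting:        ", elastic_energy(X, fit_elastic_curve(X), P))
```

Because the energy is quadratic in the node positions once the point-to-node assignment is fixed, each update reduces to a single linear solve. In this sketch that is the reason for choosing a quadratic penalty; the solve could in principle also be split across coordinates, since each data dimension shares the same system matrix.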
