Beyond The Concept of Manifolds: Principal Trees, Metro Maps, and Elastic Cubic Complexes

Multidimensional data distributions can have complex topologies and variable local dimensions. To approximate complex data, we propose a new type of low-dimensional "principal object": a principal cubic complex. This complex generalizes linear and non-linear principal manifolds and includes them as particular cases. To construct such an object, we combine the method of topological grammars with the minimization of an elastic energy defined for its embedding into the multidimensional data space. The whole complex is represented both as a system of nodes and springs and as a product of one-dimensional continua (represented by graphs); the grammars describe how these continua transform while the optimal complex is constructed. The simplest topological grammar ("add a node", "bisect an edge") is equivalent to the construction of "principal trees", objects useful in many practical applications. We demonstrate how principal trees can be applied to the analysis of bacterial genomes and to the visualization of cDNA microarray data using the "metro map" representation. The preprint is supplemented by an animation: "How the topological grammar constructs branching principal components (AnimatedBranchingPCA.gif)".
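The two grammar operations named above can be illustrated with a minimal sketch. It is not the authors' implementation: the energy form (mean squared distance to the nearest node plus a stretching penalty weighted by a hypothetical parameter `lam`), the placement rule for a new node, and all function names are assumptions made for illustration. The sketch also omits the bending term and the re-optimization of node positions that a full elastic-energy minimization would perform; it only uses the energy to choose which grammar operation to apply next.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(nodes, x):
    """Index of the node closest to data point x."""
    return min(range(len(nodes)), key=lambda i: sq_dist(nodes[i], x))

def energy(nodes, edges, data, lam=0.01):
    """Assumed energy: approximation error plus edge-stretching penalty."""
    approx = sum(sq_dist(x, nodes[nearest(nodes, x)]) for x in data) / len(data)
    stretch = lam * sum(sq_dist(nodes[i], nodes[j]) for i, j in edges)
    return approx + stretch

def bisect_edge(nodes, edges, k):
    """'Bisect an edge': replace edge k by two edges through its midpoint."""
    i, j = edges[k]
    mid = tuple((a + b) / 2 for a, b in zip(nodes[i], nodes[j]))
    m = len(nodes)  # index of the new node
    return nodes + [mid], edges[:k] + edges[k + 1:] + [(i, m), (m, j)]

def add_node(nodes, edges, i, data):
    """'Add a node': attach a new leaf to node i.

    Assumed placement rule: the data point in node i's cell that lies
    farthest from node i. Returns None if no data point maps to node i.
    """
    cell = [x for x in data if nearest(nodes, x) == i]
    if not cell:
        return None
    far = max(cell, key=lambda x: sq_dist(x, nodes[i]))
    return nodes + [tuple(far)], edges + [(i, len(nodes))]

def grow(nodes, edges, data, steps, lam=0.01):
    """Greedily apply the grammar operation that gives the lowest energy."""
    for _ in range(steps):
        candidates = [bisect_edge(nodes, edges, k) for k in range(len(edges))]
        for i in range(len(nodes)):
            grown = add_node(nodes, edges, i, data)
            if grown is not None:
                candidates.append(grown)
        nodes, edges = min(candidates,
                           key=lambda c: energy(c[0], c[1], data, lam))
    return nodes, edges

# Toy Y-shaped data set: one stem and two branches in the plane.
data = ([(0.1 * t, 0.0) for t in range(11)]
        + [(1.0 + 0.1 * t, 0.1 * t) for t in range(1, 11)]
        + [(1.0 + 0.1 * t, -0.1 * t) for t in range(1, 11)])

nodes, edges = [(0.0, 0.0), (1.0, 0.0)], [(0, 1)]  # start from a single edge
e0 = energy(nodes, edges, data)
nodes, edges = grow(nodes, edges, data, steps=6)
e1 = energy(nodes, edges, data)
```

Both operations add exactly one node and a net of one edge, so a graph that starts as a tree remains a tree after every step. The greedy choice among all candidate applications is a simplification chosen for brevity.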
