High-dimensional labeled data analysis with topology representing graphs

We propose the use of topology representing graphs for the exploratory analysis of high-dimensional labeled data. The Delaunay graph contains all the information needed to analyze the topology of the classes (e.g., the number of separate clusters of a given class, the way these clusters are in contact with each other, or the shape of these clusters). The Delaunay graph also makes it possible to sample the decision boundary of the Nearest Neighbor rule, to define a topological criterion of non-linear separability of the classes, and to find data points that lie near the decision boundary, whose labels must therefore be considered carefully. This graph thus provides a way to analyze the complexity of a classification problem, as well as tools for decision support. When the Delaunay graph is intractable because the dimension of the data space is too high, we propose to use the Gabriel graph instead and discuss the limits of this approach. This analysis technique is complementary to projection techniques, as it handles the data as they are in the data space, avoiding projection distortions. We apply it to the well-known Iris database and to a seismic events database.
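As an illustration of the Gabriel graph fallback mentioned above, the following is a minimal sketch and not the authors' implementation. It assumes the standard definition of the Gabriel graph (two points are linked when the ball having them as a diameter contains no other point) and uses a brute-force O(n^3) test. The function names `gabriel_edges`, `mixed_class_edges`, and `boundary_midpoints` are hypothetical. Edges joining points of different classes flag data near the Nearest Neighbor decision boundary, and the midpoint of such an edge lies on that boundary.

```python
import numpy as np

def gabriel_edges(X):
    """Return Gabriel graph edges (i, j), i < j, by brute force.

    Assumed definition: i and j are linked when no third point k
    satisfies d(i,k)^2 + d(j,k)^2 < d(i,j)^2, i.e. the ball with
    diameter [i, j] contains no other point in its interior.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise squared Euclidean distances, shape (n, n).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            s = d2[i] + d2[j]      # new array, d2 is left untouched
            s[[i, j]] = np.inf     # ignore the two endpoints
            if np.all(s >= d2[i, j]):
                edges.append((i, j))
    return edges

def mixed_class_edges(edges, labels):
    """Keep only the edges whose endpoints carry different labels."""
    return [(i, j) for i, j in edges if labels[i] != labels[j]]

def boundary_midpoints(X, edges):
    """Midpoints of the given edges, one row per edge."""
    X = np.asarray(X, dtype=float)
    return np.array([(X[i] + X[j]) / 2.0 for i, j in edges])

if __name__ == "__main__":
    # Toy data (assumed for illustration): two classes on a line.
    X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [4.0, 0.0]])
    y = [0, 0, 1, 1]
    edges = gabriel_edges(X)
    mixed = mixed_class_edges(edges, y)
    print("Gabriel edges:", edges)
    print("Edges crossing the class boundary:", mixed)
    print("Boundary samples:", boundary_midpoints(X, mixed))
```

The endpoints of the mixed-class edges are the points whose labels deserve a closer look. Because the Gabriel graph is a subgraph of the Delaunay graph, it may miss some contacts between classes that the Delaunay graph would reveal.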
