On the Schoenberg Transformations in Data Analysis: Theory and Illustrations

The class of Schoenberg transformations, which embed Euclidean distances into higher-dimensional Euclidean spaces, is presented and derived from theorems on positive definite and conditionally negative definite matrices. Original results on the arc lengths, angles and curvature of the transformations are established and visualized on artificial data sets by classical multidimensional scaling. A distance-based discriminant algorithm and a robust multidimensional centroid estimate illustrate the theory, which is closely connected to the Gaussian kernels of machine learning.
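As a rough illustration of the embedding property, the sketch below applies one transformation, phi(D) = 1 - exp(-lam * D), to squared Euclidean distances and inspects the eigenvalues produced by classical multidimensional scaling. This particular phi is chosen here only because it corresponds to the Gaussian kernel mentioned above; the function names, the parameter `lam`, and the random data are assumptions made for the example, not the paper's own code or notation.

```python
import numpy as np

def squared_distances(X):
    """Pairwise squared Euclidean distances between the rows of X."""
    sq = np.sum(X**2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.maximum(D, 0.0)  # clip tiny negative rounding errors

def gaussian_transformation(D, lam=1.0):
    """Illustrative transformation phi(D) = 1 - exp(-lam * D) (assumed example)."""
    return 1.0 - np.exp(-lam * D)

def classical_mds_eigenvalues(D):
    """Eigenvalues of the doubly centered matrix B = -1/2 H D H, largest first."""
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    B = -0.5 * H @ D @ H
    return np.linalg.eigvalsh(B)[::-1]

# Artificial data: 30 points in the plane (hypothetical example data).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))

D = squared_distances(X)
D_tilde = gaussian_transformation(D, lam=0.5)

eig_orig = classical_mds_eigenvalues(D)
eig_trans = classical_mds_eigenvalues(D_tilde)

# Non-negative eigenvalues mean the dissimilarities are squared Euclidean;
# the count of clearly positive ones is the dimension of the embedding.
print("smallest eigenvalue after transformation:", eig_trans.min())
print("dimensions before:", int(np.sum(eig_orig > 1e-8)))
print("dimensions after: ", int(np.sum(eig_trans > 1e-8)))
```

In this sketch the transformed dissimilarities remain squared Euclidean (no eigenvalue is negative beyond rounding error), while the configuration needs more dimensions than the original plane.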
