The Energy of Data

The energy of data is the value of a real function of distances between data in metric spaces. The name energy derives from Newton's gravitational potential energy, which is also a function of distances between physical objects. One of the advantages of working with energy functions (energy statistics) is that even if the data are complex objects, such as functions or graphs, we can use their real-valued distances for inference. Other advantages are illustrated and discussed in this review. Concrete examples include energy testing for normality, energy clustering, and distance correlation. Applications include genome studies, brain studies, and astrophysics. The direct connection between energy and mind/observations/data in this review is a counterpart of the equivalence of energy and matter/mass in Einstein's E = mc².
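To make "a real function of distances" concrete, the following is a minimal sketch of one common energy statistic, the two-sample energy distance 2E|X - Y| - E|X - X'| - E|Y - Y'|, estimated by plain sample averages of Euclidean distances. The function names `pairwise_distances` and `energy_distance` are illustrative choices, not taken from the review, and the Euclidean metric is an assumption; any metric of the appropriate type could be substituted.

```python
import numpy as np


def pairwise_distances(a, b):
    """Euclidean distances between every row of a and every row of b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Treat 1-D input as a sample of scalar observations.
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def energy_distance(x, y):
    """Sample energy distance between samples x and y (assumed Euclidean).

    Uses plain averages over all pairs, including i == j, so the
    result is non-negative and is zero when the two samples coincide.
    """
    between = pairwise_distances(x, y).mean()
    within_x = pairwise_distances(x, x).mean()
    within_y = pairwise_distances(y, y).mean()
    return 2.0 * between - within_x - within_y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(200, 3))
    y = rng.normal(loc=1.0, size=(200, 3))
    print(energy_distance(x, x))  # identical samples
    print(energy_distance(x, y))  # shifted sample
```

Only the pairwise distances enter the computation, so the same sketch would apply to functions or graphs once a suitable distance between them is supplied in place of `pairwise_distances`.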
