Kernel mean embedding based hypothesis tests for comparing spatial point patterns

This paper introduces an approach for detecting differences in the first-order structures of spatial point patterns. The approach uses the kernel mean embedding in a novel way, through an approximate version tailored to spatial point processes. While the original embedding is infinite-dimensional and implicit, our approximate embedding is finite-dimensional and comes with explicit closed-form formulas. With its help, we reduce the pattern comparison problem to a comparison of means in Euclidean space. Hypothesis testing is performed by conducting a t-test on each dimension of the embedding and combining the resulting p-values using one of the recently introduced p-value combination techniques. If desired, the corresponding Bayes factors can be computed and averaged over all tests to quantify the evidence against the null hypothesis. The main advantages of the proposed approach are that it applies to both single and replicated pattern comparisons, and that neither bootstrap nor permutation procedures are needed to obtain or calibrate the p-values. Our experiments show that the resulting tests are powerful and that the p-values are well calibrated; two applications to real-world data are presented.
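The pipeline described above (embed each pattern, run a t-test per embedding dimension, combine the p-values) can be sketched for the replicated-pattern case as follows. This is only an illustration under stated assumptions, not the paper's construction: random Fourier features stand in for the approximate embedding, features are averaged over the points of each pattern, Welch's t-test is used per dimension, and the Cauchy combination rule is picked as one possible p-value combination technique. The function names, the frequency scale, and the simulated patterns are all hypothetical.

```python
import numpy as np
from scipy import stats


def embed_pattern(points, freqs, phases):
    """Map one point pattern (n x 2 array) to a D-dimensional feature vector."""
    # Assumption: random Fourier features as the finite-dimensional embedding.
    feats = np.cos(points @ freqs.T + phases)  # shape (n, D)
    # Assumption: average over points, so patterns are compared up to total count.
    return feats.mean(axis=0)


def cauchy_combine(pvals):
    """Equal-weight Cauchy combination of p-values (an assumed choice of rule)."""
    p = np.clip(np.asarray(pvals, dtype=float), 1e-15, 1 - 1e-15)
    stat = np.mean(np.tan((0.5 - p) * np.pi))
    return 0.5 - np.arctan(stat) / np.pi


def compare_groups(group_a, group_b, freqs, phases):
    """Welch t-test on each embedding dimension, then one combined p-value."""
    emb_a = np.array([embed_pattern(p, freqs, phases) for p in group_a])
    emb_b = np.array([embed_pattern(p, freqs, phases) for p in group_b])
    _, pvals = stats.ttest_ind(emb_a, emb_b, axis=0, equal_var=False)
    return cauchy_combine(pvals), pvals


# Simulated data (hypothetical): 20 replicated patterns per group on the unit square.
rng = np.random.default_rng(0)
D = 10
freqs = rng.normal(scale=2.0, size=(D, 2))      # assumed frequency scale
phases = rng.uniform(0.0, 2.0 * np.pi, size=D)


def uniform_pattern(n=100):
    """Points spread uniformly over the unit square."""
    return rng.uniform(size=(n, 2))


def gradient_pattern(n=100):
    """Points whose density increases along the x-axis."""
    return np.column_stack([rng.beta(2.0, 1.0, size=n), rng.uniform(size=n)])


group_a = [uniform_pattern() for _ in range(20)]
group_b = [gradient_pattern() for _ in range(20)]
p_combined, p_per_dim = compare_groups(group_a, group_b, freqs, phases)
print(f"combined p-value: {p_combined:.3g}")
```

No resampling appears anywhere in the sketch: the per-dimension p-values come from the t distribution and the combined p-value from a closed-form rule, which is the property the abstract emphasizes.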
