Optimizing the computation of n-point correlations on large-scale astronomical data

The n-point correlation functions (npcf) are powerful statistics that are widely used for data analysis in astronomy and other fields. These statistics have played a crucial role in fundamental physical breakthroughs, including the discovery of dark energy. Unfortunately, directly computing the npcf at a single value requires O(N^n) time for N points, with n equal to 2, 3, 4, or even larger. Astronomical data sets can contain billions of points, and the next generation of surveys will generate terabytes of data per night. To meet these computational demands, we present a highly tuned npcf computation code that shows an order-of-magnitude speedup over the current state of the art. This enables a much larger 3-point correlation computation on the galaxy distribution than was previously possible. We show a detailed performance evaluation on many different architectures.
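
To make the O(N^n) cost of the direct approach concrete, the following is a minimal brute-force sketch in Python. It is not the tuned code described above. It assumes a simplified matching rule in which every pairwise distance within a tuple must fall in a single range [r_lo, r_hi); the function name count_matching_tuples and the sample points are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from math import dist


def count_matching_tuples(points, n, r_lo, r_hi):
    """Count unordered n-tuples whose pairwise distances all lie in [r_lo, r_hi).

    Simplifying assumption: one distance range is applied to every side of
    the tuple. This is the raw count that a direct npcf estimate would need.
    """
    count = 0
    # combinations() enumerates C(N, n) tuples, which grows as O(N^n) for fixed n.
    for tup in combinations(points, n):
        if all(r_lo <= dist(p, q) < r_hi for p, q in combinations(tup, 2)):
            count += 1
    return count


# Hypothetical toy data: three nearby points and one distant point.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (5.0, 5.0, 5.0)]
pairs = count_matching_tuples(points, 2, 0.5, 1.5)    # n = 2
triples = count_matching_tuples(points, 3, 0.5, 1.5)  # n = 3
```

Every tuple is visited exactly once, so the running time grows with the number of tuples rather than with the number of matches, which is why this approach becomes impractical for the billions of points mentioned above.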
