Cluster analysis of massive datasets in astronomy

Abstract: Clusters of galaxies are a useful proxy for tracing the distribution of mass in the universe. By measuring the mass of galaxy clusters on different scales, one can follow the evolution of the mass distribution (Martínez and Saar, Statistics of the Galaxy Distribution, 2002). It can be shown that finding galaxy clusters is equivalent to finding density contour clusters (Hartigan, Clustering Algorithms, 1975): connected components of the level set S_c ≡ {f > c}, where f is a probability density function and c is a chosen density level. Cuevas et al. (Can. J. Stat. 28, 367–382, 2000; Comput. Stat. Data Anal. 36, 441–459, 2001) proposed a nonparametric method that finds density contour clusters using the minimal spanning tree. While their algorithm is conceptually simple, it is computationally intensive for large datasets. We propose a more efficient clustering method that builds on their algorithm and uses the Fast Fourier Transform (FFT). We apply the method to a study of galaxy clustering in large astronomical sky survey data.
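The abstract does not give algorithmic details, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a binned Gaussian kernel density estimate computed by FFT convolution on a regular 2D grid, and it replaces the minimal-spanning-tree step with a plain breadth-first search for connected grid cells in the level set {f > c}. The function names, grid size, bandwidth, level c, and synthetic data are all hypothetical choices made for the example.

```python
from collections import deque

import numpy as np


def binned_kde_fft(points, grid_size=64, bandwidth=0.05, extent=(0.0, 1.0)):
    """Binned Gaussian KDE on a regular 2D grid, evaluated by FFT convolution."""
    lo, hi = extent
    # Step 1: bin the points into counts on the grid.
    counts, _, _ = np.histogram2d(
        points[:, 0], points[:, 1],
        bins=grid_size, range=[[lo, hi], [lo, hi]],
    )
    delta = (hi - lo) / grid_size

    # Step 2: sample the kernel on circular grid offsets. Padding to twice
    # the grid size keeps the circular convolution free of wrap-around.
    m = 2 * grid_size
    idx = np.arange(m)
    offsets = np.minimum(idx, m - idx) * delta
    k1 = np.exp(-0.5 * (offsets / bandwidth) ** 2) / (np.sqrt(2 * np.pi) * bandwidth)
    kernel = np.outer(k1, k1)

    padded = np.zeros((m, m))
    padded[:grid_size, :grid_size] = counts

    # Step 3: convolve counts with the kernel in Fourier space.
    dens = np.fft.irfft2(np.fft.rfft2(padded) * np.fft.rfft2(kernel), s=(m, m))
    return dens[:grid_size, :grid_size] / len(points)


def level_set_clusters(density, c):
    """Label 4-connected components of the grid cells where density > c."""
    mask = density > c
    labels = np.zeros(mask.shape, dtype=int)
    n_clusters = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue  # already assigned to an earlier component
        n_clusters += 1
        labels[start] = n_clusters
        queue = deque([start])
        while queue:
            i, j = queue.popleft()
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                a, b = i + di, j + dj
                inside = 0 <= a < mask.shape[0] and 0 <= b < mask.shape[1]
                if inside and mask[a, b] and not labels[a, b]:
                    labels[a, b] = n_clusters
                    queue.append((a, b))
    return labels, n_clusters


# Synthetic stand-in for survey positions: two well-separated blobs.
rng = np.random.default_rng(0)
blob_a = rng.normal([0.25, 0.25], 0.03, size=(500, 2))
blob_b = rng.normal([0.75, 0.75], 0.03, size=(500, 2))
points = np.vstack([blob_a, blob_b])

density = binned_kde_fft(points)
labels, n_clusters = level_set_clusters(density, c=1.0)
print("density contour clusters found:", n_clusters)
```

The design point this illustrates is the cost trade-off: once the points are binned, the density on an m-by-m grid costs on the order of m² log m operations regardless of the number of points, which is what makes an FFT-based estimate attractive for large surveys. In practice the bandwidth and the level c would need to be chosen from the data rather than fixed by hand as they are here.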

[1] M. Lachièze-Rey, et al. Statistics of the galaxy distribution, 1989.

[2] Andrew W. Moore, et al. Very Fast EM-Based Mixture Model Clustering Using Multiresolution Kd-Trees, 1998, NIPS.

[3] A. Cuevas, et al. On boundary estimation, 2004, Advances in Applied Probability.

[4] N. Kaiser. Clustering in real space and in redshift space, 1987.

[5] A. Tsybakov, et al. Minimax theory of image reconstruction, 1993.

[6] A. Cuevas, et al. Estimating the number of clusters, 2000.

[7] Giri Narasimhan, et al. Experiments with Computing Geometric Minimum Spanning Trees, 2000.

[8] L. Devroye, et al. Detection of Abnormal Behavior Via Nonparametric Estimation of the Support, 1980.

[9] Matthew P. Wand, et al. Kernel Smoothing, 1995.

[10] D. Parkinson, et al. Bayesian model selection analysis of WMAP3, 2006, astro-ph/0605003.

[11] William H. Press, et al. Formation of Galaxies and Clusters of Galaxies by Self-Similar Gravitational Condensation, 1974.

[12] A. Cuevas, et al. A plug-in approach to support estimation, 1997.

[13] Christopher J. Miller, et al. Nonparametric Inference for the Cosmic Microwave Background, 2004, astro-ph/0410140.

[14] Jeremiah P. Ostriker, et al. The Cluster Mass Function from Early Sloan Digital Sky Survey Data: Cosmological Implications, 2002, astro-ph/0205490.

[15] Adrian E. Raftery, et al. Model-Based Clustering, Discriminant Analysis, and Density Estimation, 2002.

[16] Ravi Sheth, et al. Halo Models of Large Scale Structure, 2002, astro-ph/0206508.

[17] Liverpool John Moores University, et al. A Deficit of High-Redshift, High-Luminosity X-Ray Clusters: Evidence for a High Value of Ωm?, 1998, astro-ph/9802153.

[18] Peter Norvig, et al. Artificial Intelligence: A Modern Approach, 1995.

[19] Matt P. Wand, et al. On the Accuracy of Binned Kernel Density Estimators, 1994.

[20] Don R. Hush, et al. A Classification Framework for Anomaly Detection, 2005, J. Mach. Learn. Res.

[21] B. Silverman, et al. Kernel Density Estimation Using the Fast Fourier Transform, 1982.

[22] B. Silverman, et al. Algorithm AS 176: Kernel Density Estimation Using the Fast Fourier Transform, 1982.

[23] A. P. Korostelev, et al. MiniMax Methods for Image Reconstruction, 1993.

[24] J. A. Cuesta-Albertos, et al. Convergence rates in nonparametric estimation of level sets, 2001.

[25] R. Wilson. Modern Cosmology, 2004.

[26] Geoffrey J. McLachlan, et al. Finite Mixture Models, 2019, Annual Review of Statistics and Its Application.

[27] A. J. Connolly, et al. Computational AstroStatistics: Fast Algorithms and Efficient Statistics for Density Estimation in Large Astronomical Datasets, 2000.

[28] M. Wand. Fast Computation of Multivariate Kernel Estimators, 1994.

[29] Woncheol Jang, et al. Nonparametric density estimation and clustering in astronomical sky surveys, 2006, Comput. Stat. Data Anal.

[30] Andrew W. Moore, et al. Rapid Evaluation of Multiple Density Models, 2003, AISTATS.

[31] Woncheol Jang. Nonparametric Density Estimation and Galaxy Clustering, 2003.

[32] A. Cuevas, et al. Cluster analysis: a further approach based on density estimation, 2001.

[33] Werner Stuetzle, et al. Estimating the Cluster Tree of a Density by Analyzing the Minimal Spanning Tree of a Sample, 2003, J. Classif.

[34] Eric A. Hansen, et al. A Breadth-First Approach to Memory-Efficient Graph Search, 2006, AAAI.

[35] John A. Hartigan, et al. Clustering Algorithms, 1975.

[36] S. George Djorgovski, et al. Some statistical and computational challenges, and opportunities in astronomy, 2004.

[37] Robert D. Nowak, et al. Learning Minimum Volume Sets, 2005, J. Mach. Learn. Res.

[38] Rebecca Willett, et al. Minimax optimal level set estimation, 2005, SPIE Optics + Photonics.

[39] Christopher R. Genovese, et al. Nonparametric Confidence Sets for Density, 2004.

[40] Avishai Dekel, et al. Stochastic Nonlinear Galaxy Biasing, 1998, astro-ph/9806193.

[41] Shaun Cole, et al. Mock 2dF and SDSS galaxy redshift surveys, 1998.