Comparison of density estimation methods for astronomical datasets

Context. Galaxies are strongly influenced by their environment. Quantifying the galaxy density is a difficult but critical step in studying the properties of galaxies.

Aims. We aim to determine differences in density estimation methods and their applicability to astronomical problems. We study the performance of four density estimation techniques: k-nearest neighbors (kNN), adaptive Gaussian kernel density estimation (DEDICA), a special case of adaptive Epanechnikov kernel density estimation (the modified Breiman estimator, MBE), and the Delaunay tessellation field estimator (DTFE).

Methods. The density estimators are applied to six artificial datasets and to three astronomical datasets: the Millennium Simulation and two samples from the Sloan Digital Sky Survey (SDSS). We compare the performance of the methods in two ways: first, by measuring the integrated squared error and the Kullback-Leibler divergence of each method's estimate with respect to the parametric densities of the datasets (for the artificial datasets); second, by examining how well the densities serve to study the properties of galaxies in relation to their environment (for the SDSS datasets).

Results. The adaptive kernel-based methods, especially MBE, perform better than the other methods in terms of calculating the density properly and have stronger predictive power in astronomical use cases.

Conclusions. We recommend the modified Breiman estimator as a fast and reliable method to quantify the environment of galaxies.
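The first comparison described in the abstract (an estimate scored against a known parametric density) can be illustrated with a minimal sketch. It is not the paper's code: the one-dimensional standard normal test density, the sample size, the value of k, the evaluation grid, and the helper names `knn_density_1d` and `trapezoid` are all assumptions chosen for illustration. The Kullback-Leibler divergence is computed here on a grid after renormalizing both densities over that grid, which may differ from the definition used in the paper.

```python
import numpy as np

def knn_density_1d(x_eval, sample, k):
    """1D kNN density estimate: k / (N * 2 * r_k), where r_k is the
    distance from each evaluation point to its k-th nearest sample point."""
    dists = np.abs(x_eval[:, None] - sample[None, :])
    r_k = np.partition(dists, k - 1, axis=1)[:, k - 1]
    return k / (sample.size * 2.0 * r_k)

def trapezoid(y, x):
    """Trapezoidal-rule integral of y over the grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

# Hypothetical artificial dataset: draws from a standard normal density.
rng = np.random.default_rng(0)
sample = rng.normal(size=2000)

grid = np.linspace(-4.0, 4.0, 401)
f_true = np.exp(-0.5 * grid**2) / np.sqrt(2.0 * np.pi)
f_hat = knn_density_1d(grid, sample, k=50)

# Integrated squared error between the estimate and the true density.
ise = trapezoid((f_hat - f_true) ** 2, grid)

# Kullback-Leibler divergence of the true density from the estimate,
# with both densities renormalized to integrate to one over the grid.
p = f_true / trapezoid(f_true, grid)
q = f_hat / trapezoid(f_hat, grid)
kl = trapezoid(p * np.log(p / q), grid)

print(f"ISE = {ise:.3e}, KL = {kl:.3e}")
```

Lower values of both measures indicate an estimate closer to the true density. The same scoring loop could be applied to any other estimator that returns density values on the grid.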
