Fast Parzen window density estimator

Parzen Windows (PW) is a popular nonparametric density estimation technique. In general, a smoothing kernel is placed on every available data point, which makes the algorithm computationally expensive on large datasets. Several approaches have been proposed to reduce the computational cost of PW, either by subsampling the dataset or by imposing sparsity on the density model. The latter typically requires an involved and complex learning process. In this paper, we propose a new, simple and efficient kernel-based method for nonparametric probability density function (pdf) estimation on large datasets. We cover the entire data space with a set of fixed-radius hyper-balls whose densities are represented by full-covariance Gaussians. The accuracy and efficiency of the new estimator are verified on both a synthetic dataset and large datasets from astronomical simulations of the galaxy disruption process. Experiments demonstrate that the estimation accuracy of the new estimator is comparable to that of previous approaches, with a significant speed-up. We also show that the pdf learnt by the new estimator can be used to automatically find the best-matching set in large-scale astronomical simulations.
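To make the cost argument concrete, the following Python sketch contrasts a classic Parzen window estimate, which sums one kernel per data point for every query, with a model that uses one full-covariance Gaussian per fixed-radius ball. It is only an illustration under stated assumptions, not the paper's algorithm: the abstract does not say how the balls are placed or weighted, so the greedy cover, the count-based mixture weights, the regularisation term `reg`, and all function names here are hypothetical choices.

```python
import numpy as np

def parzen_pdf(queries, data, h):
    """Classic Parzen window estimate with an isotropic Gaussian kernel of width h."""
    n, d = data.shape
    # Squared distances between every query and every data point: shape (m, n).
    sq = ((queries[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)
    norm = (2.0 * np.pi * h * h) ** (d / 2.0)
    return np.exp(-0.5 * sq / (h * h)).sum(axis=1) / (n * norm)

def fit_ball_model(data, radius, reg=1e-3):
    """Greedy fixed-radius cover (an assumed scheme, not taken from the paper).

    Returns mixture weights, means, and full covariance matrices, one per ball.
    """
    n, d = data.shape
    unassigned = np.ones(n, dtype=bool)
    weights, means, covs = [], [], []
    while unassigned.any():
        # Use the first still-unassigned point as the next ball centre.
        centre = data[np.flatnonzero(unassigned)[0]]
        inside = unassigned & (np.linalg.norm(data - centre, axis=1) <= radius)
        pts = data[inside]
        unassigned &= ~inside
        mean = pts.mean(axis=0)
        diff = pts - mean
        # Full covariance of the ball's points; reg keeps it invertible
        # when a ball holds very few points (assumed regulariser).
        cov = diff.T @ diff / len(pts) + reg * np.eye(d)
        weights.append(len(pts) / n)
        means.append(mean)
        covs.append(cov)
    return np.array(weights), np.array(means), np.array(covs)

def ball_pdf(queries, weights, means, covs):
    """Evaluate the resulting Gaussian mixture at the query points."""
    d = queries.shape[1]
    total = np.zeros(len(queries))
    for w, mu, cov in zip(weights, means, covs):
        diff = queries - mu
        # Mahalanobis distance of each query to this component.
        maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
        norm = np.sqrt(((2.0 * np.pi) ** d) * np.linalg.det(cov))
        total += w * np.exp(-0.5 * maha) / norm
    return total
```

With N data points and K balls, each query costs N kernel evaluations under `parzen_pdf` and K Gaussian evaluations under `ball_pdf`, so any saving depends on K being much smaller than N. The radius controls that trade-off in this sketch: larger balls give fewer components but a coarser density.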
