A faster algorithm to estimate multiresolution densities

This paper develops a consistent estimator for the coefficients of probability density functions defined in Multiresolution Analysis structures (MRD), together with an algorithm based on the proposed estimator. This algorithm, named FD, behaves similarly to the maximum likelihood estimator for large datasets. The process by which the coefficients are estimated by the FD algorithm and then used to evaluate the MRD on a regular point grid is called Multiresolution Density Estimation (MRDE), and it leads to consistent MRD estimates. Simulation trials reveal that the FD algorithm, which is based on a frequency data count, is faster and easier to apply than the Expectation Maximization (EM) algorithm. The research also shows that, using the same data and grid, MRDE is frequently faster than Kernel Density Estimation using the Fast Fourier Transform algorithm ($KDE_{FFT}$). These results suggest that the MRDE method for estimating multiresolution densities could be applied to estimate probability densities in the big data field.
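The abstract does not spell out the FD algorithm, so the following is only a minimal sketch of the general idea of estimating multiresolution coefficients from frequency counts and then evaluating the density on a regular grid. It assumes Haar scaling functions on a fixed interval, which is an illustrative choice and not necessarily the basis used in the paper; the function names `haar_scaling_coefficients` and `density_on_grid` are hypothetical.

```python
import math
from collections import Counter


def haar_scaling_coefficients(data, level, lo=0.0, hi=1.0):
    """Estimate Haar scaling coefficients on [lo, hi) from bin frequencies.

    Assumption: Haar basis with 2**level equal-width bins. This is an
    illustration only, not the paper's FD algorithm.
    """
    n_bins = 2 ** level
    width = (hi - lo) / n_bins
    counts = Counter()
    for x in data:
        if lo <= x < hi:
            # Clamp to guard against floating-point rounding at the right edge.
            k = min(int((x - lo) / width), n_bins - 1)
            counts[k] += 1
    n = len(data)
    # The scaling function for bin k equals width**-0.5 on that bin, so its
    # expectation is estimated by (relative frequency of bin k) / sqrt(width).
    return [counts[k] / (n * math.sqrt(width)) for k in range(n_bins)]


def density_on_grid(coeffs, lo=0.0, hi=1.0):
    """Evaluate the estimated density on each cell of the regular grid."""
    width = (hi - lo) / len(coeffs)
    return [c / math.sqrt(width) for c in coeffs]


sample = [0.1, 0.2, 0.6, 0.9]
coeffs = haar_scaling_coefficients(sample, level=1)
density = density_on_grid(coeffs)
```

Each coefficient requires only a single pass over the data and one counter per bin, which is why count-based estimators of this kind scale well to large samples. With the Haar assumption the result coincides with a histogram density estimate.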
