A recursive procedure for density estimation on the binary hypercube

This paper describes a recursive estimation procedure for multivariate binary densities (probability distributions of vectors of Bernoulli random variables) using orthogonal expansions. For $d$ covariates, there are $2^d$ basis coefficients to estimate, which renders conventional approaches computationally prohibitive when $d$ is large. However, for a wide class of densities that satisfy a certain sparsity condition, our estimator runs in probabilistic polynomial time and adapts to the unknown sparsity of the underlying density in two key ways: (1) it attains near-minimax mean-squared error for moderate sample sizes, and (2) the computational complexity is lower for sparser densities. Our method also allows for flexible control of the trade-off between mean-squared error and computational complexity.
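As a rough illustration of the orthogonal-expansion idea, the sketch below estimates all $2^d$ coefficients by brute force and then hard-thresholds them. It is not the recursive procedure described above: it costs time exponential in $d$ and is only meant to show what "estimating basis coefficients" and "sparsity" mean on the binary hypercube. The Walsh basis normalization $\varphi_s(x) = 2^{-d/2}(-1)^{s \cdot x}$, the hard-threshold rule, and the names `walsh_matrix` and `thresholded_estimate` are assumptions made for this example.

```python
import itertools
import numpy as np

def walsh_matrix(d):
    """Return the 2^d x 2^d matrix with entries (-1)^(s.x) / 2^(d/2).

    Rows are indexed by s and columns by x, both enumerated in
    lexicographic order over {0,1}^d. (Assumed normalization.)
    """
    points = np.array(list(itertools.product([0, 1], repeat=d)))
    signs = (-1.0) ** (points @ points.T)
    return signs / np.sqrt(2 ** d)

def thresholded_estimate(samples, threshold):
    """Brute-force thresholded orthogonal-series density estimate.

    samples: (n, d) integer array with entries in {0, 1}.
    Returns (theta, f_hat): the retained coefficients and the
    reconstructed density over all 2^d points.
    """
    n, d = samples.shape
    phi = walsh_matrix(d)
    # Map each sample to its lexicographic index and build the empirical pmf.
    weights = 2 ** np.arange(d - 1, -1, -1)
    p_hat = np.bincount(samples @ weights, minlength=2 ** d) / n
    # theta_hat[s] is the sample mean of phi_s(X_i).
    theta_hat = phi @ p_hat
    # Hard thresholding: zero out small coefficients (illustrative rule).
    keep = np.abs(theta_hat) >= threshold
    keep[0] = True  # the s = 0 coefficient fixes the total mass at 1
    theta = np.where(keep, theta_hat, 0.0)
    f_hat = phi.T @ theta
    return theta, f_hat

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.integers(0, 2, size=(500, 4))
    theta, f_hat = thresholded_estimate(x, threshold=0.02)
    print("nonzero coefficients:", np.count_nonzero(theta))
    print("total mass:", f_hat.sum())
```

Because every non-constant Walsh function sums to zero over the cube, keeping the $s = 0$ coefficient ensures the estimate sums to one, although individual values can be negative after thresholding. A larger threshold keeps fewer coefficients and gives a smoother estimate, a crude version of the trade-off between error and computation mentioned above.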
