Density estimation trees

In this paper we develop density estimation trees (DETs), the natural analog of classification trees and regression trees for the task of density estimation. We consider estimating the joint probability density function of a d-dimensional random vector X and define a piecewise constant estimator structured as a decision tree. The tree is learned by minimizing the integrated squared error. The method is nonparametric: under standard conditions of nonparametric density estimation, we show that DETs are asymptotically consistent. In addition, being decision trees, DETs perform automatic feature selection, so they may be able to avoid the curse of dimensionality when the true density is sparse in dimensions, that is, when only a few of the d dimensions are relevant. Empirically, DETs exhibit the interpretability, adaptability and feature selection properties of supervised decision trees while incurring a slight loss in accuracy relative to other nonparametric density estimators. We believe that density estimation trees provide a new tool for exploratory data analysis with unique capabilities.
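To make "piecewise constant estimator learned by minimizing the integrated squared error" concrete, here is a minimal sketch in Python. If a leaf t holds |t| of the N training points in a cell of volume V_t, the estimate inside t is |t| / (N V_t). Expanding ISE = ∫ f̂² − 2∫ f̂ f + ∫ f², the last term does not depend on the tree, and the cross term can be estimated by (2/N) Σ_i f̂(X_i). Per leaf this gives |t|²/(N² V_t) − 2|t|²/(N² V_t) = −|t|²/(N² V_t), so a split is worth making when it lowers the sum of this quantity over the two children. The sketch assumes axis-aligned splits at midpoints between sorted coordinate values, greedy top-down growth, a padded bounding box of the data as the root cell, and a hypothetical `min_leaf` stopping rule with no pruning. All function and parameter names (`fit_det`, `grow`, `leaf_error`, `predict`, `min_leaf`, `pad`) are illustrative and not taken from the paper.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence


def leaf_error(count: int, n_total: int, volume: float) -> float:
    # Empirical ISE contribution of one leaf: -|t|^2 / (N^2 * V_t).
    return -(count ** 2) / (n_total ** 2 * volume)


def box_volume(lo: Sequence[float], hi: Sequence[float]) -> float:
    vol = 1.0
    for a, b in zip(lo, hi):
        vol *= b - a
    return vol


@dataclass
class Node:
    lo: List[float]           # lower corner of the cell
    hi: List[float]           # upper corner of the cell
    count: int                # number of training points in the cell
    density: float            # piecewise constant estimate |t| / (N * V_t)
    dim: Optional[int] = None
    split: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def grow(points, lo, hi, n_total, min_leaf):
    count, vol = len(points), box_volume(lo, hi)
    node = Node(list(lo), list(hi), count, count / (n_total * vol))
    if count < 2 * min_leaf:  # hypothetical stopping rule: both children need >= min_leaf points
        return node
    parent_err = leaf_error(count, n_total, vol)
    best = None  # (error reduction, dimension, threshold)
    for d in range(len(lo)):
        xs = sorted(p[d] for p in points)
        width = hi[d] - lo[d]
        for i in range(min_leaf, count - min_leaf + 1):
            if xs[i - 1] == xs[i]:
                continue  # no threshold separates equal coordinate values
            s = 0.5 * (xs[i - 1] + xs[i])  # left child gets exactly i points
            vol_l = vol * (s - lo[d]) / width
            vol_r = vol * (hi[d] - s) / width
            err = leaf_error(i, n_total, vol_l) + leaf_error(count - i, n_total, vol_r)
            gain = parent_err - err
            if best is None or gain > best[0]:
                best = (gain, d, s)
    if best is None or best[0] <= 0.0:
        return node
    _, d, s = best
    node.dim, node.split = d, s
    left_hi, right_lo = list(hi), list(lo)
    left_hi[d] = right_lo[d] = s
    node.left = grow([p for p in points if p[d] < s], lo, left_hi, n_total, min_leaf)
    node.right = grow([p for p in points if p[d] >= s], right_lo, hi, n_total, min_leaf)
    return node


def fit_det(points, min_leaf=5, pad=1e-6):
    if min_leaf < 1:
        raise ValueError("min_leaf must be at least 1")
    dims = range(len(points[0]))
    # Root cell: bounding box of the data, padded (an assumption) so no side has zero width.
    lo = [min(p[d] for p in points) - pad for d in dims]
    hi = [max(p[d] for p in points) + pad for d in dims]
    return grow(points, lo, hi, len(points), min_leaf)


def leaves(node):
    if node.left is None:
        return [node]
    return leaves(node.left) + leaves(node.right)


def predict(root, x):
    # The estimate is zero outside the root cell.
    if any(v < a or v > b for v, a, b in zip(x, root.lo, root.hi)):
        return 0.0
    node = root
    while node.left is not None:
        node = node.left if x[node.dim] < node.split else node.right
    return node.density


# Usage: 2-D data, uniform in the first coordinate and Gaussian in the second.
rng = random.Random(0)
data = [(rng.random(), rng.gauss(0.0, 1.0)) for _ in range(200)]
tree = fit_det(data, min_leaf=10)
print(len(leaves(tree)), predict(tree, (0.5, 0.0)))
```

Because every leaf stores |t| / (N V_t), the estimate integrates to exactly 1 over the root cell by construction. Splitting a cell never increases the empirical error above, so this sketch keeps splitting until `min_leaf` stops it; a practical version would need some regularization, for example CART-style cost-complexity pruning chosen by cross-validation.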