Consistency of Data-driven Histogram Methods for Density Estimation and Classification

We present general sufficient conditions for the almost sure L1-consistency of histogram density estimates based on data-dependent partitions. Analogous conditions guarantee the almost sure risk consistency of histogram classification schemes based on data-dependent partitions. Multivariate data are considered throughout. In each case, the desired consistency requires shrinking cells, subexponential growth of a combinatorial complexity measure, and sublinear growth of the number of cells. It is not required that the cells of every partition be rectangles with sides parallel to the coordinate axes, or that each cell contain a minimum number of points. No assumptions are made concerning the common distribution of the training vectors. We apply the results to establish the consistency of several known partitioning estimates, including the k_n-spacing density estimate, classifiers based on statistically equivalent blocks, and classifiers based on multivariate clustering schemes.
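As an illustration of a data-dependent partition, here is a minimal one-dimensional sketch of a k_n-spacing style histogram density estimate. It is not taken from the paper. The function name `kn_spacing_density`, the restriction of the estimate to the sample range, and the assumption of distinct sample values are choices made for this example only. Cell boundaries are placed at every k-th order statistic, so each cell except possibly the last holds k sample points, and the height on a cell is its empirical mass divided by its length.

```python
import numpy as np


def kn_spacing_density(sample, k):
    """Histogram density estimate on cells cut at every k-th order statistic.

    Illustrative sketch: assumes distinct sample values and returns the
    estimate only on [min(sample), max(sample)]; it is taken to be zero
    elsewhere.
    """
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    # Cell boundaries: order statistics x_(1), x_(k+1), x_(2k+1), ...
    edges = x[np.arange(0, n, k)]
    # Close the final cell at the sample maximum if it is not already an edge.
    if edges[-1] != x[-1]:
        edges = np.append(edges, x[-1])
    # Empirical count per cell (cells are half-open; the last one is closed).
    counts, _ = np.histogram(x, bins=edges)
    widths = np.diff(edges)
    # Height = (fraction of the sample in the cell) / (cell length).
    heights = counts / (n * widths)
    return edges, heights


rng = np.random.default_rng(0)
sample = rng.normal(size=1000)
edges, heights = kn_spacing_density(sample, k=30)
# The estimate integrates to one over the sample range by construction.
total_mass = float(np.sum(heights * np.diff(edges)))
```

In this sketch the number of cells is roughly n/k, so letting k grow with n while k/n tends to zero is one way to obtain both sublinear growth in the number of cells and shrinking cells, in the spirit of the conditions listed in the abstract. The value `k=30` above is an arbitrary choice for the demonstration.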
