Histogram regression estimation using data-dependent partitions

We establish general sufficient conditions for the $L_2$-consistency of multivariate histogram regression estimates based on data-dependent partitions. These same conditions ensure the consistency of partitioning regression estimates based on local polynomial fits and, with an additional regularity assumption, the consistency of histogram estimates for conditional medians. Our conditions require shrinking cells, subexponential growth of a combinatorial complexity measure, and sublinear growth of restricted cell counts. We do not assume that the cells of every partition are rectangles with sides parallel to the coordinate axes, or that each cell contains a minimum number of points. Response variables are assumed to be bounded throughout. Our results may be applied to a variety of partitioning schemes. We establish the consistency of histogram regression estimates based on cubic partitions with data-dependent offsets, k-thresholding in one dimension, and empirically optimal nearest-neighbor clustering schemes. In addition, it is shown that empirically optimal regression trees are consistent when the size of the trees grows with the number of samples at an appropriate rate.
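As a rough illustration of one of the schemes named above, the following sketch computes a histogram regression estimate on a cubic partition whose offset depends on the data. It is a minimal example under stated assumptions, not the construction analyzed in the paper: the offset rule (the coordinate-wise sample minimum), the fallback value for empty cells, and the names `fit_histogram_regression` and `predict` are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_histogram_regression(X, y, side):
    """Average the responses within each cube of a data-offset cubic grid.

    Assumption (illustrative only): the grid offset is the coordinate-wise
    sample minimum of X, so the partition depends on the data.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    offset = X.min(axis=0)
    # Integer cell index of every sample along each coordinate.
    cells = np.floor((X - offset) / side).astype(int)
    sums, counts = {}, {}
    for key, value in zip(map(tuple, cells), y):
        sums[key] = sums.get(key, 0.0) + value
        counts[key] = counts.get(key, 0) + 1
    means = {key: sums[key] / counts[key] for key in sums}
    return offset, side, means


def predict(model, X, default=0.0):
    """Return the cell average for each query point.

    Cells that contained no training points get `default`
    (a hypothetical convention chosen for this sketch).
    """
    offset, side, means = model
    X = np.asarray(X, dtype=float)
    cells = np.floor((X - offset) / side).astype(int)
    return np.array([means.get(tuple(c), default) for c in cells])


# Small one-dimensional usage example with hand-picked values.
X_train = [[0.0], [0.1], [0.9], [1.2]]
y_train = [1.0, 3.0, 5.0, 7.0]
model = fit_histogram_regression(X_train, y_train, side=0.5)
estimates = predict(model, [[0.05], [1.1], [5.0]])
```

In this toy run the first two training points share a cell, so a query in that cell receives the average of their responses, while the query at 5.0 falls in a cell with no training points and receives the default value. The shrinking-cell condition in the abstract corresponds to letting `side` decrease as the sample size grows; how fast it may decrease is governed by the paper's conditions and is not captured by this sketch.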
