Search for the best decision rules with the help of a probabilistic estimate

This paper considers how to find the best decision rule in the course of a search, based on analysis of a sample set. In particular, it highlights the problem of selecting the best subset of regressors. In this formulation, the problem is an important case of learning a dependence from examples. The concept of a Predictive Probabilistic Estimate (PPE) is introduced and its properties are discussed. The asymptotic properties of PPE are based on the Vapnik-Chervonenkis theory of uniform convergence of a set of sample estimates. The finite-sample properties of PPE demonstrate how it accounts for both the presence of a search process and the complexity of a regression formula when estimating quality of fit. Some practical and model examples are presented.
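To illustrate why the presence of a search process matters when estimating quality of fit, the following minimal Python sketch fits one-variable regressions to pure noise and compares the R² of a regressor fixed in advance with the R² of the best regressor found by searching over many candidates. This is not an implementation of PPE, which the abstract does not specify. The function names, sample size, and number of candidates are assumptions chosen only for illustration.

```python
import random


def r_squared(x, y):
    """R^2 of a one-variable least-squares fit (squared sample correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def search_inflation(n_samples=20, n_candidates=50, seed=0):
    """Return (R^2 of a pre-chosen regressor, R^2 of the best regressor found by search).

    Hypothetical setup: the response and every candidate regressor are
    independent Gaussian noise, so no candidate has real predictive value.
    """
    rng = random.Random(seed)
    y = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    candidates = [
        [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        for _ in range(n_candidates)
    ]
    scores = [r_squared(x, y) for x in candidates]
    # scores[0] plays the role of a regressor chosen before seeing the data;
    # max(scores) is what an exhaustive search over single regressors reports.
    return scores[0], max(scores)


fixed_r2, best_r2 = search_inflation()
print(f"pre-chosen regressor R^2: {fixed_r2:.3f}")
print(f"best-of-search R^2:       {best_r2:.3f}")
```

Because the search reports the maximum over all candidates, its R² can never be lower than that of the pre-chosen regressor, even though every candidate here is noise. An estimate of fit quality that ignores the search is therefore optimistic, which is the effect the abstract says PPE is designed to take into account.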
