Soft Classification, a.k.a. Risk Estimation, via Penalized Log Likelihood and Smoothing Spline Analysis of Variance

We discuss a class of methods for the problem of 'soft' classification in supervised learning. In 'hard' classification, it is assumed that any two examples with the same attribute vector will always be in the same class (or have the same outcome), whereas in 'soft' classification, two examples with the same attribute vector do not necessarily have the same outcome, but the probability of a particular outcome does depend on the attribute vector. In this paper we describe a family of methods that are well suited to estimating this probability. The methods produce, for any value in a (reasonable) region of the attribute space, an estimate of the probability that the next example will be in class 1. Underlying these methods is an assumption that this probability varies in a smooth way (to be defined) as the predictor variables vary. The approach combines results from Penalized log likelihood estimation, Smoothing splines, and Analysis of variance to get the PSA class of methods. In the process of describing PSA we discuss some issues concerning the computation of degrees of freedom for signal, which has wider ramifications for the minimization of generalization error in machine learning. As an illustration we apply the method to the Pima-Indian Diabetes data set in the UCI Repository, and compare the results to Smith et al. (1988), who used the ADAP learning algorithm on this same data set to forecast the onset of diabetes mellitus. If the probabilities we obtain are thresholded to make a hard classification to compare with the hard classification of Smith et al. (1988), the results are very similar; however, the intermediate probabilities that we obtain provide useful and interpretable information on how the risk of diabetes varies with some of the risk factors.
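
For orientation, the three ingredients named above can be written schematically as follows. The abstract itself gives no formulas, so this is only a sketch of the standard penalized log likelihood setup for binary outcomes. The symbols are assumptions introduced here: $y_i \in \{0,1\}$ for the outcome, $t_i$ for the attribute vector, $f$ for the logit, $J$ for a roughness penalty, and $\lambda$ for the smoothing parameter. Scaling conventions for the penalty term vary between authors.

```latex
% Soft classification: the class-1 probability depends smoothly on the
% attribute vector t through the logit f (notation assumed, not from the abstract).
p(t) = \Pr(y = 1 \mid t) = \frac{e^{f(t)}}{1 + e^{f(t)}}

% Penalized log likelihood: negative Bernoulli log likelihood plus a
% roughness penalty J(f), weighted by the smoothing parameter \lambda.
I_\lambda(f) = -\sum_{i=1}^{n} \Bigl[ y_i f(t_i) - \log\bigl(1 + e^{f(t_i)}\bigr) \Bigr] + \lambda\, J(f)

% Analysis-of-variance decomposition of f into a constant, main effects,
% and (optionally) interactions of the components t_\alpha of t.
f(t) = \mu + \sum_{\alpha} f_\alpha(t_\alpha) + \sum_{\alpha < \beta} f_{\alpha\beta}(t_\alpha, t_\beta) + \cdots
```

In this notation, a hard classification is recovered by thresholding the estimated probability, for example assigning class 1 when $\hat{p}(t)$ exceeds a chosen cutoff, while the unthresholded $\hat{p}(t)$ is the risk estimate.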

[1] G. Wahba, et al. A Correspondence Between Bayesian Estimation on Stochastic Processes and Smoothing by Splines, 1970.

[2] G. Wahba. Improper Priors, Spline Smoothing and the Problem of Guarding Against Model Errors in Regression, 1978.

[3] Peter Craven, et al. Smoothing Noisy Data with Spline Functions, 1978.

[4] G. Wahba, et al. Some New Mathematical Methods for Variational Objective Analysis Using Splines and Cross Validation, 1980.

[5] G. Wahba. Bayesian "Confidence Intervals" for the Cross-validated Smoothing Spline, 1983.

[6] C. J. Stone, et al. Additive Regression and Other Nonparametric Models, 1985.

[7] Ker-Chau Li, et al. From Stein's Unbiased Risk Estimates to the Method of Generalized Cross Validation, 1985.

[8] G. Wahba. A Comparison of GCV and GML for Choosing the Smoothing Parameter in the Generalized Spline Smoothing Problem, 1985.

[9] Ker-Chau Li, et al. Asymptotic Optimality of CL and Generalized Cross-Validation in Ridge Regression with Application to Spline Smoothing, 1986.

[10] G. Wahba. Partial and Interaction Spline Models for the Semiparametric Estimation of Functions of Several Variables, 1986.

[11] B. Yandell, et al. Automatic Smoothing of Regression Functions in Generalized Linear Models, 1986.

[12] Grace Wahba, et al. Three Topics in Ill-Posed Problems, 1987.

[13] F. O’Sullivan. Fast Computation of Fully Automated Log-Density and Log-Hazard Estimators, 1988.

[14] Richard S. Johannes, et al. Using the ADAP Learning Algorithm to Forecast the Onset of Diabetes Mellitus, 1988.

[15] R. Tibshirani, et al. Linear Smoothers and Additive Models, 1989.

[16] G. Wahba. Spline Models for Observational Data, 1990.

[17] Chong Gu. Adaptive Spline Smoothing in Non-Gaussian Regression Models, 1990.

[18] David W. Scott. The New S Language, 1990.

[19] Chong Gu, et al. Minimizing GCV/GML Scores with Multiple Smoothing Parameters via the Newton Method, 1991, SIAM J. Sci. Comput.

[20] John E. Moody, et al. The Effective Number of Parameters: An Analysis of Generalization and Regularization in Nonlinear Learning Systems, 1991, NIPS.

[21] Richard Lippmann, et al. Neural Network Classifiers Estimate Bayesian a posteriori Probabilities, 1991, Neural Computation.

[22] Wray L. Buntine, et al. Bayesian Back-Propagation, 1991, Complex Syst.

[23] Chong Gu, et al. Cross-Validating Non-Gaussian Data, 1992.

[24] G. Wahba. Multivariate Function and Operator Estimation, Based on Smoothing Splines and Reproducing Kernels, 1992.

[25] Robert Gray, et al. Flexible Methods for Analyzing Survival Data Using Splines, with Applications to Breast Cancer Prognosis, 1992.

[26] David H. Wolpert, et al. Stacked Generalization, 1992, Neural Networks.

[27] R. Tibshirani, et al. A Strategy for Binary Description and Classification, 1992.

[28] J. H. Schuenemeyer, et al. Generalized Linear Models (2nd ed.), 1992.

[29] Elie Bienenstock, et al. Neural Networks and the Bias/Variance Dilemma, 1992, Neural Computation.

[30] Chong Gu. Diagnostics for Nonparametric Regression Models with Additive Terms, 1992.

[31] G. Wahba, et al. Smoothing Spline ANOVA with Component-Wise Bayesian “Confidence Intervals”, 1993.