Assessing Influence in Variable Selection Problems

Abstract: Variable selection techniques are often used in combination with multiple linear regression to produce a parsimonious model that fits the data well. It is clearly undesirable for the final model to depend strongly on the inclusion of a few influential cases in the data set. This article discusses a measure of the influence of single cases on the final model, based on a similar measure used in ordinary multiple regression. When variables are selected objectively, deletion of individual cases can strongly affect the choice of model. The influence of individual cases on the parameters of the selected model is often assessed as part of the model-building process. However, such conditional measures fail to evaluate the influence of the cases on the variable selection process itself. Modern computing environments make it feasible to use an unconditional criterion to determine the influence of each case on the selection procedure. A number of examples are discussed to illustrate the differences between these appr...
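The contrast between conditional and unconditional assessment can be made concrete with a small sketch. The code below is not the measure proposed in the article; it only illustrates the general idea of rerunning the whole selection procedure with each case deleted in turn and recording whether the chosen subset changes. The function names `best_subset` and `selection_changes`, the use of all-subsets search, and the choice of BIC as the selection criterion are all assumptions made for illustration.

```python
import itertools
import numpy as np


def best_subset(X, y):
    """Return the tuple of column indices that minimises BIC over all
    non-empty subsets of predictors (an intercept is always included).
    BIC is an illustrative choice of criterion, not the article's."""
    n, p = X.shape
    best, best_score = None, np.inf
    for k in range(1, p + 1):
        for cols in itertools.combinations(range(p), k):
            A = np.column_stack([np.ones(n), X[:, cols]])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ beta) ** 2))
            # Gaussian BIC up to an additive constant; k slopes + 1 intercept
            score = n * np.log(rss / n) + (k + 1) * np.log(n)
            if score < best_score:
                best, best_score = cols, score
    return best


def selection_changes(X, y):
    """Rerun the selection with each case deleted in turn.

    Returns the subset chosen on the full data and a dict mapping each
    case index whose deletion changes the chosen subset to the subset
    chosen without that case."""
    full = best_subset(X, y)
    changed = {}
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        sub = best_subset(X[keep], y[keep])
        if sub != full:
            changed[i] = sub
    return full, changed


# Hypothetical data: only the first predictor carries signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=30)
full, changed = selection_changes(X, y)
print("subset on full data:", full)
print("cases whose deletion changes the subset:", sorted(changed))
```

A conditional analysis would instead fix the subset `full` and examine case influence on its coefficients only; the loop above repeats the selection itself, which is what makes the cost grow with both the number of cases and the number of candidate subsets.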
