Sample Sizes When Using Multiple Linear Regression for Prediction

When multiple regression is used for prediction, the question of the minimum required sample size often needs to be addressed. Using a Monte Carlo simulation, models with varying numbers of independent variables were examined, and minimum sample sizes were determined for multiple scenarios at each number of independent variables. The scenarios arise from varying the levels of correlation between the criterion variable and the predictor variables, as well as among the predictor variables. Two minimum sample sizes were determined for each scenario: one for a good prediction level and one for an excellent prediction level. The relationship between the squared multiple correlation coefficient and the minimum necessary sample size was examined. A definite relationship, similar to a negative exponential relationship, was found: as the squared multiple correlation coefficient decreased, the required sample size increased at an increasing rate. This study provides guidelines for the sample size needed for accurate predictions.
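The kind of simulation described above can be illustrated with a minimal Python sketch. It is not the study's actual procedure: the abstract does not state how "good" and "excellent" prediction levels were defined, so the `tolerance` argument, the equal-correlation population structure, the holdout size, and all function names below are assumptions made for illustration only.

```python
import numpy as np


def population_r2(k, rho_xy, rho_xx):
    """Population squared multiple correlation when all k predictors
    correlate rho_xx with each other and rho_xy with the criterion
    (an assumed, simplified correlation structure)."""
    rxx = np.full((k, k), rho_xx)
    np.fill_diagonal(rxx, 1.0)
    rxy = np.full(k, rho_xy)
    return float(rxy @ np.linalg.solve(rxx, rxy))


def mean_cross_validity(n, k, rho_xy, rho_xx, reps=200, holdout=2000, seed=0):
    """Average squared correlation between predicted and observed criterion
    values in fresh data, for regression equations fitted on samples of size n."""
    rng = np.random.default_rng(seed)
    # Correlation matrix of (x_1, ..., x_k, y); the last row/column is the criterion.
    corr = np.full((k + 1, k + 1), rho_xx)
    corr[-1, :k] = rho_xy
    corr[:k, -1] = rho_xy
    np.fill_diagonal(corr, 1.0)
    mean = np.zeros(k + 1)

    values = []
    for _ in range(reps):
        train = rng.multivariate_normal(mean, corr, size=n)
        test = rng.multivariate_normal(mean, corr, size=holdout)
        # Ordinary least squares with an intercept, fitted on the training sample.
        design = np.column_stack([np.ones(n), train[:, :k]])
        beta, *_ = np.linalg.lstsq(design, train[:, k], rcond=None)
        # Apply the fitted equation to the independent holdout sample.
        pred = np.column_stack([np.ones(holdout), test[:, :k]]) @ beta
        r = np.corrcoef(pred, test[:, k])[0, 1]
        values.append(r ** 2)
    return float(np.mean(values))


def minimum_n(k, rho_xy, rho_xx, tolerance, candidates, **kwargs):
    """Smallest candidate n whose average cross-validity falls within
    `tolerance` of the population R^2 (hypothetical criterion).
    Returns None if no candidate qualifies."""
    target = population_r2(k, rho_xy, rho_xx)
    for n in candidates:
        if target - mean_cross_validity(n, k, rho_xy, rho_xx, **kwargs) <= tolerance:
            return n
    return None
```

For example, `minimum_n(3, 0.4, 0.3, tolerance=0.02, candidates=range(20, 400, 20), reps=100)` searches a grid of sample sizes for three predictors; candidate values should exceed the number of predictors plus one so that the regression is estimable. Running the search with a looser and a tighter tolerance would give two sample sizes per scenario, loosely analogous to the two prediction levels mentioned in the abstract.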
