Applied Linear Regression

Preface

1 Scatterplots and Regression
  1.1 Scatterplots
  1.2 Mean Functions
  1.3 Variance Functions
  1.4 Summary Graph
  1.5 Tools for Looking at Scatterplots
    1.5.1 Size
    1.5.2 Transformations
    1.5.3 Smoothers for the Mean Function
  1.6 Scatterplot Matrices
  Problems

2 Simple Linear Regression
  2.1 Ordinary Least Squares Estimation
  2.2 Least Squares Criterion
  2.3 Estimating σ²
  2.4 Properties of Least Squares Estimates
  2.5 Estimated Variances
  2.6 Comparing Models: The Analysis of Variance
    2.6.1 The F-Test for Regression
    2.6.2 Interpreting p-values
    2.6.3 Power of Tests
  2.7 The Coefficient of Determination, R²
  2.8 Confidence Intervals and Tests
    2.8.1 The Intercept
    2.8.2 Slope
    2.8.3 Prediction
    2.8.4 Fitted Values
  2.9 The Residuals
  Problems

3 Multiple Regression
  3.1 Adding a Term to a Simple Linear Regression Model
    3.1.1 Explaining Variability
    3.1.2 Added-Variable Plots
  3.2 The Multiple Linear Regression Model
  3.3 Terms and Predictors
  3.4 Ordinary Least Squares
    3.4.1 Data and Matrix Notation
    3.4.2 Variance-Covariance Matrix of e
    3.4.3 Ordinary Least Squares Estimators
    3.4.4 Properties of the Estimates
    3.4.5 Simple Regression in Matrix Terms
  3.5 The Analysis of Variance
    3.5.1 The Coefficient of Determination
    3.5.2 Hypotheses Concerning One of the Terms
    3.5.3 Relationship to the t-Statistic
    3.5.4 t-Tests and Added-Variable Plots
    3.5.5 Other Tests of Hypotheses
    3.5.6 Sequential Analysis of Variance Tables
  3.6 Predictions and Fitted Values
  Problems

4 Drawing Conclusions
  4.1 Understanding Parameter Estimates
    4.1.1 Rate of Change
    4.1.2 Signs of Estimates
    4.1.3 Interpretation Depends on Other Terms in the Mean Function
    4.1.4 Rank Deficient and Over-Parameterized Mean Functions
    4.1.5 Tests
    4.1.6 Dropping Terms
    4.1.7 Logarithms
  4.2 Experimentation Versus Observation
  4.3 Sampling from a Normal Population
  4.4 More on R²
    4.4.1 Simple Linear Regression and R²
    4.4.2 Multiple Linear Regression
    4.4.3 Regression through the Origin
  4.5 Missing Data
    4.5.1 Missing at Random
    4.5.2 Alternatives
  4.6 Computationally Intensive Methods
    4.6.1 Regression Inference without Normality
    4.6.2 Nonlinear Functions of Parameters
    4.6.3 Predictors Measured with Error
  Problems

5 Weights, Lack of Fit, and More
  5.1 Weighted Least Squares
    5.1.1 Applications of Weighted Least Squares
    5.1.2 Additional Comments
  5.2 Testing for Lack of Fit, Variance Known
  5.3 Testing for Lack of Fit, Variance Unknown
  5.4 General F Testing
    5.4.1 Non-null Distributions
    5.4.2 Additional Comments
  5.5 Joint Confidence Regions
  Problems

6 Polynomials and Factors
  6.1 Polynomial Regression
    6.1.1 Polynomials with Several Predictors
    6.1.2 Using the Delta Method to Estimate a Minimum or a Maximum
    6.1.3 Fractional Polynomials
  6.2 Factors
    6.2.1 No Other Predictors
    6.2.2 Adding a Predictor: Comparing Regression Lines
    6.2.3 Additional Comments
  6.3 Many Factors
  6.4 Partial One-Dimensional Mean Functions
  6.5 Random Coefficient Models
  Problems

7 Transformations
  7.1 Transformations and Scatterplots
    7.1.1 Power Transformations
    7.1.2 Transforming Only the Predictor Variable
    7.1.3 Transforming the Response Only
    7.1.4 The Box and Cox Method
  7.2 Transformations and Scatterplot Matrices
    7.2.1 The 1D Estimation Result and Linearly Related Predictors
    7.2.2 Automatic Choice of Transformation of Predictors
  7.3 Transforming the Response
  7.4 Transformations of Nonpositive Variables
  Problems

8 Regression Diagnostics: Residuals
  8.1 The Residuals
    8.1.1 Difference Between ê and e
    8.1.2 The Hat Matrix
    8.1.3 Residuals and the Hat Matrix with Weights
    8.1.4 The Residuals When the Model Is Correct
    8.1.5 The Residuals When the Model Is Not Correct
    8.1.6 Fuel Consumption Data
  8.2 Testing for Curvature
  8.3 Nonconstant Variance
    8.3.1 Variance Stabilizing Transformations
    8.3.2 A Diagnostic for Nonconstant Variance
    8.3.3 Additional Comments
  8.4 Graphs for Model Assessment
    8.4.1 Checking Mean Functions
    8.4.2 Checking Variance Functions
  Problems

9 Outliers and Influence
  9.1 Outliers
    9.1.1 An Outlier Test
    9.1.2 Weighted Least Squares
    9.1.3 Significance Levels for the Outlier Test
    9.1.4 Additional Comments
  9.2 Influence of Cases
    9.2.1 Cook's Distance
    9.2.2 Magnitude of D_i
    9.2.3 Computing D_i
    9.2.4 Other Measures of Influence
  9.3 Normality Assumption
  Problems

10 Variable Selection
  10.1 The Active Terms
    10.1.1 Collinearity
    10.1.2 Collinearity and Variances
  10.2 Variable Selection
    10.2.1 Information Criteria
    10.2.2 Computationally Intensive Criteria
    10.2.3 Using Subject-Matter Knowledge
  10.3 Computational Methods
    10.3.1 Subset Selection Overstates Significance
  10.4 Windmills
    10.4.1 Six Mean Functions
    10.4.2 A Computationally Intensive Approach
  Problems

11 Nonlinear Regression
  11.1 Estimation for Nonlinear Mean Functions
  11.2 Inference Assuming Large Samples
  11.3 Bootstrap Inference
  11.4 References
  Problems

12 Logistic Regression
  12.1 Binomial Regression
    12.1.1 Mean Functions for Binomial Regression
  12.2 Fitting Logistic Regression
    12.2.1 One-Predictor Example
    12.2.2 Many Terms
    12.2.3 Deviance
    12.2.4 Goodness-of-Fit Tests
  12.3 Binomial Random Variables
    12.3.1 Maximum Likelihood Estimation
    12.3.2 The Log-Likelihood for Logistic Regression
  12.4 Generalized Linear Models
  Problems

Appendix
  A.1 Web Site
  A.2 Means and Variances of Random Variables
    A.2.1 E Notation
    A.2.2 Var Notation
    A.2.3 Cov Notation
    A.2.4 Conditional Moments
  A.3 Least Squares for Simple Regression
  A.4 Means and Variances of Least Squares Estimates
  A.5 Estimating E(Y|X) Using a Smoother
  A.6 A Brief Introduction to Matrices and Vectors
    A.6.1 Addition and Subtraction
    A.6.2 Multiplication by a Scalar
    A.6.3 Matrix Multiplication
    A.6.4 Transpose of a Matrix
    A.6.5 Inverse of a Matrix
    A.6.6 Orthogonality
    A.6.7 Linear Dependence and Rank of a Matrix
  A.7 Random Vectors
  A.8 Least Squares Using Matrices
    A.8.1 Properties of Estimates
    A.8.2 The Residual Sum of Squares
    A.8.3 Estimate of Variance
  A.9 The QR Factorization
  A.10 Maximum Likelihood Estimates
  A.11 The Box-Cox Method for Transformations
    A.11.1 Univariate Case
    A.11.2 Multivariate Case
  A.12 Case Deletion in Linear Regression

References
Author Index
Subject Index
