Modern Insights About Pearson’s Correlation and Least Squares Regression

As is well known, Pearson’s correlation, ρ, can be used to characterize how well a least squares regression line fits data, and it provides a test of the hypothesis that two measures are independent. However, many articles in statistical journals indicate that r, the usual estimate of ρ, is sensitive to at least six features of data, and that neither least squares regression nor ρ is robust in the sense reviewed in this article. In practical terms, r can be a highly unsatisfactory measure of the strength of an association, no matter how large the sample size might be. One specific problem is that r can miss strong associations that more modern techniques detect. These practical problems with r reflect fundamental concerns about a strict reliance on least squares regression. A few of the many modern methods for dealing with these concerns are briefly indicated.
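
Two of the concerns raised above can be illustrated with a small sketch. It is not taken from the article: the data are made up for illustration, and the helper name `pearson_r` is an assumption. The first case shows a single unusual point reversing the sign of r for data that otherwise lie exactly on a line; the second shows r equal to zero even though y is completely determined by x.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation r (hypothetical helper, standard formula)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Illustrative data only: 20 points lying exactly on the line y = x.
x_line = list(range(1, 21))
y_line = list(x_line)
r_clean = pearson_r(x_line, y_line)

# The same data with one added outlier at (50, -50).
r_outlier = pearson_r(x_line + [50], y_line + [-50])

# A deterministic but nonlinear association: y = x^2 on a symmetric grid.
x_curve = list(range(-5, 6))
y_curve = [v ** 2 for v in x_curve]
r_curve = pearson_r(x_curve, y_curve)

print("r, points on a line:      ", round(r_clean, 3))
print("r, one outlier added:     ", round(r_outlier, 3))
print("r, y = x^2 (symmetric x): ", round(r_curve, 3))
```

In this toy setup the outlier turns r negative, and the quadratic case gives r of zero despite a perfect functional relationship. Neither effect depends on the sample being small, which is consistent with the point that a larger sample size does not by itself resolve these problems.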
