Bias in Estimation and Hypothesis Testing of Correlation

This study examined bias in the sample correlation coefficient, r, and its correction by unbiased estimators. Computer simulations revealed that the expected value of correlation coefficients in samples from a normal population is slightly less than the population correlation, ρ, and that the bias is almost eliminated by an estimator suggested by R.A. Fisher and is more completely eliminated by a related estimator recommended by Olkin and Pratt. Transforming the initial scores to ranks and calculating the Spearman rank correlation, rS, produces somewhat greater bias. Type I error probabilities of significance tests of zero correlation based on the Student t statistic, and of exact tests based on critical values of rS obtained from permutations, remain fairly close to the significance level for normal and several non-normal distributions. However, significance tests of non-zero values of correlation based on the r to Z transformation are grossly distorted for distributions that violate bivariate normality. Significance tests of non-zero values of rS based on the r to Z transformation are distorted even for normal distributions.

This paper examines some unfamiliar properties of the Pearson product-moment correlation that have implications for research in psychology, education, and various social sciences. Some characteristics of the sampling distribution of the correlation coefficient, originally discovered by R.A. Fisher (1915), were largely ignored throughout most of the 20th century, even though correlation is routinely employed in many kinds of research in these disciplines. It is known that the sample correlation coefficient is a biased estimator of the population correlation, but in practice researchers rarely recognize the bias or attempt to correct for it.
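The bias described above, and the effect of the two corrections, can be illustrated with a small simulation. The following is a minimal sketch, not the authors' simulation code. It assumes the commonly cited approximate forms of the two estimators, r[1 + (1 - r^2)/(2n)] for Fisher's and r[1 + (1 - r^2)/(2(n - 3))] for Olkin and Pratt's, a bivariate normal population, and illustrative values of ρ, n, and the number of replications. All function names are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fisher_adjusted(r, n):
    """Approximate Fisher correction (assumed form): r[1 + (1 - r^2)/(2n)]."""
    return r * (1 + (1 - r * r) / (2 * n))


def olkin_pratt_adjusted(r, n):
    """Approximate Olkin-Pratt correction (assumed form): r[1 + (1 - r^2)/(2(n - 3))]."""
    return r * (1 + (1 - r * r) / (2 * (n - 3)))


def simulate_means(rho, n, reps, seed=0):
    """Average r and both adjusted estimates over `reps` bivariate normal samples."""
    rng = random.Random(seed)
    scale = math.sqrt(1 - rho * rho)
    sum_r = sum_f = sum_op = 0.0
    for _ in range(reps):
        xs, ys = [], []
        for _ in range(n):
            z1 = rng.gauss(0.0, 1.0)
            z2 = rng.gauss(0.0, 1.0)
            xs.append(z1)
            # y has unit variance and correlation rho with x
            ys.append(rho * z1 + scale * z2)
        r = pearson_r(xs, ys)
        sum_r += r
        sum_f += fisher_adjusted(r, n)
        sum_op += olkin_pratt_adjusted(r, n)
    return sum_r / reps, sum_f / reps, sum_op / reps


if __name__ == "__main__":
    rho, n = 0.5, 10  # illustrative values, not taken from the paper
    mean_r, mean_f, mean_op = simulate_means(rho, n, reps=20000)
    print(f"rho={rho}  mean r={mean_r:.4f}  Fisher={mean_f:.4f}  Olkin-Pratt={mean_op:.4f}")
```

With a small n and a moderate positive ρ, the mean of r should fall slightly below ρ, with the Fisher-adjusted mean closer to ρ and the Olkin-Pratt-adjusted mean closer still. The exact figures depend on the seed and the number of replications.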

[1]  T. A. Bray,et al.  A Convenient Method for Generating Normal Variables , 1964 .

[2]  R. Fowler,et al.  Power and Robustness in Product-Moment Correlation , 1987 .

[3]  Ingram Olkin,et al.  Unbiased Estimation of Certain Correlation Coefficients , 1958 .

[4]  H. Daniels Note on Durbin and Stuart's Formula for E(RS) , 1951 .

[5]  R. Fisher FREQUENCY DISTRIBUTION OF THE VALUES OF THE CORRELATION COEFFICIENT IN SAMPLES FROM AN INDEFINITELY LARGE POPULATION , 1915 .

[6]  E. S. Pearson,et al.  TESTS FOR RANK CORRELATION COEFFICIENTS. I , 1957 .

[7]  James Durbin,et al.  Inversions and Rank Correlation Coefficients , 1951 .

[8]  E. S. Keeping,et al.  Mathematics of Statistics, Part Two. Second Edition. , 1952 .

[9]  R. Charter,et al.  Fisher's Z to R , 1983 .

[10]  S. Siegel,et al.  Nonparametric Statistics for the Behavioral Sciences , 2022, The SAGE Encyclopedia of Research Design.

[11]  M. E. Muller,et al.  A Note on the Generation of Random Normal Deviates , 1958 .

[12]  G. J. Glasser,et al.  Critical values of the coefficient of rank correlation for testing the hypothesis of independence , 1961 .

[13]  F. N. David,et al.  The variance of Spearman's rho in normal samples , 1961 .

[14]  R. Fisher On the "Probable Error" of a Coefficient of Correlation Deduced from a Small Sample. , 1921 .

[15]  Alan Stuart,et al.  THE CORRELATION BETWEEN VARIATE-VALUES AND RANKS IN SAMPLES FROM A CONTINUOUS DISTRIBUTION , 1954 .

[16]  B. Morgan Elements of Simulation , 1984 .

[17]  D. W. Zimmerman,et al.  Properties of the Spearman Correction for Attenuation for Normal and Realistic Non-Normal Distributions , 1997 .

[18]  L. Devroye Non-Uniform Random Variate Generation , 1986 .

[19]  Maurice G. Kendall,et al.  The Distribution of Spearman's Coefficient of Rank Correlation in a Universe in which all Rankings Occur an Equal Number of Times , 1939 .

[20]  M. Kendall,et al.  Rank Correlation Methods (5th ed.). , 1992 .

[21]  H. E. Daniels,et al.  Rank Correlation and Population Models , 1950 .

[22]  W. Hoeffding A Non-Parametric Test of Independence , 1948 .

[23]  George Marsaglia,et al.  Toward a universal random number generator , 1987 .

[24]  Paul M. Muchinsky,et al.  The Correction for Attenuation , 1996 .