On the Appropriateness of the Correlation Coefficient with a 0, 1 Dependent Variable

Abstract

This article deals with the use and misuse of the correlation coefficient when the dependent variable is a dichotomous 0,1 variable. It focuses particularly on problems of curvilinearity and on the nature of the prediction being made. The prediction of consumer purchases from reported subjective purchase probabilities serves to illustrate the problems discussed. It is noted that, with a 0,1 dependent variable, the correlation ratio is likely to be a better measure of the degree of relationship than the coefficient of determination, because it imposes no restriction on the functional form of the relationship. The article then considers the mean error probability and the average conditional entropy as alternative measures. Finally, the article emphasizes that the purpose for which the relation between the independent and dependent variables is used should govern both the development of an appropriate model and the choice of measure for deciding which of several independent variables is best.
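The four measures named above can be compared on a small example. The sketch below is illustrative only, and several pieces of it are assumptions rather than the article's own definitions: the function name `summarize`, the toy data, the mean error probability taken as the misclassification rate when each value of x predicts its more frequent outcome, and the average conditional entropy taken as H(Y|X) in bits. Here x stands in for reported subjective purchase probabilities grouped into a few discrete values, and y is the 0,1 purchase outcome.

```python
import math
from collections import defaultdict


def summarize(x, y):
    """Return (r^2, eta^2, mean error probability, H(Y|X) in bits) for a
    0,1 outcome y grouped by the distinct values of a predictor x.

    Assumes both x and y vary (otherwise r^2 is undefined)."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Coefficient of determination of the least-squares *linear* fit.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    r2 = sxy ** 2 / (sxx * syy)

    # Group the 0,1 outcomes by each distinct value of x.
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)

    between = 0.0   # between-group sum of squares (numerator of eta^2)
    err = 0.0       # assumed: P(error) when each group predicts its majority outcome
    cond_ent = 0.0  # assumed: average conditional entropy H(Y|X), in bits
    for ys in groups.values():
        w = len(ys) / n          # share of observations at this value of x
        p = sum(ys) / len(ys)    # observed proportion of 1s (purchases)
        between += len(ys) * (p - mean_y) ** 2
        err += w * min(p, 1 - p)
        for q in (p, 1 - p):
            if q > 0:            # 0 * log(0) is taken as 0
                cond_ent -= w * q * math.log2(q)

    eta2 = between / syy         # correlation ratio: no functional form imposed
    return r2, eta2, err, cond_ent


# Hypothetical data: purchases occur only at the highest reported probability.
x = [0.0] * 4 + [0.5] * 4 + [1.0] * 4
y = [0] * 4 + [0] * 4 + [1] * 4
r2, eta2, err, h = summarize(x, y)
print(round(r2, 4), round(eta2, 4), err, h)  # prints: 0.75 1.0 0.0 0.0
```

In this hypothetical data the relationship is curvilinear: the group means fit it exactly (η² = 1, with zero error probability and zero conditional entropy), while the best straight line leaves a quarter of the variance unexplained (r² = 0.75). Because η² is the least-squares fit of the best arbitrary function of x, it can never be smaller than r² computed on the same groups.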
