Correlation and regression

In many health-related studies, investigators wish to assess the strength of an association between 2 measured (continuous) variables. For example, the relation between high-sensitivity C-reactive protein (hs-CRP) and body mass index (BMI) may be of interest. Although BMI is often treated as a categorical variable, eg, underweight, normal, overweight, and obese, a noncategorized version is more detailed and thus may be more informative for detecting associations. Correlation and regression are 2 related and widely used approaches for determining the strength of an association between 2 variables. Correlation provides a unitless measure of association (usually linear), whereas regression provides a means of predicting one variable (the dependent variable) from the other (the predictor variable). This report summarizes correlation coefficients and least-squares regression, including the intercept and slope coefficients.

The correlation coefficient ranges from −1 (perfect negative association) through 0 (no linear association) to +1 (perfect positive association). Both variables are treated equally, in that neither is considered to be a predictor or an outcome. The most commonly used version is the Pearson product-moment coefficient of correlation, r. Suppose one wants to estimate the correlation between X=BMI, denoted for the $i$th subject as $X_i$, and Y=hs-CRP, denoted for the $i$th subject as $Y_i$. For a sample of size n ($i = 1, \ldots, n$), r is estimated using the following formula¹:

$$r = \frac{S_{XY}}{\sqrt{S_{XX}\,S_{YY}}}$$

where

$$S_{XY} = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$$

and

$$S_{XX} = \sum_{i=1}^{n} (X_i - \bar{X})^2, \qquad S_{YY} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2.$$

Here, $\bar{X}$ indicates the sample mean of X (=BMI), and $\bar{Y}$ the sample mean of Y (=hs-CRP). The numerator of r reflects how BMI and hs-CRP co-vary, and the denominator reflects the variability of both BMI and hs-CRP about their respective sample means.
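To make the formula concrete, here is a minimal sketch in plain Python (no statistical library assumed) that computes r exactly as described: the sum of cross-products of deviations from the two sample means, divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the BMI and hs-CRP values are hypothetical, chosen only for illustration; they are not data from any study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation (hypothetical helper for illustration)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n  # sample mean of X
    y_bar = sum(y) / n  # sample mean of Y
    # Numerator: how X and Y co-vary about their sample means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator terms: variability of each variable about its own mean
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical illustration values (not study data)
bmi = [21.5, 24.0, 27.3, 30.1, 33.8]
hs_crp = [0.8, 1.1, 2.0, 2.6, 4.1]

r = pearson_r(bmi, hs_crp)
print(f"r = {r:.3f}")

# r is unitless: a positive linear change of units in X leaves it unchanged
r_rescaled = pearson_r([10 * v + 5 for v in bmi], hs_crp)
print(abs(r - r_rescaled) < 1e-12)
```

Because the numerator is symmetric in X and Y and the denominator uses both sums of squares, swapping the arguments gives the same r, consistent with correlation treating neither variable as predictor or outcome.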
The Pearson correlation coefficient assumes that X and Y are jointly distributed as bivariate normal, ie, X and Y each are normally …