Paper information - Correlation and regression

Correlation and regression

In many health-related studies, investigators wish to assess the strength of an association between 2 measured (continuous) variables. For example, the relation between high-sensitivity C-reactive protein (hs-CRP) and body mass index (BMI) may be of interest. Although BMI is often treated as a categorical variable, eg, underweight, normal, overweight, and obese, a noncategorized version is more detailed and thus may be more informative in terms of detecting associations. Correlation and regression are 2 related, widely used approaches for determining the strength of an association between 2 variables. Correlation provides a unitless measure of association (usually linear), whereas regression provides a means of predicting one variable (the dependent variable) from the other (the predictor variable). This report summarizes correlation coefficients and least-squares regression, including intercept and slope coefficients.

Correlation provides a "unitless" measure of association between 2 variables, ranging from −1 (indicating perfect negative association) to 0 (no association) to +1 (perfect positive association). Both variables are treated equally in that neither is considered to be a predictor or an outcome. The most commonly used version is the Pearson product-moment coefficient of correlation, $r$. Suppose one wants to estimate the correlation between X=BMI, denoted for the ith subject as $X_i$, and Y=hs-CRP, denoted for the ith subject as $Y_i$. This is estimated for a sample of size n (i=1,…, n) using the following formula [1]:

$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$

where

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

and

$$\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$$

Here, $\bar{X}$ indicates the sample mean of X (=BMI), and $\bar{Y}$ the sample mean of Y (=hs-CRP). The numerator of $r$ reflects how BMI and hs-CRP co-vary, and the denominator reflects the variability of both BMI and hs-CRP about their respective sample means.

The Pearson correlation coefficient assumes that X and Y are jointly distributed as bivariate normal, ie, X and Y each are normally …
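The Pearson product-moment coefficient described above can be sketched in a few lines of Python. This is a minimal illustration, not the article's own code: the function name `pearson_r` and the BMI and hs-CRP values are hypothetical, made up purely to show the arithmetic (sum of cross-products of deviations in the numerator, square root of the product of the two sums of squared deviations in the denominator).

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length samples.

    Hypothetical helper for illustration; not taken from the article.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of observations")
    n = len(x)
    if n < 2:
        raise ValueError("at least 2 observations are required")

    # Sample means of X and Y
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Numerator: how X and Y co-vary about their sample means
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    # Denominator terms: variability of X and of Y about their sample means
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined when a variable is constant")

    return s_xy / math.sqrt(s_xx * s_yy)


# Made-up illustrative values (assumption): not data from the article
bmi = [21.0, 24.5, 27.3, 30.1, 33.8]
hs_crp = [0.6, 1.1, 1.9, 2.4, 3.9]

r = pearson_r(bmi, hs_crp)
print(f"r = {r:.3f}")
```

Because both variables are treated equally, swapping the arguments gives the same value, and the result always lies between −1 and +1.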

S. Crawford | Donal O'Brien | Sybil L Crawford | Pamela Sharkey Scott

[1] Richard F. Gunst, et al. Applied Regression Analysis, 1999, Technometrics.

[2] Peter J. Rousseeuw, et al. Robust regression and outlier detection, 1987.

[3] D. Ragland, et al. Dichotomizing Continuous Outcome Variables: Dependence of the Magnitude of Association and Statistical Power on the Cutpoint, 1992, Epidemiology.

[4] W. W. Muir, et al. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, 1980.

[5] Malik Beshir Malik, et al. Applied Linear Regression, 2005, Technometrics.

[6] Norman R. Draper, et al. Applied regression analysis (2nd ed.), 1981, Wiley series in probability and mathematical statistics.

[7] M. Mazumdar, et al. Categorizing a prognostic variable: review of methods, code for easy implementation and applications to decision-making about cancer treatments, 2000, Statistics in Medicine.

[8] N. Jaspen. Applied Nonparametric Statistics, 1979.