Analysis of Covariance Models for Data From Observational Field Studies

Abstract: We outline the features of a general class of statistical models (i.e., analysis of covariance [ANCOVA] models) that has proven to be effective for the analysis of data from observational studies. In observational studies, treatments are assigned by Nature in a decidedly nonrandom manner; consequently, many of the crucial assumptions and safeguards of the classic experimental design either fail or are absent. Hence, inferences (causal or associative) are more difficult to justify. Typically, investigators can expect the primary factors of interest, which are usually called environmental exposures rather than treatments, to be involved in complex interactions with each other and with other factors, and these factors will be confounded with still other factors. We provide examples illustrating the application of ANCOVA models to adjust for confounding factors and complex interactions, thereby providing relatively clean estimates of association between exposure and response. We summarize information on available software and supporting literature for implementing ANCOVA models for the analysis of cross-sectional and longitudinal observational field data. We conclude with a brief discussion of critical model fitting issues, including proper specification of the functional form of continuous covariates and problems associated with overfitted models and misspecified models that lack important covariates.
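The covariate adjustment described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the data, variable names, and effect sizes below are hypothetical, and the response is built without noise so that the contrast between the unadjusted and adjusted estimates is exact. The ANCOVA fit here is an ordinary least-squares regression of the response on an exposure indicator and one continuous covariate, assuming a linear covariate effect and no exposure-by-covariate interaction.

```python
import numpy as np

# Hypothetical data: two exposure groups whose covariate values differ
# systematically, so the covariate is confounded with exposure.
x = np.array([1.0, 2.0, 3.0, 4.0, 3.0, 4.0, 5.0, 6.0])  # continuous covariate
g = np.array([0, 0, 0, 0, 1, 1, 1, 1])                  # exposure indicator

# Assumed (illustrative) data-generating values; no noise is added.
true_effect = 2.0   # exposure effect on the response
slope = 1.5         # covariate effect on the response
y = 10.0 + true_effect * g + slope * x

# Unadjusted estimate: raw difference in group means.
# It absorbs the covariate imbalance between the groups.
unadjusted = y[g == 1].mean() - y[g == 0].mean()

# ANCOVA estimate: least-squares fit of y on intercept, exposure, covariate.
X = np.column_stack([np.ones_like(x), g, x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
adjusted = coef[1]  # coefficient on the exposure indicator

print(f"unadjusted difference: {unadjusted:.2f}")
print(f"covariate-adjusted effect: {adjusted:.2f}")
```

In this constructed example the raw difference in means overstates the exposure effect because the exposed group also has larger covariate values, while the adjusted coefficient recovers the value used to generate the data. With real field data the adjustment is only as good as the assumed functional form and the set of covariates included, which is the model-fitting concern raised at the end of the abstract.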
