Design aspects of calibration studies in nutrition, with analysis of missing data in linear measurement error models.

Motivated by an example in nutritional epidemiology, we investigate design and analysis aspects of linear measurement error models with missing surrogate data. The specific problem consists of an initial large sample in which the response (a food frequency questionnaire, FFQ) is observed, followed by a smaller calibration study in which replicates of the error-prone predictor (food records or recalls, FR) are observed. Our analysis differs from most of the measurement error literature in that selection into the calibration study can depend on the value of the response. A rationale for this type of design is given. Two major problems are investigated. The first is one of design: in a calibration study, one can choose between a larger sample with fewer replicates and a smaller sample with more replicates. Somewhat surprisingly, neither strategy is uniformly preferable in cases of practical interest; the answer depends on the instrument used (recalls or records) and on the parameters of interest. The second problem is one of analysis. In the usual linear model with no missing data, method of moments estimates and normal-theory maximum likelihood estimates are approximately equivalent, and the former is more widely used because it can be calculated easily and explicitly. Both estimates are valid without any distributional assumptions. In contrast, in the missing data problem under consideration, only the moments estimate is distribution-free, but the maximum likelihood estimate has at least 50% greater precision in practical situations when normality holds. Implications for the design of nutritional calibration studies are discussed.
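To make the design question concrete, the following is a minimal simulation sketch, not the authors' method. It assumes a simple working model that the abstract does not spell out: an FFQ response Q = beta0 + beta1*T + e and k replicate FR measurements F_j = T + u_j of true intake T, with a textbook method of moments slope estimate. All function names and parameter values are hypothetical. Selection into the calibration study is taken to be completely random here, whereas the setting studied above allows selection to depend on the response.

```python
import numpy as np


def simulate(n, k, beta0=1.0, beta1=0.5, var_t=1.0, var_u=1.0, var_e=0.25, rng=None):
    """Simulate one calibration study under the assumed working model.

    All parameter defaults are illustrative assumptions, not values from the paper.
    Returns q (n,) FFQ responses and f (n, k) replicate FR measurements.
    """
    rng = np.random.default_rng() if rng is None else rng
    t = rng.normal(0.0, np.sqrt(var_t), size=n)                    # true intake
    q = beta0 + beta1 * t + rng.normal(0.0, np.sqrt(var_e), size=n)
    f = t[:, None] + rng.normal(0.0, np.sqrt(var_u), size=(n, k))  # error-prone replicates
    return q, f


def mom_estimates(q, f):
    """Method of moments estimates of (beta0, beta1); requires k >= 2 replicates."""
    n, k = f.shape
    f_bar = f.mean(axis=1)
    var_u_hat = f.var(axis=1, ddof=1).mean()       # within-person error variance
    var_t_hat = f_bar.var(ddof=1) - var_u_hat / k  # variance of true intake
    cov_qf = np.cov(q, f_bar, ddof=1)[0, 1]        # equals Cov(Q, T) in expectation
    beta1_hat = cov_qf / var_t_hat
    beta0_hat = q.mean() - beta1_hat * f_bar.mean()
    return beta0_hat, beta1_hat


def slope_sd(n, k, reps=500, seed=0):
    """Empirical standard deviation of the slope estimate over repeated studies."""
    rng = np.random.default_rng(seed)
    slopes = [mom_estimates(*simulate(n, k, rng=rng))[1] for _ in range(reps)]
    return float(np.std(slopes, ddof=1))


if __name__ == "__main__":
    # Two hypothetical designs with the same total number of FR measurements (400).
    print("n=200, k=2:", slope_sd(200, 2))
    print("n=100, k=4:", slope_sd(100, 4))
```

Changing `var_u` (the replicate error variance, which would differ between recalls and records) or tracking a different parameter can change which design looks better in such a simulation, which is the kind of dependence the abstract describes.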
