Serious mistakes have been made in the past by underestimating the effects of environment and overestimating the effects of genes.1–3 A seminal 1972 paper by Lewontin1 drew researchers' attention to the pitfalls of partitioning nature and nurture. More recently, Kittles and Weiss, considering the definition of 'race', showed the lack of an obvious correspondence between genotypes and phenotypes.3

Many investigations of gene–environment interactions (GEI) are under way in different parts of the world, and the subject also appears among the leading items in grant calls from the National Institutes of Health (NIH) and the European Union (EU). Some ongoing studies are extremely large (e.g. the European Prospective Investigation into Cancer and Nutrition [EPIC] and UK Biobank). All of them employ similar methods for genotyping, whereas exposure assessment is extremely variable: it is, for example, state-of-the-art for dietary intake in EPIC, but not in other studies or for other exposures.

Studying GEI implies measuring both environmental exposures (e.g. to pesticides or environmental tobacco smoke) and the genetic variants presumed to modulate their effects. However, there is an asymmetry between the two. Genotyping is in fact much more accurate than the vast majority of methods used to measure environmental exposures. This implies a lower degree of classification error, which in turn makes associations with disease easier to identify. A further difficulty relates to the rarity of many environmental exposures (which may nevertheless have an important impact on human health), whereas several of the polymorphic alleles under investigation are extremely common (e.g. 40–50% for NAT2 or GSTM1). This, again, increases the probability of detecting an association with genotypes (if one is real), but not with environmental exposures.

Let us consider the example in Table 1, which shows the implications of measurement error for the estimation of relative risks. Classification error is expressed by the correlation coefficient between each 'assessor' and a reference standard (r = 1 means no error, r = 0.9 means a 10% classification error). For three expected relative risks associating exposure with disease (1.5, 2.0 and 2.5), the Table shows the observed relative risks under different degrees of classification error. For example, a classification error of 10% causes a relative risk of 2.5 to drop to 2.3, i.e. little change. With a classification error of 90% (assessor 1), however, even a relative risk of 2.5 becomes 1.1, i.e. undetectable with common epidemiological methods. Unfortunately, whereas in genotyping we are usually in the situation of assessor 4, implying a small underestimation of risks, in the field of environmental exposures we are more often in the situation of assessor 3 or even assessor 2.

Things become even more complex if we want to study interaction, for example between a frequent exposure (prevalence 25%) and a frequent genotype (prevalence 50%). Let us suppose that classification error is 20% for the environmental exposure (sensitivity = 80%), a value very likely smaller than in reality for most exposures, and around 7% for genotyping (sensitivity = 93%). The latter is realistic, since genotyping techniques are currently well validated and extremely accurate.
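To make the attenuation in Table 1 concrete, the sketch below reproduces the pattern under the assumption that the observed relative risk follows RR_obs = RR_true^r, with r the validity coefficient of the assessor; this relation, and the intermediate value r = 0.5, are assumptions used here for illustration rather than the Table's actual computation.

```python
# A minimal sketch, not taken from the paper: the observed relative risks in
# Table 1 are consistent with the attenuation relation RR_obs = RR_true ** r,
# where r is the correlation between the 'assessor' and the reference standard.
# The intermediate value r = 0.5 is purely illustrative.
def observed_rr(true_rr: float, r: float) -> float:
    """Relative risk expected after attenuation by classification error."""
    return true_rr ** r

for true_rr in (1.5, 2.0, 2.5):
    row = ", ".join(f"r={r}: {observed_rr(true_rr, r):.1f}"
                    for r in (1.0, 0.9, 0.5, 0.1))
    print(f"true RR {true_rr} -> {row}")
```

Under this assumed relation, r = 0.9 turns a true relative risk of 2.5 into 2.3, and r = 0.1 turns it into 1.1, matching the figures quoted above.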
The consequence of this situation is that we would need approximately 1800 cases to observe the main effects (assuming no statistical interaction between exposure and genes) if no classification error occurs; 2700 if the exposure is misclassified 20% of the time; and 3200 if the genotype is also misclassified 7% of the time (a simplified sketch of this mechanism follows below). Only sensitivity is considered in this example; with specificity lower than 100%, the numbers increase further. They also increase under the assumption of a statistical interaction (i.e. a departure from a multiplicative model) between the gene and the environment. How sample size requirements for gene–environment interactions change with the chosen model of statistical interaction is discussed by Clayton and McKeigue.4 According to estimates, the common genotyping method TaqMan has 96% sensitivity and 98% specificity, thus allowing little classification error. By contrast, sensitivity in environmental exposure assessment is quite often lower than 70%, and specificity even lower. This situation is not due to a
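Returning to the sample-size figures above, the mechanism can be illustrated with a standard two-proportion sample-size formula applied to a misclassified binary exposure. The sketch below is a simplified, hypothetical calculation for the exposure main effect only (exposure prevalence 25% as in the text, odds ratio of 1.5 assumed purely for illustration); it does not reproduce the 1800/2700/3200 figures, which come from the full gene–environment calculation.

```python
# A simplified sketch (not the authors' calculation): how non-differential
# misclassification of a binary exposure inflates the number of cases needed
# in a 1:1 case-control study.
from math import sqrt
from statistics import NormalDist

def apparent_prevalence(p: float, sens: float, spec: float) -> float:
    """Exposure prevalence actually observed after misclassification."""
    return sens * p + (1 - spec) * (1 - p)

def cases_needed(p_controls: float, true_or: float, sens: float = 1.0,
                 spec: float = 1.0, alpha: float = 0.05, power: float = 0.80) -> int:
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    # true exposure prevalence among cases implied by the odds ratio
    odds_cases = true_or * p_controls / (1 - p_controls)
    p_cases = odds_cases / (1 + odds_cases)
    # prevalences we would actually observe after misclassification
    q0 = apparent_prevalence(p_controls, sens, spec)
    q1 = apparent_prevalence(p_cases, sens, spec)
    pbar = (q0 + q1) / 2
    n = (z_a * sqrt(2 * pbar * (1 - pbar))
         + z_b * sqrt(q0 * (1 - q0) + q1 * (1 - q1))) ** 2 / (q1 - q0) ** 2
    return round(n)

print(cases_needed(0.25, 1.5))             # perfect exposure assessment
print(cases_needed(0.25, 1.5, sens=0.80))  # 20% misclassification of exposure
```

Even this simplified main-effect calculation shows the required number of cases rising by roughly a third (from about 470 to about 630 in this sketch) when sensitivity drops from 100% to 80%; adding genotype misclassification and testing for interaction inflates the numbers much further, as in the figures above.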
[1] Kittles R, Weiss K. Race, ancestry, and genes: implications for defining disease risk. Annual Review of Genomics and Human Genetics. 2003.
[2] Manson J, et al. Laboratory reproducibility of endogenous hormone levels in postmenopausal women. Cancer Epidemiology, Biomarkers & Prevention. 1994.
[3] Boerma T, et al. Getting the numbers right. Bulletin of the World Health Organization. 2005.
[4] Ebrahim S, et al. 'Mendelian randomization': can genetic epidemiology contribute to understanding environmental determinants of disease? International Journal of Epidemiology. 2003.
[5] Clayton D, McKeigue P. Epidemiological methods for studying genes and environmental factors in complex diseases. The Lancet. 2001.
[6] Lewontin R. The Apportionment of Human Diversity. 1972.
[7] McKeigue P, et al. Problems of reporting genetic associations with complex outcomes. The Lancet. 2003.
[8] Kahn J. Getting the Numbers Right: Statistical Mischief and Racial Profiling in Heart Failure Research. Perspectives in Biology and Medicine. 2003.