Correction of phenotype misclassification based on high-discrimination genetic predictive risk models.

Misclassification of phenotype status can seriously reduce accuracy in association studies, including studies of genetic risk factors. A common problem is that participants are classified as nondiseased because the diagnostic workup was insufficient or because they have not been followed up long enough to develop disease. Some validated predictive models discriminate well between participants who will and will not develop disease. We suggest that information from such models can be used to predict the risk that a nondiseased participant will eventually develop disease and to recode the status of participants predicted to be at highest risk. We evaluate the conditions under which recoding yields a maximal net improvement in the accuracy of phenotype classification. A net improvement is expected only when the positive likelihood ratio of the predictive model is larger than the inverse of the odds of disease among apparently nondiseased controls. We use simulations to probe the impact of reclassification on the power to detect new risk factors under several scenarios of classification accuracy of the previously developed models. We also apply this framework to a validated model of progression to advanced age-related macular degeneration that uses genetic and nongenetic variables (area under the curve = 0.915). In the training cohort (n = 2,937) and a separate validation cohort (n = 1,227), 195 to 272 and 78 to 91 nonprogressor participants, respectively, were reclassified as progressors. When validated risk factors provide strong discriminating ability, correcting phenotype misclassification with highly informative predictive models may help identify additional genetic and other risk factors.
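
The likelihood-ratio condition stated above can be illustrated with a minimal sketch. It assumes the predictive model is dichotomized at a single risk threshold with a known sensitivity and specificity among the apparently nondiseased controls, and that every flagged control is recoded as diseased; that framing, the function names, and the numbers in the usage example are illustrative assumptions, not values or code from the study. Under these assumptions, recoding corrects (latent cases × sensitivity) labels and spoils (true noncases × (1 − specificity)) labels, so the net change is positive exactly when the positive likelihood ratio exceeds the inverse of the disease odds among the controls.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity) at the chosen risk threshold."""
    false_positive_rate = 1.0 - specificity
    if false_positive_rate == 0.0:
        return float("inf")  # a perfectly specific model never flags a true noncase
    return sensitivity / false_positive_rate


def recoding_helps(latent_prevalence, sensitivity, specificity):
    """True when LR+ exceeds the inverse odds of disease among apparent controls.

    latent_prevalence is the (assumed known) fraction of apparent controls
    who are in fact undiagnosed or future cases; it must lie strictly in (0, 1).
    """
    odds = latent_prevalence / (1.0 - latent_prevalence)
    return positive_likelihood_ratio(sensitivity, specificity) > 1.0 / odds


def net_corrected_labels(n_controls, latent_prevalence, sensitivity, specificity):
    """Expected labels fixed minus labels broken when all flagged controls are recoded."""
    latent_cases = n_controls * latent_prevalence
    true_noncases = n_controls - latent_cases
    corrected = latent_cases * sensitivity            # latent cases correctly recoded
    newly_wrong = true_noncases * (1.0 - specificity)  # noncases wrongly recoded
    return corrected - newly_wrong


if __name__ == "__main__":
    # Hypothetical scenario: 1,000 apparent controls, 10% of them latent cases.
    for spec in (0.98, 0.90):
        helps = recoding_helps(0.10, 0.50, spec)
        net = net_corrected_labels(1000, 0.10, 0.50, spec)
        print(f"specificity={spec}: recoding helps={helps}, net change={net:.1f}")
```

In this hypothetical scenario the inverse odds are 9. A threshold with sensitivity 0.50 and specificity 0.98 has a positive likelihood ratio of about 25 and yields a net gain of roughly 32 correct labels, whereas lowering specificity to 0.90 drops the ratio to about 5 and yields a net loss of roughly 40.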
