Optimal designs to select individuals for genotyping conditional on observed binary or survival outcomes and non-genetic covariates

In gene-disease association studies, the cost of genotyping makes it economical to use a two-stage design in which only a subset of the cohort is genotyped. At the first stage, follow-up data and some risk factors or non-genetic covariates are collected for the whole cohort; at the second stage, a subset of the cohort is selected for genotyping. Intuitively, this subset can be selected more efficiently if the data collected at the first stage are used. The proposed method selects the subset by maximizing the information contained in the conditional probability of the genotype given the first-stage data and initial estimates of the parameters of interest. The method is illustrated using logistic regression and Cox's proportional hazards model, and algorithms that find optimal or nearly optimal designs in a discrete design space are presented. Simulation comparisons of the D-optimal design, extreme selection, and the case-cohort design suggest that the D-optimal design is the most efficient in terms of the variance of the estimated parameters, but that extreme selection may be a good alternative for practical study design.
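
As a rough illustration of selecting a subset in a discrete design space under a D-criterion, the following Python sketch may help. It is not the algorithm presented in the paper. It assumes a binary genotype `g` with frequency `q`, a logistic model logit P(y = 1 | g) = `b0` + `b1`·g with no further covariates, and a simple greedy forward search. The per-individual information used here (the logistic information averaged over the genotype distribution given the observed outcome) is a simplified stand-in for the paper's criterion. All function names, parameter values, and the toy outcomes are hypothetical.

```python
import math


def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-t))


def expected_info(y, b0, b1, q):
    """Assumed stand-in criterion: 2x2 logistic information for (b0, b1),
    averaged over the unobserved binary genotype g given the outcome y."""
    # Unnormalized P(g | y) by Bayes' rule at the initial estimates.
    weights = []
    for g in (0, 1):
        p = expit(b0 + b1 * g)
        prior = q if g == 1 else 1.0 - q
        weights.append(prior * (p if y == 1 else 1.0 - p))
    norm = sum(weights)
    info = [[0.0, 0.0], [0.0, 0.0]]
    for g in (0, 1):
        p = expit(b0 + b1 * g)
        scale = (weights[g] / norm) * p * (1.0 - p)
        z = (1.0, float(g))  # design vector: intercept and genotype
        for i in range(2):
            for j in range(2):
                info[i][j] += scale * z[i] * z[j]
    return info


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def add2(a, b):
    """Elementwise sum of two 2x2 matrices."""
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]


def greedy_d_optimal(infos, n_select, ridge=1e-6):
    """Greedy forward selection: at each step add the individual that most
    increases the determinant of the accumulated information matrix.
    The small ridge keeps the determinant positive at the first step."""
    total = [[ridge, 0.0], [0.0, ridge]]
    remaining = list(range(len(infos)))
    chosen = []
    for _ in range(min(n_select, len(infos))):
        best = max(remaining, key=lambda k: det2(add2(total, infos[k])))
        total = add2(total, infos[best])
        remaining.remove(best)
        chosen.append(best)
    return chosen, det2(total)


# Toy first-stage data (hypothetical): two cases and six controls.
outcomes = [1, 1, 0, 0, 0, 0, 0, 0]
infos = [expected_info(y, b0=-2.0, b1=1.0, q=0.3) for y in outcomes]
chosen, det_value = greedy_d_optimal(infos, n_select=4)
print(chosen, det_value)
```

A greedy search of this kind is only a heuristic and is not guaranteed to reach the optimal design; exchange-type algorithms, which swap selected and unselected individuals, are a common refinement in the exact-design literature.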
