New aspects of statistical methods for missing data problems, with applications in bioinformatics and genetics

We propose a nonparametric imputation procedure for data with missing values and establish empirical likelihood based inference for parameters defined by general estimating equations. The imputation is carried out multiple times via a nonparametric estimator of the conditional distribution of the missing component given the always observable component of the random vector under study. The empirical likelihood is then used to construct a profile likelihood for the parameter of interest. We demonstrate that the proposed nonparametric imputation can correct the selection bias caused by the missingness and that the empirical likelihood leads to more efficient parameter estimation. The proposed method is evaluated by simulation and by an empirical study of the relationship between eye weight and gene transcriptional abundance in recombinant inbred mice.
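The imputation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel, the bandwidth of 0.2, the number of draws, the simulated model, and the names `kernel_impute` and `simulate` are all assumptions made for illustration. Each missing response is imputed several times by resampling observed responses with kernel weights centered at the unit's always-observed covariate, and the resulting mean is compared with the complete-case mean under a missing-at-random mechanism. The empirical likelihood step is not shown.

```python
import numpy as np


def kernel_impute(x, y, observed, n_draws=20, bandwidth=0.2, rng=None):
    """Impute each missing y[i] n_draws times from a kernel estimate of
    the conditional distribution of Y given X = x[i].

    Returns an (n, n_draws) array; rows for observed units repeat y[i].
    """
    rng = np.random.default_rng() if rng is None else rng
    x_obs, y_obs = x[observed], y[observed]
    filled = np.repeat(np.asarray(y, dtype=float)[:, None], n_draws, axis=1)
    for i in np.flatnonzero(~observed):
        # Gaussian kernel weights of the respondents around x[i]
        # (hypothetical kernel and bandwidth, chosen for illustration).
        w = np.exp(-0.5 * ((x[i] - x_obs) / bandwidth) ** 2)
        filled[i] = rng.choice(y_obs, size=n_draws, p=w / w.sum())
    return filled


def simulate(n=2000, seed=0):
    """Compare the complete-case mean with the imputation-based mean."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + rng.normal(size=n)  # assumed model, true E[Y] = 1
    # Missing at random: the response probability depends on x only.
    observed = rng.random(n) < 1.0 / (1.0 + np.exp(-x))
    y_seen = np.where(observed, y, np.nan)  # hide the missing responses
    filled = kernel_impute(x, y_seen, observed, rng=rng)
    return y[observed].mean(), filled.mean()


if __name__ == "__main__":
    cc_mean, imputed_mean = simulate()
    print(f"complete-case mean: {cc_mean:.3f}")
    print(f"imputed mean:       {imputed_mean:.3f}")
```

In this simulated setting, units with larger covariate values respond more often, so the complete-case mean overstates the true mean of 1, while the kernel-weighted imputation pulls the estimate back toward it. The bandwidth here is fixed by hand; in practice it would need to be chosen with care.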
