Efficient estimation of grouped survival models

Background: Time- and dose-to-event phenotypes used in basic science and translational studies are commonly measured imprecisely or incompletely due to limitations of the experimental design or data collection schema. For example, drug-induced toxicities are not reported by the actual time or dose triggering the event, but rather are inferred from the cycle or dose to which the event is attributed. This exemplifies a prevalent type of imprecise measurement called grouped failure time, where times or doses are restricted to discrete increments. Failure to appropriately account for the grouped nature of the data, when present, may lead to biased analyses.

Results: We present groupedSurv, an R package which implements a statistically rigorous and computationally efficient approach for conducting genome-wide analyses based on grouped failure time phenotypes. Our approach accommodates adjustments for baseline covariates, and analysis at the variant or gene level. We illustrate the statistical properties of the approach and computational performance of the package by simulation. We present the results of a reanalysis of a published genome-wide study to identify common germline variants associated with the risk of taxane-induced peripheral neuropathy in breast cancer patients.

Conclusions: groupedSurv enables fast and rigorous genome-wide analysis on the basis of grouped failure time phenotypes at the variant, gene or pathway level. The package is freely available under a public license through the Comprehensive R Archive Network.
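To make the notion of a grouped failure time concrete, the following is a minimal sketch of a log-likelihood for such data. It assumes a grouped proportional hazards model in the style of Prentice and Gloeckler, in which the probability of failing in interval k, given survival to its start, is 1 - exp(-exp(gamma_k + x * beta)). The abstract does not spell out the model, so this form is an assumption. The function name `grouped_loglik`, its arguments, and the toy data are hypothetical; this is not the groupedSurv API.

```python
import math


def grouped_loglik(beta, gamma, data):
    """Log-likelihood for grouped failure times (illustrative sketch).

    beta  : effect of a single covariate (e.g. a genotype dosage); assumed scalar.
    gamma : list of baseline parameters, one per interval (assumed model form).
    data  : iterable of (interval, event, x) where
            interval is the 1-based index of the last interval observed,
            event is 1 if the subject failed in that interval and
            0 if the subject survived it and was then censored,
            x is the covariate value.
    """
    total = 0.0
    for interval, event, x in data:
        eta = x * beta
        # The subject survived every interval before the last observed one.
        for j in range(interval - 1):
            total -= math.exp(gamma[j] + eta)
        cum = math.exp(gamma[interval - 1] + eta)
        if event:
            # log(1 - exp(-cum)), written with expm1 for numerical accuracy.
            total += math.log(-math.expm1(-cum))
        else:
            # Survived the last observed interval as well.
            total -= cum
    return total


if __name__ == "__main__":
    # Hypothetical toy data: three treatment cycles, one covariate.
    toy = [(1, 1, 0.0), (2, 0, 1.0), (3, 1, 2.0), (3, 0, 0.0)]
    print(grouped_loglik(beta=0.5, gamma=[-1.0, -0.5, 0.0], data=toy))
```

Each subject contributes only the interval in which the event was attributed, never an exact time, which is the sense in which the data are grouped. A score test for beta = 0 would be based on the derivative of this function with respect to beta.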
