Statistica Sinica Preprint No: SS-2016-0199 R2
Title: Predicting Disease Risk by Transformation Models in the Presence of Missing Subgroup Identifiers

Some biomedical studies yield mixture data. When subgroup membership is missing for some of the subjects in a study, the distribution of their outcome is a mixture of the subgroup-specific distributions. Taking into account the uncertain distribution of the group membership and the covariates, we model the relation between disease onset time and the covariates through transformation models in each subpopulation. We develop a nonparametric maximum likelihood-based estimation procedure, implemented through the EM algorithm, together with an accompanying inference procedure. We also propose methods to identify which covariates have different effects and which have common effects across the distinct populations, which enables parsimonious modeling and a better understanding of the differences across populations. The methods are illustrated through extensive simulation studies and a data example.
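To make the role of the EM algorithm with missing subgroup identifiers concrete, here is a minimal sketch under strong simplifying assumptions. It is not the paper's estimator: Gaussian components stand in for the subgroup-specific transformation models, and there are no covariates and no censoring. The function name `em_partial_labels`, the coding of a missing label as `-1`, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np

def em_partial_labels(y, g, n_iter=200, tol=1e-8):
    """EM for a two-component Gaussian mixture with partly missing labels.

    y : outcomes, shape (n,)
    g : integer subgroup labels in {0, 1}, or -1 where the label is missing
    (Illustrative stand-in only; not the transformation-model estimator.)
    """
    y = np.asarray(y, dtype=float)
    g = np.asarray(g)
    known = np.flatnonzero(g >= 0)      # subjects with observed membership
    missing = np.flatnonzero(g < 0)     # subjects with missing membership

    # Initialise every parameter from the labelled subjects only.
    pi = np.array([np.mean(g[known] == k) for k in (0, 1)])
    mu = np.array([y[g == k].mean() for k in (0, 1)])
    sd = np.array([y[g == k].std() for k in (0, 1)])

    prev = -np.inf
    for _ in range(n_iter):
        # pi_k * f_k(y_i) for every subject and component, shape (n, 2)
        z = (y[:, None] - mu) / sd
        dens = pi * np.exp(-0.5 * z ** 2) / (sd * np.sqrt(2.0 * np.pi))

        # E-step: posterior membership probabilities; labelled subjects
        # keep their observed group with probability one.
        w = dens / dens.sum(axis=1, keepdims=True)
        w[known] = np.eye(2)[g[known]]

        # Observed-data log-likelihood: labelled subjects contribute one
        # component, unlabelled subjects contribute the mixture.
        loglik = (np.log(dens[known, g[known]]).sum()
                  + np.log(dens[missing].sum(axis=1)).sum())

        # M-step: weighted updates of proportions, means and spreads.
        nk = w.sum(axis=0)
        pi = nk / len(y)
        mu = (w * y[:, None]).sum(axis=0) / nk
        sd = np.sqrt((w * (y[:, None] - mu) ** 2).sum(axis=0) / nk)

        if loglik - prev < tol:
            break
        prev = loglik
    return pi, mu, sd, w

# Simulated example: two subgroups, about half of the labels missing.
rng = np.random.default_rng(0)
n = 2000
g_true = (rng.random(n) < 0.4).astype(int)
y = rng.normal(loc=np.where(g_true == 1, 4.0, 0.0), scale=1.0)
g_obs = np.where(rng.random(n) < 0.5, g_true, -1)

pi_hat, mu_hat, sd_hat, w = em_partial_labels(y, g_obs)
```

The E-step is where the missing identifiers enter: subjects with an observed label are assigned to their group with certainty, while the others receive posterior membership probabilities that weight the M-step updates.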
