SVSI: Fast and Powerful Set‐Valued System Identification Approach to Identifying Rare Variants in Sequencing Studies for Ordered Categorical Traits

In genetic association studies of an ordered categorical phenotype, it is common either to regroup the categories of the phenotype into two and apply logistic regression (LG), or to apply ordered logistic (oLG) or ordered probit (oPRB) regression, which account for the ordinal nature of the phenotype. However, these methods may lose statistical power or fail to control the type I error rate, owing to their model assumptions and/or unstable parameter estimation algorithms, when the genetic variant is rare or the sample size is limited. To address this problem, we propose a set‐valued (SV) system model to identify genetic variants associated with an ordered categorical phenotype. We couple this model with an SV system identification algorithm to identify all the key system parameters. Simulations and two real data analyses show that SV and LG accurately controlled the type I error rate even at a significance level of 10^-6, whereas oLG and oPRB did not in some cases. LG had significantly less power than the other three methods because it disregards the ordinal nature of the phenotype, and SV had similar or greater power than oLG and oPRB. We argue that SV should be employed in genetic association studies of ordered categorical phenotypes.
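To make the contrast between the ordinal and the regrouped binary phenotype concrete, the following is a minimal sketch, not the authors' SV identification algorithm. It assumes a latent-threshold data-generating model with standard Gaussian noise, in which only the interval containing the latent value is observed. The function names, the carrier probability, the effect size, and the cut points are all hypothetical values chosen for illustration.

```python
import random


def simulate_ordinal(genotypes, beta, thresholds, rng):
    """Return ordered categories from an assumed latent-threshold model.

    Only the interval that contains the latent value is observed
    (a set-valued observation), never the latent value itself.
    """
    phenotypes = []
    for g in genotypes:
        latent = beta * g + rng.gauss(0.0, 1.0)  # assumed Gaussian noise
        # Category index = number of thresholds the latent value exceeds.
        phenotypes.append(sum(latent > c for c in thresholds))
    return phenotypes


def dichotomize(phenotypes, cut):
    """Regroup ordered categories into two: 1 if category >= cut, else 0."""
    return [1 if y >= cut else 0 for y in phenotypes]


rng = random.Random(0)
# Hypothetical rare variant: each individual is a carrier with probability 0.02.
genotypes = [1 if rng.random() < 0.02 else 0 for _ in range(2000)]
thresholds = [-0.5, 0.5, 1.5]  # assumed cut points, giving 4 ordered categories
ordinal = simulate_ordinal(genotypes, beta=0.8, thresholds=thresholds, rng=rng)
binary = dichotomize(ordinal, cut=2)  # the input a plain LG analysis would see
```

The `binary` list discards the distinction between categories 0 and 1 and between categories 2 and 3, which is the information loss the abstract attributes to the LG approach.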
