Inferring phenotypes from substance use via collaborative matrix completion

Background: Although substance use disorders (SUDs) are heritable, few genetic risk factors for them have been identified, in part because study populations are small. To address this limitation, researchers have aggregated subjects from multiple existing genetic studies, but these subjects can have missing phenotypic information, including diagnostic criteria for substances that were not originally a focus of study. Recent advances in addiction neurobiology have shown that comorbid SUDs (e.g., the abuse of multiple substances) have similar genetic determinants, which makes it possible to infer missing diagnostic criteria for one SUD from the criteria of another SUD and patient genotypes through statistical modeling.

Results: We propose a new approach based on matrix completion techniques that integrates features of comorbid health conditions and individuals' genotypes to infer unreported diagnostic criteria for a disorder. The approach optimizes a bi-linear model that uses the interactions between known disease correlations and candidate genes to impute missing criteria. We developed an efficient stochastic and parallel algorithm that optimizes the model 20 times faster than the classic sequential algorithm. The method was tested on 3441 subjects who had both cocaine and opioid use disorders, and it inferred missing diagnostic criteria with consistently better accuracy than other recent statistical methods.

Conclusions: The proposed matrix completion method is a promising tool for imputing unreported or unobserved symptoms or criteria for disease diagnosis. Integrating data at multiple scales or from heterogeneous sources may help improve the accuracy of phenotype imputation.
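To make the bi-linear idea concrete, here is a minimal sketch of matrix completion with side information, not the authors' implementation. It assumes a subject-by-criterion matrix M whose entry (i, j) is modeled as x_i^T G y_j, where x_i holds subject-level features (for example, genotypes) and y_j holds criterion-level features (for example, correlations with criteria of a comorbid disorder). The names `fit_bilinear` and `impute`, the real-valued synthetic data, and the closed-form ridge solver are all illustrative assumptions; the abstract describes a stochastic, parallel optimizer, which this sketch swaps for a plain least-squares solve.

```python
import numpy as np


def fit_bilinear(M, mask, X, Y, lam=1e-6):
    """Estimate G in M[i, j] ~ X[i] @ G @ Y[j] from observed entries only.

    Hypothetical helper: solves a ridge-regularized least-squares problem
    in closed form instead of using a stochastic optimizer.
    """
    rows, cols = np.nonzero(mask)
    # x_i^T G y_j equals <G, x_i y_j^T>, so each observed entry contributes
    # one regression row: the flattened outer product of its two feature vectors.
    Phi = np.einsum("np,nq->npq", X[rows], Y[cols]).reshape(len(rows), -1)
    target = M[rows, cols]
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    g = np.linalg.solve(A, Phi.T @ target)
    return g.reshape(X.shape[1], Y.shape[1])


def impute(M, mask, X, Y, G):
    """Keep observed entries of M and fill the rest with model predictions."""
    return np.where(mask, M, X @ G @ Y.T)


# Synthetic, noiseless data (an assumption made purely for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))       # 60 subjects, 4 subject-level features
Y = rng.normal(size=(8, 3))        # 8 criteria, 3 criterion-level features
G_true = rng.normal(size=(4, 3))   # interaction weights to recover
M_full = X @ G_true @ Y.T

mask = rng.random(M_full.shape) < 0.7        # True where an entry is observed
M_obs = np.where(mask, M_full, np.nan)       # hidden entries are NaN

G_hat = fit_bilinear(M_obs, mask, X, Y)
M_imputed = impute(M_obs, mask, X, Y, G_hat)
print("max abs error on hidden entries:",
      np.abs(M_imputed - M_full)[~mask].max())
```

Real diagnostic criteria are typically binary, so a practical version would threshold the predictions or use a classification loss. A sparsity penalty on G would help select the feature interactions that matter, at the cost of needing an iterative solver.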