Adverse Drug Reaction Prediction with Symbolic Latent Dirichlet Allocation

Adverse drug reactions (ADRs) are a major burden for patients and the healthcare industry. They often cause preventable hospitalizations and deaths and incur substantial costs. Traditional preclinical in vitro safety profiling and clinical safety trials are limited by small scale, long duration, high financial cost and limited statistical significance. The availability of large amounts of drug and ADR data potentially allows ADRs to be predicted with data analytics methods during a drug's early preclinical stage, informing more targeted clinical safety tests. Despite their initial success, existing methods trade off interpretability, predictive power and efficiency. This motivates us to explore methods that combine all three strengths and provide practical solutions for real-world ADR prediction. We cast the ADR-drug relation structure into a three-layer hierarchical Bayesian model. We interpret each ADR as a symbolic word and apply latent Dirichlet allocation (LDA) to learn topics that may represent biochemical mechanisms relating ADRs to drug structures. Building on LDA, we design an equivalent regularization term to incorporate hierarchical ADR domain knowledge. Finally, we develop a mixed-input model that leverages a fast collapsed Gibbs sampling method in which the complexity of each iteration is proportional only to the number of positive ADRs. Experiments on real-world data show that our models achieve higher prediction accuracy and shorter running times than state-of-the-art alternatives.
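The idea of treating each drug as a document and each positive ADR as a symbolic word can be illustrated with a minimal sketch of standard collapsed Gibbs sampling for plain LDA. This is not the paper's regularized or mixed-input model, nor its fast sampler; the function name `gibbs_lda_adr`, the hyperparameter values and the toy data are all assumptions made for illustration. The sketch does show why a sweep touches only positive ADRs: the inner loop runs over the ADR tokens present for each drug, never over absent ones.

```python
import numpy as np

def gibbs_lda_adr(drug_adrs, n_adrs, n_topics=2, alpha=0.1, beta=0.01,
                  n_iter=200, seed=0):
    """Plain collapsed Gibbs sampling for LDA (illustrative, hypothetical helper).

    drug_adrs[d] lists the indices of the ADRs recorded as positive for drug d.
    Returns theta (drug-topic proportions) and phi (topic-ADR distributions).
    """
    rng = np.random.default_rng(seed)
    n_drugs = len(drug_adrs)
    n_dk = np.zeros((n_drugs, n_topics))  # topic counts per drug
    n_kw = np.zeros((n_topics, n_adrs))   # ADR counts per topic
    n_k = np.zeros(n_topics)              # total tokens per topic

    # Random initial topic assignment for every positive ADR token.
    z = []
    for d, adrs in enumerate(drug_adrs):
        z_d = rng.integers(n_topics, size=len(adrs))
        z.append(z_d)
        for w, k in zip(adrs, z_d):
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1

    for _ in range(n_iter):
        # One sweep visits only the positive ADRs of each drug.
        for d, adrs in enumerate(drug_adrs):
            for i, w in enumerate(adrs):
                k = z[d][i]
                # Remove the current assignment from the counts.
                n_dk[d, k] -= 1
                n_kw[k, w] -= 1
                n_k[k] -= 1
                # Conditional distribution over topics for this token.
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + n_adrs * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                # Record the new assignment.
                z[d][i] = k
                n_dk[d, k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1

    # Posterior mean estimates from the final counts.
    theta = (n_dk + alpha) / (n_dk.sum(axis=1, keepdims=True) + n_topics * alpha)
    phi = (n_kw + beta) / (n_k[:, None] + n_adrs * beta)
    return theta, phi

# Toy data (made up): four drugs, six ADR terms.
toy_drugs = [[0, 1, 2], [0, 1], [3, 4, 5], [4, 5]]
theta, phi = gibbs_lda_adr(toy_drugs, n_adrs=6, n_topics=2, n_iter=50)
scores = theta @ phi  # per-drug distribution over ADR terms
```

Each row of `scores` can be read as a ranking of ADR terms for one drug, including terms not observed for that drug. Note that each token update in this sketch still costs time proportional to the number of topics.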
