Generative Adversarial Networks (GANs) Based Synthetic Sampling for Predictive Modeling

In this report we evaluate the utility of Generative Adversarial Networks (GANs) in mapping the chemical structural space for molecular property profiles, with the goal of subsequently generating synthetic (artificial) samples for ligand‐based molecular modeling. Two case studies are considered: BACE‐1 (β‐Secretase 1) and DENV (Dengue Virus) inhibitory activities, with the former focused on data populating and the latter on data balancing. We train GANs on subsamples extracted from the dataset for each bioactivity endpoint and use the trained networks to generate synthetic examples from the respective bioactivity chemical spaces. Original and synthetic samples are pooled and used to build BACE‐1 and DENV inhibitory activity classifiers, whose performance is evaluated over tenfold external validation sets. In both case studies the classifiers demonstrate satisfactory predictivity: the BACE‐1 classifier yields accuracy (ACC) and Matthews correlation coefficient (MCC) values of 0.80 and 0.59, while the DENV classifier yields balanced accuracy (BACC) and MCC values of 0.81 and 0.70, respectively. Moreover, the statistics of these classifiers are comparable to or better than those of other models in the literature. These results suggest that GANs may be useful in mapping the chemical space for molecular property profiles of interest, and thus allow for the extraction of synthetic examples for computational modeling.
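
For readers unfamiliar with the three statistics quoted above, the following is a minimal sketch of how ACC, BACC, and MCC can be computed from binary predictions using their standard confusion-matrix definitions. It is not the authors' evaluation code: the function names and the toy labels are hypothetical, chosen only for illustration, and the sketch assumes labels coded as 1 (active) and 0 (inactive) with both classes present.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = active)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """ACC: fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def balanced_accuracy(tp, tn, fp, fn):
    """BACC: mean of sensitivity and specificity (assumes both classes occur)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC: correlation between predicted and true labels, in [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical toy labels (not data from the report): 4 actives, 6 inactives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
acc = accuracy(*counts)
bacc = balanced_accuracy(*counts)
mcc = matthews_corrcoef(*counts)
print(f"ACC={acc:.2f} BACC={bacc:.2f} MCC={mcc:.2f}")
```

BACC averages the per-class recall, which makes it less sensitive to class proportions than ACC. This is presumably why it is reported for the DENV data balancing case, while plain ACC is reported for BACE‐1.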
