Clustering Bioactive Molecules in 3D Chemical Space with Unsupervised Deep Learning

Unsupervised clustering has broad applications in data stratification, pattern investigation, and discovery beyond existing knowledge. In particular, clustering of bioactive molecules facilitates chemical space mapping, structure-activity studies, and drug discovery. These tasks, conventionally carried out with similarity-based methods, are complicated by the complexity and diversity of the data. We explored the learning capability of deep autoencoders for unsupervised clustering of 1.39 million bioactive molecules into band-clusters in a 3-dimensional latent chemical space. Displayed with space-navigation simulation software, these band-clusters group molecules of selected bioactivity classes into individual bands, each possessing a unique set of common sub-structural features that goes beyond structural similarity. These sub-structural features form the frameworks of literature-reported pharmacophores and privileged fragments. Within each band-cluster, molecules are further grouped into sub-regions according to their bioactivity target, sub-structural features, and molecular scaffolds. Our method is potentially applicable to big-data clustering tasks in other fields.
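The core idea of compressing molecules into a 3-dimensional latent space with an autoencoder can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the input representation, network depth, or training procedure, so the synthetic binary "fingerprints", the layer sizes (64, 16, 3, 16, 64), the squared-error loss, and all function names below are assumptions chosen only to keep the example small and self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_toy_fingerprints(n_per_class=100, n_bits=64, n_classes=3):
    """Synthetic stand-in for molecular fingerprints (assumption, not real data):
    each class shares a random bit pattern, perturbed by 5% bit flips."""
    prototypes = rng.random((n_classes, n_bits)) < 0.3
    X, y = [], []
    for c in range(n_classes):
        flips = rng.random((n_per_class, n_bits)) < 0.05
        X.append(np.logical_xor(prototypes[c], flips))
        y.append(np.full(n_per_class, c))
    return np.vstack(X).astype(float), np.concatenate(y)

def init_layers(sizes):
    """One (weights, bias) pair per consecutive pair of layer sizes."""
    return [(rng.normal(0.0, np.sqrt(1.0 / m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(X, layers, latent_index):
    """Return the activations of every layer.
    Hidden layers use tanh, the bottleneck is linear, the output is sigmoid."""
    acts = [X]
    last = len(layers) - 1
    for i, (W, b) in enumerate(layers):
        z = acts[-1] @ W + b
        if i == last:
            a = 1.0 / (1.0 + np.exp(-z))
        elif i == latent_index:
            a = z
        else:
            a = np.tanh(z)
        acts.append(a)
    return acts

def train(X, sizes=(64, 16, 3, 16, 64), latent_index=1, epochs=500, lr=0.1):
    """Full-batch gradient descent on the per-molecule squared reconstruction error."""
    layers = init_layers(sizes)
    n = X.shape[0]
    losses = []
    for _ in range(epochs):
        acts = forward(X, layers, latent_index)
        out = acts[-1]
        losses.append(float(np.sum((out - X) ** 2) / n))
        # Gradient of the loss w.r.t. the output pre-activation (through the sigmoid).
        delta = 2.0 * (out - X) / n * out * (1.0 - out)
        for i in reversed(range(len(layers))):
            W, b = layers[i]
            grad_W = acts[i].T @ delta
            grad_b = delta.sum(axis=0)
            if i > 0:
                back = delta @ W.T
                if i - 1 == latent_index:
                    delta = back                          # linear bottleneck
                else:
                    delta = back * (1.0 - acts[i] ** 2)   # tanh derivative
            layers[i] = (W - lr * grad_W, b - lr * grad_b)
    return layers, losses

def encode(X, layers, latent_index=1):
    """Map inputs to their 3-D latent coordinates."""
    return forward(X, layers, latent_index)[latent_index + 1]

X, labels = make_toy_fingerprints()
layers, losses = train(X)
codes = encode(X, layers)  # one (x, y, z) point per toy molecule
```

Each row of `codes` is a point in the 3-dimensional latent space; in a setting like the one described above, such points could be rendered in a 3D viewer and inspected for groupings by bioactivity class. Whether band-like clusters appear depends on the real data and architecture, which this toy example does not reproduce.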
