ChemTreeMap: an interactive map of biochemical similarity in molecular datasets

MOTIVATION: What if you could explain complex chemistry in a simple tree and share that data online with your collaborators? Computational biology often incorporates diverse chemical data to probe a biological question, but the existing tools for chemical data are ill-suited to the very large datasets inherent to bioinformatics. Furthermore, existing visualization methods often require an expert chemist to interpret the patterns. Biologists need an interactive tool that visualizes chemical information in an intuitive, accessible way and facilitates its integration into today's team-based biological research.

RESULTS: ChemTreeMap is an interactive bioinformatics tool designed to explore chemical space and mine the relationships between chemical structure, molecular properties, and biological activity. ChemTreeMap combines extended connectivity fingerprints and a neighbor-joining algorithm to produce a hierarchical tree with branch lengths proportional to molecular similarity. Compound properties are shown by leaf color, size, and outline to yield a user-defined visualization of the tree. Two representative analyses are included to demonstrate ChemTreeMap's capabilities and utility: assessing dataset overlap and mining structure-activity relationships.

AVAILABILITY AND IMPLEMENTATION: The examples from this paper may be accessed at http://ajing.github.io/ChemTreeMap/. Code for the server and client is available in the Supplementary Information, at the aforementioned GitHub site, and on Docker Hub (https://hub.docker.com) with the nametag ajing/chemtreemap.

CONTACT: carlsonh@umich.edu

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
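The tree construction described under RESULTS can be illustrated with a minimal sketch. It is not the ChemTreeMap implementation: it assumes each molecule's fingerprint is already available as a set of integer feature IDs (in practice these would come from a cheminformatics toolkit), uses the Tanimoto (Jaccard) distance between those sets, and applies the textbook Saitou-Nei neighbor-joining procedure. The molecule names, feature IDs, and function names below are hypothetical.

```python
from itertools import combinations


def tanimoto_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of fingerprint feature IDs."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union


def neighbor_joining(names, dist):
    """Textbook neighbor joining.

    names: list of leaf labels.
    dist:  dict mapping frozenset({x, y}) -> distance for every pair of labels.
    Returns a list of (node, neighbor, branch_length) edges of an unrooted tree.
    """
    nodes = list(names)
    d = dict(dist)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Sum of distances from each node to all other active nodes.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair that minimises the Q criterion.
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dij = d[frozenset((i, j))]
        # Branch lengths from the joined pair to the new internal node.
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        u = f"node{next_id}"
        next_id += 1
        edges.append((i, u, li))
        edges.append((j, u, lj))
        # Distances from the new internal node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij
                )
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


if __name__ == "__main__":
    # Hypothetical fingerprints: sets of hashed substructure IDs (not real ECFP output).
    fingerprints = {
        "mol_A": {1, 2, 3, 4},
        "mol_B": {1, 2, 3, 5},
        "mol_C": {6, 7, 8, 9},
        "mol_D": {6, 7, 8, 1},
    }
    pairwise = {
        frozenset((x, y)): tanimoto_distance(fingerprints[x], fingerprints[y])
        for x, y in combinations(fingerprints, 2)
    }
    for node, neighbor, length in neighbor_joining(list(fingerprints), pairwise):
        print(f"{node} -- {neighbor}: {length:.3f}")
```

Each printed edge links a molecule or internal node to its neighbor, and the edge length reflects fingerprint dissimilarity, so structurally similar compounds end up on short, adjacent branches. Neighbor joining can produce small negative branch lengths on non-additive distances; a production tool would need to handle that case and would likely use a faster variant for large datasets.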
