Automated Structure-Activity Relationship Mining: Connecting Chemical Structure to Biological Profiles

Understanding structure–activity relationships (SARs) of small molecules is important for developing probes and novel therapeutic agents in chemical biology and drug discovery. Increasingly multiplexed small-molecule profiling assays allow simultaneous measurement of many biological response parameters for the same compound, e.g., expression levels for many genes or binding constants against many proteins. While such methods promise to capture SARs with high granularity, few computational methods are available to support SAR analyses of high-dimensional compound activity profiles. Many of these methods are not generally applicable, or they reduce the activity space to scalar summary statistics before establishing SARs. In this article, we present a versatile computational method that automatically extracts interpretable SAR rules from high-dimensional profiling data. The rules connect chemical structural features of compounds to patterns in their biological activity profiles. We applied our method to data from novel cell-based gene-expression and imaging assays collected on more than 30,000 small molecules. Based on the rules identified for this dataset, we prioritized groups of compounds for further study, including a novel set of putative histone deacetylase inhibitors.

We present here a general approach to analyzing SARs for large numbers of high-dimensional biological profiles that uses frequent pattern mining (FPM) and, in addition, association rule mining (ARM) to automatically formulate SAR rules. Rules that connect chemical features to patterns in biological profiles are automatically identified and ranked by interestingness. We evaluated our method on gene-expression and cell-morphology profiles for more than 30,000 compounds. The compound collection contains subsets representative of common screening libraries assembled from various sources, as well as planned synthetic libraries of compounds with well-defined structural relationships. We used different chemical and biological descriptors to tailor the general approach to specific requirements of the compound library.
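
To make the idea of a SAR rule concrete, the following minimal Python sketch mines rules of the form "structural feature -> activity pattern" from a toy table and ranks them. It is an illustration under stated assumptions, not the implementation described in the article: the compound names, fragment labels, and activity patterns are invented, the function mine_rules is hypothetical, rules are restricted to one feature and one pattern each, and lift is used as a stand-in for the interestingness ranking, which the text above does not specify. Support, confidence, and lift are the standard association rule measures.

```python
from itertools import product

# Toy data (invented for illustration): each compound maps to a pair of
# (structural features, activity patterns observed in its profile).
compounds = {
    "cpd1": ({"frag_A", "frag_B"}, {"gene_1_up"}),
    "cpd2": ({"frag_A"}, {"gene_1_up", "gene_2_down"}),
    "cpd3": ({"frag_A", "frag_C"}, {"gene_1_up"}),
    "cpd4": ({"frag_B", "frag_C"}, {"gene_2_down"}),
    "cpd5": ({"frag_B"}, set()),
    "cpd6": ({"frag_C"}, {"gene_2_down"}),
}


def mine_rules(data, min_support=0.2, min_confidence=0.6):
    """Return (feature, pattern, support, confidence, lift) tuples.

    Only single-feature -> single-pattern rules are considered.
    Rules are sorted by lift, highest first (an assumed ranking).
    """
    n = len(data)
    features = sorted({f for feats, _ in data.values() for f in feats})
    patterns = sorted({p for _, pats in data.values() for p in pats})
    rules = []
    for feat, pat in product(features, patterns):
        n_feat = sum(1 for feats, _ in data.values() if feat in feats)
        n_pat = sum(1 for _, pats in data.values() if pat in pats)
        n_both = sum(
            1 for feats, pats in data.values() if feat in feats and pat in pats
        )
        support = n_both / n            # fraction of compounds with both
        confidence = n_both / n_feat    # P(pattern | feature)
        lift = confidence / (n_pat / n) # enrichment over the base rate
        if support >= min_support and confidence >= min_confidence:
            rules.append((feat, pat, support, confidence, lift))
    return sorted(rules, key=lambda r: r[4], reverse=True)


rules = mine_rules(compounds)
for feat, pat, support, confidence, lift in rules:
    print(f"{feat} -> {pat}: support={support:.2f}, "
          f"confidence={confidence:.2f}, lift={lift:.2f}")
```

On this toy table, two rules pass the thresholds: frag_A -> gene_1_up ranks first because every compound containing frag_A shows that pattern, and frag_C -> gene_2_down ranks second. A real analysis would start from frequent itemsets over multi-feature antecedents and a far larger compound set, which this sketch deliberately omits.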
