Mining collections of compounds with Screening Assistant 2

Background

High-throughput screening assays have become the starting point of many drug discovery programs, in large pharmaceutical companies as well as in academic organisations. Despite the increasing throughput of screening technologies, the almost infinite chemical space remains out of reach, which calls for tools dedicated to analysing and selecting the compound collections to be screened.

Results

We present Screening Assistant 2 (SA2), an open-source Java application dedicated to the storage and analysis of small to very large chemical libraries. SA2 stores unique molecules in a MySQL database and bundles several chemoinformatics methods, including provider management, interactive visualisation, scaffold analysis, diverse subset creation, descriptor calculation, substructure and SMARTS search, similarity search, and filtering. We illustrate the use of SA2 by analysing the composition of a database of 15 million compounds collected from 73 providers, in terms of scaffolds, frameworks, and undesired properties as defined by recently proposed HTS SMARTS filters. We also show how the software can be used to create diverse libraries from existing ones.

Conclusions

Screening Assistant 2 is a user-friendly, open-source application that can be used to manage collections of compounds and to perform simple to advanced chemoinformatics analyses. Its modular design and growing documentation make it easier to add new functionality, and contributions from the community are welcome. The software can be downloaded at http://sa2.sourceforge.net/.
