Wisdom of crowds for synthetic accessibility evaluation.

Synthetic accessibility evaluation assesses how easily a compound can be synthesized. A rapid method for assessing the synthetic accessibility of a vast number of chemical compounds is expected to bring about a breakthrough in drug discovery. Although several computational methods have been proposed, compounds are still evaluated by medicinal chemists, and the low throughput of human evaluation, caused by the shortage of chemists, is a critical obstacle to handling large numbers of compounds. We propose crowdsourcing to address this problem, and we conducted experiments to investigate the feasibility of incorporating semi-experts and a statistical aggregation method into synthetic accessibility evaluation. Our experimental results show that accurate synthetic accessibility scores can be obtained through the statistical aggregation of judgments from semi-experts.
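The abstract does not state which aggregation method is used or how judgments are encoded. As an illustration only, the sketch below assumes binary judgments (1 = easy to synthesize, 0 = hard) and a simplified one-parameter-per-worker EM scheme in the spirit of observer error-rate models. The function name `aggregate_judgments`, the worker and compound identifiers, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def aggregate_judgments(judgments, n_iter=20, eps=1e-3):
    """Aggregate binary judgments with a simplified EM scheme.

    judgments: list of (worker_id, compound_id, label), label in {0, 1}.
    Returns (posterior, accuracy): the estimated probability that each
    compound's label is 1, and the estimated accuracy of each worker.
    Assumption: each worker has one symmetric accuracy and the class
    prior is uniform; this is not the paper's stated method.
    """
    by_item = defaultdict(list)
    by_worker = defaultdict(list)
    for worker, item, label in judgments:
        by_item[item].append((worker, label))
        by_worker[worker].append((item, label))

    # Initialise each posterior with the fraction of positive votes.
    posterior = {
        item: sum(label for _, label in votes) / len(votes)
        for item, votes in by_item.items()
    }
    accuracy = {}
    for _ in range(n_iter):
        # M-step: a worker's accuracy is the expected agreement
        # with the current soft labels, clamped away from 0 and 1.
        for worker, labels in by_worker.items():
            agree = sum(
                posterior[item] if label == 1 else 1.0 - posterior[item]
                for item, label in labels
            )
            accuracy[worker] = min(max(agree / len(labels), eps), 1.0 - eps)
        # E-step: combine votes as log-odds weighted by worker accuracy.
        for item, votes in by_item.items():
            log_odds = 0.0
            for worker, label in votes:
                weight = math.log(accuracy[worker] / (1.0 - accuracy[worker]))
                log_odds += weight if label == 1 else -weight
            log_odds = min(max(log_odds, -30.0), 30.0)  # avoid overflow
            posterior[item] = 1.0 / (1.0 + math.exp(-log_odds))
    return posterior, accuracy


# Hypothetical toy data: workers A and B agree with each other,
# worker C disagrees with them on compounds c1 and c2.
toy_judgments = [
    ("A", "c1", 1), ("A", "c2", 0), ("A", "c3", 1), ("A", "c4", 0),
    ("B", "c1", 1), ("B", "c2", 0), ("B", "c3", 1), ("B", "c4", 0),
    ("C", "c1", 0), ("C", "c2", 1), ("C", "c3", 1), ("C", "c4", 0),
]
posterior, accuracy = aggregate_judgments(toy_judgments)
```

In this sketch, workers who agree more often with the emerging consensus receive larger weights, so a few unreliable judgments have less influence than they would under a plain majority vote.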
