Synthetic accessibility assessment using auxiliary responses

Abstract Despite recent advances in computational approaches to discovering new chemical compounds, assessing the synthetic accessibility of designed compounds remains difficult to automate because it is a heavily knowledge-intensive task. A promising solution to such “AI-hard” tasks is collective-intelligence approaches that aggregate the opinions of a group of human non-experts or semi-experts. However, existing aggregation methods rely only on the synthetic accessibility scores given by humans; they do not exploit auxiliary information obtained as a byproduct of human evaluation, such as information related to chemical structures. In this paper, we propose to exploit such auxiliary responses to obtain better aggregations. We introduce a new two-stage method for aggregating semi-expert judgments, where each judgment consists of a synthetic accessibility score together with an auxiliary response selecting the substructures of the target compound that obstruct its synthesis. The first stage clusters both semi-experts and substructures using stochastic block models to identify groups with similar skills or properties. The second stage aggregates the judgments while accounting for these groups of semi-experts and substructures, and predicts synthetic accessibility. Our experiments show that using auxiliary responses improves prediction performance and gives insight into the evaluators and the structure of the evaluated compounds.
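
Below is a minimal illustrative sketch of the two-stage idea described above, under stated assumptions: evaluators' auxiliary responses are encoded as a binary evaluator-by-substructure matrix (synthetic data here), spectral co-clustering is used as a simple stand-in for the paper's stochastic block models, and cluster-wise score averaging is used as a stand-in for its second-stage aggregation model. It is not the authors' implementation.

```python
# Sketch: (1) jointly cluster evaluators and substructures from auxiliary responses,
# (2) aggregate accessibility scores while respecting the evaluator clusters.
# NOTE: SpectralCoclustering and cluster-wise averaging are simplified stand-ins
# for the stochastic block model and aggregation model described in the paper.
import numpy as np
from sklearn.cluster import SpectralCoclustering

rng = np.random.default_rng(0)

n_evaluators, n_substructures = 20, 30
# Binary auxiliary responses: entry (i, j) = 1 if evaluator i flagged substructure j
# as obstructive to synthesis (synthetic data for illustration only).
selections = rng.integers(0, 2, size=(n_evaluators, n_substructures)).astype(float)

# Stage 1: group evaluators (rows) and substructures (columns) simultaneously.
model = SpectralCoclustering(n_clusters=3, random_state=0)
model.fit(selections + 1e-6)  # small offset guards against all-zero rows/columns
evaluator_groups = model.row_labels_        # cluster id per evaluator
substructure_groups = model.column_labels_  # cluster id per substructure

# Stage 2 (stand-in): aggregate 1-5 accessibility scores for one compound by first
# averaging within each evaluator cluster, then averaging the cluster means so that
# one large group of similar evaluators does not dominate the final prediction.
scores = rng.integers(1, 6, size=n_evaluators)
cluster_means = np.array([scores[evaluator_groups == g].mean()
                          for g in np.unique(evaluator_groups)])
aggregated_score = cluster_means.mean()
print(f"per-cluster means: {cluster_means}, aggregated score: {aggregated_score:.2f}")
```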
