QSAR without borders.

Prediction of chemical bioactivity and physical properties has been one of the most important applications of statistical and, more recently, machine learning and artificial intelligence methods in the chemical sciences. This field of research, broadly known as quantitative structure-activity relationship (QSAR) modeling, has produced many important algorithms and has found a broad range of applications in physical organic and medicinal chemistry over the past 55+ years. This Perspective summarizes recent technological advances in QSAR modeling, but it also highlights the applicability of algorithms, modeling methods, and validation practices developed in QSAR to a wide range of research areas outside traditional QSAR boundaries, including synthesis planning, nanotechnology, materials science, biomaterials, and clinical informatics. As modern research methods generate rapidly increasing amounts of data, knowledge of the robust data-driven modeling methods practiced within the QSAR field can become essential for scientists working both within and outside chemical research. We hope that this contribution, by highlighting the generalizable components of QSAR modeling, will help address this challenge.


[221]  Charles S. Wortmann,et al.  Maize [Zea Mays (L.)] crop-nutrient response functions extrapolation for Sub-Saharan Africa , 2017, Nutrient Cycling in Agroecosystems.

[222]  Jean-Louis Reymond,et al.  Virtual exploration of the small-molecule chemical universe below 160 Daltons. , 2005, Angewandte Chemie.

[223]  Alexander Binder,et al.  On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation , 2015, PloS one.

[224]  Alexander Tropsha,et al.  Chemistry-Wide Association Studies (CWAS): A Novel Framework for Identifying and Interpreting Structure-Activity Relationships , 2018, J. Chem. Inf. Model..

[225]  Marwin H. S. Segler,et al.  Neural-Symbolic Machine Learning for Retrosynthesis and Reaction Prediction. , 2017, Chemistry.

[226]  Friedemann Pulvermüller,et al.  Brain mechanisms linking language and action , 2005, Nature Reviews Neuroscience.

[227]  Vladimir V Poroikov,et al.  In silico assessment of adverse drug reactions and associated mechanisms. , 2016, Drug discovery today.

[228]  Robert M T Madiona,et al.  Distinguishing Chemically Similar Polyamide Materials with ToF-SIMS Using Self-Organizing Maps and a Universal Data Matrix. , 2018, Analytical chemistry.

[229]  Alán Aspuru-Guzik,et al.  Accelerating the discovery of materials for clean energy in the era of smart automation , 2018, Nature Reviews Materials.

[230]  Peter E. Hart,et al.  Nearest neighbor pattern classification , 1967, IEEE Trans. Inf. Theory.

[231]  Alexander Tropsha,et al.  Best Practices for QSAR Model Development, Validation, and Exploitation , 2010, Molecular informatics.

[232]  Walter Cedeño,et al.  On the Use of Neural Network Ensembles in QSAR and QSPR , 2002, J. Chem. Inf. Comput. Sci..

[233]  J. Dearden,et al.  QSAR modeling: where have you been? Where are you going to? , 2014, Journal of medicinal chemistry.

[234]  Ralf B Schäfer,et al.  Evolutionary patterns and physicochemical properties explain macroinvertebrate sensitivity to heavy metals. , 2016, Ecological applications : a publication of the Ecological Society of America.

[235]  Robin Taylor,et al.  Simulation Analysis of Experimental Design Strategies for Screening Random Compounds as Potential New Drugs and Agrochemicals , 1995, J. Chem. Inf. Comput. Sci..

[236]  Frank R Burden,et al.  Sparse Feature Selection Identifies H2A.Z as a Novel Pattern-Specific Biomarker for Asymmetrically Self-Renewing Distributed Stem Cells , 2015, Microscopy and Microanalysis.

[237]  S. Ong,et al.  New opportunities for materials informatics: Resources and data mining techniques for uncovering hidden relationships , 2016 .

[238]  The Gene Ontology Consortium,et al.  Expansion of the Gene Ontology knowledgebase and resources , 2016, Nucleic Acids Res..

[239]  F. Smit,et al.  Development of a stage-dependent prognostic model to predict psychosis in ultra-high-risk patients seeking treatment for co-morbid psychiatric disorders , 2016, Psychological Medicine.

[240]  K. Müller,et al.  Machine Learning Predictions of Molecular Properties: Accurate Many-Body Potentials and Nonlocality in Chemical Space , 2015, The journal of physical chemistry letters.

[241]  K. Lees,et al.  Seven-Day NIHSS Is a Sensitive Outcome Measure for Exploratory Clinical Trials in Acute Stroke: Evidence From the Virtual International Stroke Trials Archive , 2012, Stroke.

[242]  Georg Juckel,et al.  Expressed emotion as a predictor of the first psychotic episode — Results of the European prediction of psychosis study , 2018, Schizophrenia Research.

[243]  Cormac Toher,et al.  AFLOW-CHULL: Cloud-Oriented Platform for Autonomous Phase Stability Analysis , 2018, J. Chem. Inf. Model..

[244]  Frank R. Burden,et al.  Optimal Sparse Descriptor Selection for QSAR Using Bayesian Methods , 2009 .

[245]  Eugene N Muratov,et al.  Universal Approach for Structural Interpretation of QSAR/QSPR Models , 2013, Molecular informatics.

[246]  José L Medina-Franco,et al.  The many roles of molecular complexity in drug discovery. , 2017, Drug discovery today.

[247]  Hermano I Krebs,et al.  Robotic Measurement of Arm Movements After Stroke Establishes Biomarkers of Motor Recovery , 2014, Stroke.

[248]  Minoru Kanehisa,et al.  KEGG: new perspectives on genomes, pathways, diseases and drugs , 2016, Nucleic Acids Res..

[249]  Bowen Liu,et al.  Retrosynthetic Reaction Prediction Using Neural Sequence-to-Sequence Models , 2017, ACS central science.

[250]  Takashi Taniguchi,et al.  Unconventional superconductivity in magic-angle graphene superlattices , 2018, Nature.

[251]  Alexander Tropsha,et al.  Trust, But Verify: On the Importance of Chemical Structure Curation in Cheminformatics and QSAR Modeling Research , 2010, J. Chem. Inf. Model..

[252]  G. Maggiora,et al.  Molecular similarity in medicinal chemistry. , 2014, Journal of medicinal chemistry.

[253]  Igor V Tetko,et al.  Identifying potential endocrine disruptors among industrial chemicals and their metabolites--development and evaluation of in silico tools. , 2015, Chemosphere.

[254]  Y. Martin,et al.  Do structurally similar molecules have similar biological activity? , 2002, Journal of medicinal chemistry.

[255]  Molly M. Stevens,et al.  Sparse feature selection methods identify unexpected global cellular response to strontium-containing materials , 2015, Proceedings of the National Academy of Sciences.

[256]  Stefano Curtarolo,et al.  Composition-spread Growth and the Robust Topological Surface State of Kondo insulator SmB6 Thin Films , 2014 .

[257]  K.,et al.  Reliability of measurements of muscle tone and muscle power in stroke patients. , 2000, Age and ageing.

[258]  R. Cramer,et al.  Comparative molecular field analysis (CoMFA). 1. Effect of shape on binding of steroids to carrier proteins. , 1988, Journal of the American Chemical Society.

[259]  Pengcheng Jiao,et al.  Next generation prediction model for daily solar radiation on horizontal surface using a hybrid neural network and simulated annealing method , 2017 .

[260]  Berzin Vm,et al.  Transfer RNA and aminoacyl-tRNA synthetases in cells of E. coli infected with phage MS2. , 1972 .

[261]  Corey Oses,et al.  Materials Cartography: Representing and Mining Material Space Using Structural and Electronic Fingerprints , 2014, 1412.4096.

[262]  Schmid,et al.  "Scaffold-Hopping" by Topological Pharmacophore Search: A Contribution to Virtual Screening. , 1999, Angewandte Chemie.

[263]  Igor I. Baskin,et al.  Assessment of tautomer distribution using the condensed reaction graph approach , 2018, Journal of Computer-Aided Molecular Design.

[264]  S. Tofail,et al.  A Tractable Method for Measuring Nanomaterial Risk Using Bayesian Networks , 2016, Nanoscale Research Letters.

[265]  Sonia Arrasate,et al.  Perturbation-Theory and Machine Learning (PTML) Model for High-Throughput Screening of Parham Reactions: Experimental and Theoretical Studies , 2018, J. Chem. Inf. Model..

[266]  Alán Aspuru-Guzik,et al.  Phoenics: A Bayesian Optimizer for Chemistry , 2018, ACS central science.

[267]  Daniel S. Himmelstein,et al.  Heterogeneous Network Edge Prediction: A Data Integration Approach to Prioritize Disease-Associated Genes , 2014, bioRxiv.

[268]  Michael Gastegger,et al.  Machine learning molecular dynamics for the simulation of infrared spectra† †Electronic supplementary information (ESI) available. See DOI: 10.1039/c7sc02267k , 2017, Chemical science.

[269]  Alexander Tropsha,et al.  Exploring quantitative nanostructure-activity relationships (QNAR) modeling as a tool for predicting biological effects of manufactured nanoparticles. , 2011, Combinatorial chemistry & high throughput screening.

[270]  Tomasz Puzyn,et al.  How should the completeness and quality of curated nanomaterial data be evaluated? , 2016, Nanoscale.

[271]  Wojciech Samek,et al.  Methods for interpreting and understanding deep neural networks , 2017, Digit. Signal Process..

[272]  Jürgen Bajorath,et al.  Lessons Learned from Molecular Scaffold Analysis , 2011, J. Chem. Inf. Model..

[273]  Anne E. Trefethen,et al.  Toward interoperable bioscience data , 2012, Nature Genetics.

[274]  D. Winkler,et al.  Discovery and Optimization of Materials Using Evolutionary Approaches. , 2016, Chemical reviews.

[275]  Ehsan Sadrossadat,et al.  Indirect estimation of the ultimate bearing capacity of shallow foundations resting on rock masses , 2015 .

[276]  Adam C Mater,et al.  Deep Learning in Chemistry , 2019, J. Chem. Inf. Model..

[277]  Andy Liaw,et al.  Demystifying Multitask Deep Neural Networks for Quantitative Structure-Activity Relationships , 2017, J. Chem. Inf. Model..

[278]  Ruili Huang,et al.  CERAPP: Collaborative Estrogen Receptor Activity Prediction Project , 2016, Environmental health perspectives.

[279]  Christopher M Wolverton,et al.  High-Throughput Computational Screening of Perovskites for Thermochemical Water Splitting Applications , 2016 .

[280]  David A. Winkler,et al.  Understanding the Roles of the "Two QSARs" , 2016, J. Chem. Inf. Model..

[281]  Alexander Golbraikh,et al.  Data Set Modelability by QSAR , 2014, J. Chem. Inf. Model..

[282]  Stefano Curtarolo,et al.  Robust topological surface state in Kondo insulator SmB6 thin films , 2014 .

[283]  J. Maddox Crystals from first principles , 1988, Nature.

[284]  Michele Parrinello,et al.  Generalized neural-network representation of high-dimensional potential-energy surfaces. , 2007, Physical review letters.

[285]  George E. Dahl,et al.  Prediction Errors of Molecular Machine Learning Models Lower than Hybrid DFT Error. , 2017, Journal of chemical theory and computation.

[286]  Christos A. Nicolaou,et al.  Ties in Proximity and Clustering Compounds , 2001, J. Chem. Inf. Comput. Sci..

[287]  B. Roth,et al.  Magic shotguns versus magic bullets: selectively non-selective drugs for mood disorders and schizophrenia , 2004, Nature Reviews Drug Discovery.

[288]  Lionel Canioni,et al.  Good practices in LIBS analysis: Review and advices , 2014 .

[289]  J. Vybíral,et al.  Big data of materials science: critical role of the descriptor. , 2014, Physical review letters.

[290]  T. Flash,et al.  The coordination of arm movements: an experimentally confirmed mathematical model , 1985, The Journal of neuroscience : the official journal of the Society for Neuroscience.

[291]  S. Curtarolo,et al.  Accelerated discovery of new magnets in the Heusler alloy family , 2017, Science Advances.

[292]  Ryan P. Adams,et al.  Design of efficient molecular organic light-emitting diodes by a high-throughput virtual screening and experimental approach. , 2016, Nature materials.

[293]  K. Reckhow,et al.  Validation and sensitivity of the FINE Bayesian network for forecasting aquatic exposure to nano-silver. , 2014, The Science of the total environment.

[294]  R. Kolodny,et al.  Sequence-similar, structure-dissimilar protein pairs in the PDB , 2007, Proteins.

[295]  Vladimir Poroikov,et al.  Multi-targeted natural products evaluation based on biological activity prediction with PASS. , 2010, Current pharmaceutical design.

[296]  E. Corey,et al.  Robert Robinson Lecture. Retrosynthetic thinking—essentials and examples , 1988 .

[297]  Igor V. Tetko,et al.  ToxCast EPA in Vitro to in Vivo Challenge: Insight into the Rank-I Model , 2016, Chemical research in toxicology.

[298]  Dragos Horvath,et al.  Expert System for Predicting Reaction Conditions: The Michael Reaction Case , 2015, J. Chem. Inf. Model..

[299]  Paola Gramatica,et al.  Introduction General Considerations , 2022 .

[300]  A. Tropsha,et al.  Computer-aided design of carbon nanotubes with the desired bioactivity and safety profiles , 2016, Nanotoxicology.

[301]  Lin He,et al.  DRAR-CPI: a server for identifying drug repositioning potential and adverse drug reactions via the chemical–protein interactome , 2011, Nucleic Acids Res..

[302]  T Scior,et al.  How to recognize and workaround pitfalls in QSAR studies: a critical review. , 2009, Current medicinal chemistry.

[303]  Alexander Golbraikh,et al.  Application of Quantitative Structure–Activity Relationship Models of 5-HT1A Receptor Binding to Virtual Screening Identifies Novel and Potent 5-HT1A Ligands , 2014, J. Chem. Inf. Model..

[304]  M. Baker 1,500 scientists lift the lid on reproducibility , 2016, Nature.

[305]  Michael J. Keiser,et al.  Relating protein pharmacology by ligand chemistry , 2007, Nature Biotechnology.

[306]  Andy Liaw,et al.  Extreme Gradient Boosting as a Method for Quantitative Structure-Activity Relationships , 2016, J. Chem. Inf. Model..

[307]  Alán Aspuru-Guzik,et al.  Neural Networks for the Prediction of Organic Chemistry Reactions , 2016, ACS central science.

[308]  Connor W. Coley,et al.  Machine Learning in Computer-Aided Synthesis Planning. , 2018, Accounts of chemical research.

[309]  D. Horvath,et al.  Predictive Models for Kinetic Parameters of Cycloaddition Reactions , 2018, Molecular informatics.

[310]  K. Müller,et al.  Fast and accurate modeling of molecular atomization energies with machine learning. , 2011, Physical review letters.

[311]  Dario Neri,et al.  DNA-Encoded Chemical Libraries: A Selection System Based on Endowing Organic Compounds with Amplifiable Information. , 2018, Annual review of biochemistry.

[312]  Corey Oses,et al.  Machine learning modeling of superconducting critical temperature , 2017, npj Computational Materials.

[313]  Robert J Kavlock,et al.  Integration of dosimetry, exposure, and high-throughput screening data in chemical toxicity assessment. , 2012, Toxicological sciences : an official journal of the Society of Toxicology.

[314]  Alexandre Varnek,et al.  Automatized Assessment of Protective Group Reactivity: A Step Toward Big Reaction Data Analysis , 2016, J. Chem. Inf. Model..