Cocrystal Prediction Using Machine Learning Models and Descriptors

Cocrystals are of great interest in both industrial applications and academic research, and screening suitable coformers for active pharmaceutical ingredients is the most crucial and challenging step in cocrystal development. Machine learning techniques have recently attracted researchers in many fields, including pharmaceutical research, where they are applied to problems such as quantitative structure-activity/property relationship modeling. In this paper, we develop machine learning models to predict cocrystal formation. We extract descriptor values from the simplified molecular-input line-entry system (SMILES) representations of compounds and compare the machine learning models experimentally on our collected dataset of 1476 instances. We found that the artificial neural network shows great potential, achieving the best accuracy, sensitivity, and F1 score among the models compared. We also found that the model achieved comparable performance with about half of the descriptors, chosen by feature selection algorithms. We believe that this work will contribute to faster and more accurate cocrystal development.
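For readers unfamiliar with the three evaluation metrics named above, the following is a minimal sketch of how accuracy, sensitivity, and F1 score can be computed for a binary classifier. It is not the authors' evaluation code. The function name `classification_metrics`, the label convention (1 = cocrystal forms, 0 = no cocrystal), and the toy labels are assumptions made for illustration only.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, sensitivity (recall), precision, and F1 for binary labels.

    Assumed convention (not from the paper): 1 = cocrystal forms, 0 = it does not.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one instance is required")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Sensitivity: fraction of true cocrystal pairs that the model recovers.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision: fraction of predicted cocrystal pairs that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1: harmonic mean of precision and sensitivity.
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the study.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    print(classification_metrics(y_true, y_pred))
```

Sensitivity matters in coformer screening because a missed positive is a viable coformer that is never tested, while F1 balances that against the cost of false positives.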
