A Bayesian Approach to Calibrating High-Throughput Virtual Screening Results and Application to Organic Photovoltaic Materials

A novel approach is presented for calibrating quantum-chemical properties, computed as part of a high-throughput virtual screen, against their experimental analogs. Information on the molecular graph is extracted using extended connectivity fingerprints and exploited by a Gaussian process to calibrate both electronic properties, such as frontier orbital energies and optical gaps, and device properties, such as short-circuit current density, open-circuit voltage, and power conversion efficiency. Because the process is Bayesian, each calibrated value comes with an estimate of its uncertainty. This lets the researcher build intuition about the model and respect the limits of its applicability.
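
The idea can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that the Gaussian process models the residual between measured and computed values with a zero prior mean, that fingerprints are compared with a Tanimoto kernel, and that the hyperparameters `signal_var` and `noise_var` are fixed by hand rather than optimized. The fingerprints are hypothetical sets of on-bit indices standing in for real extended connectivity fingerprints, and all numerical values are made up for illustration.

```python
import math

def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L y = b by forward substitution."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_t(L, y):
    """Solve L^T x = y by back substitution."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def calibrate(train_fps, computed, measured, test_fp, test_computed,
              signal_var=1.0, noise_var=0.01):
    """Return (calibrated value, standard deviation) for one new molecule.

    Assumption: the GP is placed on the residual (measured - computed),
    with a zero prior mean and a scaled Tanimoto kernel.
    """
    resid = [m - c for m, c in zip(measured, computed)]
    n = len(train_fps)
    # Kernel matrix of the training set, with observation noise on the diagonal.
    K = [[signal_var * tanimoto(train_fps[i], train_fps[j])
          + (noise_var if i == j else 0.0) for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, resid))  # K^{-1} * resid
    k_star = [signal_var * tanimoto(fp, test_fp) for fp in train_fps]
    # Posterior mean of the residual, added back onto the computed value.
    mean = test_computed + sum(k * a for k, a in zip(k_star, alpha))
    # Posterior variance of the latent residual (observation noise excluded).
    v = solve_lower(L, k_star)
    var = signal_var * tanimoto(test_fp, test_fp) - sum(x * x for x in v)
    return mean, math.sqrt(max(var, 0.0))

# Made-up training set: fingerprints, computed values, measured values.
train_fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {6, 7, 8}, {6, 7, 9}]
computed = [-5.0, -5.2, -6.0, -6.1]
measured = [-5.3, -5.5, -5.8, -5.9]

# One query resembling the training data, one sharing no bits with it.
near_mean, near_std = calibrate(train_fps, computed, measured, {1, 2, 3, 4}, -5.1)
far_mean, far_std = calibrate(train_fps, computed, measured, {20, 21}, -5.1)
print("similar molecule:   ", near_mean, "+/-", near_std)
print("dissimilar molecule:", far_mean, "+/-", far_std)
```

Under these assumptions, a query molecule that shares no fingerprint bits with the training set keeps its computed value unchanged and receives the full prior uncertainty, whereas a molecule close to the training data is shifted toward the measured values and receives a smaller uncertainty. This is one way the uncertainty can signal when a calibrated value lies outside the model's bounds.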
