Machine learning property prediction for organic photovoltaic devices

Organic photovoltaic (OPV) materials are promising candidates for cheap, printable solar cells. However, the number of potential donor and acceptor materials is very large, which makes selecting the best ones difficult. Here, we show that machine-learning approaches can leverage computationally expensive DFT calculations to estimate important OPV material properties quickly and accurately. We generate quantitative relationships between simple, interpretable descriptors (chemical signature and one-hot descriptors) and six properties: power conversion efficiency (PCE), open-circuit voltage (V_oc), short-circuit current density (J_sc), highest occupied molecular orbital (HOMO) energy, lowest unoccupied molecular orbital (LUMO) energy, and the HOMO–LUMO gap. The most robust and predictive models predicted PCE (computed by DFT) with a standard error of ±0.5 percentage points for both the training and test sets. These models are useful for pre-screening potential donor and acceptor materials for OPV applications, accelerating the design of these devices for green energy.
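The abstract does not specify the model form, so the following is only a minimal sketch of the general workflow under stated assumptions. Molecules are represented solely by one-hot indicators of hypothetical fragment labels (the chemical signature descriptors are not reproduced here). The model is ridge regression solved via the normal equations, a swapped-in choice and not necessarily the authors' method. "Standard error" is taken here as the root-mean-square residual, reported separately for the training and test sets as in the abstract. The toy targets are synthetic, generated from a known additive rule, and are not real OPV data.

```python
import math

# ASSUMPTION: fragment labels, ridge model, and RMS-residual "standard error"
# are illustrative choices, not taken from the paper.


def one_hot(molecules, vocabulary):
    """Encode each molecule (a set of fragment labels) as a 0/1 indicator vector."""
    return [[1.0 if frag in mol else 0.0 for frag in vocabulary] for mol in molecules]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ridge(x_rows, y, alpha=1e-3):
    """Fit y ~ w0 + w . x with an L2 penalty alpha on the non-intercept weights."""
    design = [[1.0] + row for row in x_rows]  # prepend intercept column
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    for i in range(1, p):  # leave the intercept unpenalised
        xtx[i][i] += alpha
    xty = [sum(r[i] * t for r, t in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(weights, x_rows):
    """Linear prediction: intercept plus weighted sum of descriptor values."""
    return [weights[0] + sum(w * v for w, v in zip(weights[1:], row)) for row in x_rows]


def standard_error(y_true, y_pred):
    """Root-mean-square residual, used here as the standard error of prediction."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Synthetic toy data (NOT real OPV values): targets follow a known additive rule.
VOCAB = ["thiophene", "benzothiadiazole", "fluorene", "cyano"]  # hypothetical labels
TRUE_WEIGHTS = {"thiophene": 1.5, "benzothiadiazole": 3.0, "fluorene": -0.5, "cyano": 1.0}


def toy_target(mol):
    return 2.0 + sum(TRUE_WEIGHTS[f] for f in mol)


train_set = [{"thiophene"}, {"benzothiadiazole"}, {"fluorene"}, {"cyano"},
             {"thiophene", "benzothiadiazole"}, {"fluorene", "cyano"}]
test_set = [{"thiophene", "cyano"}, {"benzothiadiazole", "fluorene", "cyano"}]

weights = fit_ridge(one_hot(train_set, VOCAB), [toy_target(m) for m in train_set], alpha=1e-6)
se_train = standard_error([toy_target(m) for m in train_set],
                          predict(weights, one_hot(train_set, VOCAB)))
se_test = standard_error([toy_target(m) for m in test_set],
                         predict(weights, one_hot(test_set, VOCAB)))
print(f"train SE = {se_train:.2e}, test SE = {se_test:.2e}")
```

Because the toy targets are exactly additive in the fragment indicators, both standard errors come out close to zero; with real DFT-derived PCE values the residuals would instead reflect how much of the property the descriptors fail to capture. Reporting the error on a held-out test set, not just the training set, is what guards against an over-fitted model.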
