Materials Informatics: An Algorithmic Design Rule

Materials informatics, or data-enabled investigation, is the "fourth paradigm" of materials science research, following the conventional empirical approach, theoretical science, and computational research. It has two essential ingredients: fingerprinting of materials properties, and the theory of statistical inference and learning. We have investigated open questions surrounding organic semiconductors through the materials informatics approach. By applying diverse neural network topologies, logical axioms, and inference from information science, we have developed data-driven procedures for discovering novel organic semiconductors for the semiconductor industry and for extracting knowledge for the materials science community. We have also reviewed and compared various algorithms for designing neural network topologies for materials informatics datasets.
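The two ingredients named above, a fingerprint and a statistical learner, can be illustrated with a minimal sketch. It assumes a toy, hypothetical fingerprint (naive counts of a few SMILES characters) in place of a real descriptor such as an extended-connectivity fingerprint, and it uses closed-form ridge regression as a stand-in learner. It is not the neural network pipeline developed in this work. The names `toy_fingerprint`, `fit_ridge`, and `predict`, and the synthetic target, are illustrative assumptions.

```python
import numpy as np

# Hypothetical token set for a toy fingerprint; NOT a standard descriptor.
# Naive character counting means e.g. "Cl" would also be counted as a "C".
TOKENS = ("C", "N", "O", "S", "=")


def toy_fingerprint(smiles: str) -> np.ndarray:
    """Ingredient 1: map a SMILES string to a fixed-length count vector."""
    return np.array([smiles.count(t) for t in TOKENS], dtype=float)


def fit_ridge(X: np.ndarray, y: np.ndarray, lam: float = 1e-6) -> np.ndarray:
    """Ingredient 2: closed-form ridge regression, w = (X^T X + lam*I)^-1 X^T y."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend a bias column
    penalty = lam * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0                            # leave the intercept unpenalized
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)


def predict(w: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Apply fitted weights (bias first) to a fingerprint matrix."""
    return np.hstack([np.ones((X.shape[0], 1)), X]) @ w


if __name__ == "__main__":
    smiles = ["CCO", "CC=O", "CCN", "C1=CC=CS1", "OCCO", "CNC=O", "CCCC", "NCCS"]
    X = np.stack([toy_fingerprint(s) for s in smiles])
    # Synthetic target built from known weights (bias + one per token),
    # so we can check that the learner recovers them.
    true_w = np.array([0.5, 1.0, -0.3, 0.8, 0.2, 0.4])
    y = predict(true_w, X)
    w = fit_ridge(X, y)
    print(np.round(w, 3))
```

Because the target is generated from known weights, the fitted coefficients should closely match `true_w`. Swapping in a richer fingerprint or a neural network regressor changes only the corresponding function; the split between representation and learner stays the same.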
