An Ecosystem for Digital Reticular Chemistry

The vastness of the materials design space makes it impractical to explore with traditional brute-force methods, particularly in reticular chemistry. Machine learning has shown promise in expediting and guiding materials design. Yet, despite numerous successful applications of machine learning to reticular materials, progress in the field has stagnated. This is possibly because digital chemistry is still more an art than a science and remains difficult for inexperienced researchers to access. To address this issue, we present mofdscribe, a software ecosystem tailored to both novice and seasoned digital chemists that streamlines the ideation, modeling, and publication process. Although our tools are optimized for reticular chemistry, they are versatile and can also be used in nonreticular materials research. We believe that mofdscribe will help make digital chemistry a more reliable, efficient, and comparable field.
