Scientific intuition inspired by machine learning-generated hypotheses

Machine learning has become a widely used tool for questions in the physical sciences, successfully applied to classification, regression and optimization tasks in many areas. Research mostly focuses on improving the accuracy of the numerical predictions of machine learning models, while scientific understanding is still almost exclusively generated by human researchers who analyse numerical results and draw conclusions. In this work, we shift the focus to the insights and knowledge obtained by the machine learning models themselves. In particular, we study how this knowledge can be extracted and used to inspire human scientists and to improve their intuition and understanding of natural systems. We apply gradient boosting with decision trees to extract human-interpretable insights from large data sets in chemistry and physics. In chemistry, we not only rediscover widely known rules of thumb but also find new, interesting motifs that tell us how to control the solubility and energy levels of organic molecules. At the same time, in quantum physics, we gain new understanding of experiments for quantum entanglement. The ability to go beyond numerics and to enter the realm of scientific insight and hypothesis generation opens the door to using machine learning to accelerate the discovery of conceptual understanding in some of the most challenging domains of science.
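The idea of reading insights off a gradient-boosted tree model can be illustrated with a minimal sketch. This is not the pipeline used in this work: it assumes depth-one trees (stumps), squared-error loss, binary presence/absence features, and a made-up toy target, and all feature names (motif_A to motif_D) and helper names (fit_stump, gradient_boost) are hypothetical. It shows how the total error reduction credited to each feature yields a human-readable ranking of which features matter.

```python
from itertools import product

def fit_stump(X, residuals):
    """Return (gain, feature, mean_on, mean_off) for the best single binary split."""
    n = len(residuals)
    mean_all = sum(residuals) / n
    base_sse = sum((r - mean_all) ** 2 for r in residuals)
    best = None
    for j in range(len(X[0])):
        on = [r for row, r in zip(X, residuals) if row[j] == 1]
        off = [r for row, r in zip(X, residuals) if row[j] == 0]
        if not on or not off:
            continue  # feature is constant, no split possible
        mean_on = sum(on) / len(on)
        mean_off = sum(off) / len(off)
        sse = (sum((r - mean_on) ** 2 for r in on)
               + sum((r - mean_off) ** 2 for r in off))
        gain = base_sse - sse  # squared-error reduction achieved by this split
        if best is None or gain > best[0]:
            best = (gain, j, mean_on, mean_off)
    return best

def gradient_boost(X, y, n_rounds=50, learning_rate=0.3):
    """Boost stumps on residuals; accumulate per-feature gain as importance."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    importance = [0.0] * len(X[0])
    for _ in range(n_rounds):
        # For squared-error loss the negative gradient is the residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        best = fit_stump(X, residuals)
        if best is None:
            break
        gain, j, mean_on, mean_off = best
        importance[j] += gain
        pred = [p + learning_rate * (mean_on if row[j] else mean_off)
                for row, p in zip(X, pred)]
    return pred, importance

# Hypothetical toy data: every combination of four binary "motif" features.
names = ["motif_A", "motif_B", "motif_C", "motif_D"]  # illustrative labels only
X = [list(bits) for bits in product([0, 1], repeat=4)]
# Assumed toy property: motif_A raises it, motif_C lowers it, the rest are inert.
y = [2.0 * row[0] - 1.0 * row[2] for row in X]

pred, importance = gradient_boost(X, y)
ranking = sorted(range(len(names)), key=lambda j: -importance[j])
for j in ranking:
    print(f"{names[j]}: {importance[j]:.3f}")
```

In this toy setting the ranking places the two features that actually enter the target first, which is the kind of interpretable signal a human researcher could then turn into a hypothesis. Real applications would use deeper trees, molecular fingerprints or experiment descriptors as features, and held-out validation.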
