Active learning in Gaussian process interpolation of potential energy surfaces.

Three active learning schemes are used to generate training data for Gaussian process interpolation of intermolecular potential energy surfaces. The schemes aim to reach the lowest predictive error with the fewest training points, and therefore offer an alternative to the established approaches of grid-based sampling or space-filling designs such as Latin hypercubes (LHC). Results are presented for three molecular systems: CO2-Ne, CO2-H2, and Ar3. For each system, two of the proposed active learning schemes clearly outperform LHC designs of comparable size, and for two of the systems they achieve an error an order of magnitude lower than that of the LHC designs. The procedures can be used to select a subset of points from a large pre-existing data set, to choose the points at which new data are generated de novo, or to supplement an existing data set to improve accuracy.
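The abstract does not spell out the three schemes, so the following is only a minimal sketch of the general idea under loudly stated assumptions: a Gaussian process with a fixed squared-exponential kernel (unit signal variance, length scale 0.3), greedy maximum-predictive-variance selection from a pre-existing candidate pool (one common active learning criterion, not necessarily one of the paper's schemes), a Latin hypercube design of the same size for comparison, and a toy 1-D Morse curve standing in for ab initio energies. All function names and parameter values are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.3):
    """Squared-exponential covariance (unit signal variance) between rows of a (n, d) and b (m, d)."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


def gp_predict(x_train, y_train, x_test, jitter=1e-6):
    """Posterior mean and variance of a GP with fixed (assumed) hyperparameters."""
    y_mean = y_train.mean()  # centre the energies; the GP models the residual
    k_tt = rbf_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    k_ts = rbf_kernel(x_train, x_test)
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - y_mean))
    mean = y_mean + k_ts.T @ alpha
    v = np.linalg.solve(chol, k_ts)
    var = np.clip(1.0 - np.sum(v**2, axis=0), 0.0, None)  # prior variance is 1
    return mean, var


def latin_hypercube(n, d, rng):
    """LHC on [0, 1)^d: exactly one point per stratum in every dimension."""
    strata = np.column_stack([rng.permutation(n) for _ in range(d)])
    return (strata + rng.random((n, d))) / n


def active_select(pool_x, pool_y, n_select, n_seed, rng):
    """Greedily add the pool point with the largest GP predictive variance (hypothetical scheme)."""
    chosen = list(rng.choice(len(pool_x), size=n_seed, replace=False))
    while len(chosen) < n_select:
        _, var = gp_predict(pool_x[chosen], pool_y[chosen], pool_x)
        var[chosen] = -np.inf  # never pick the same configuration twice
        chosen.append(int(np.argmax(var)))
    return np.array(chosen)


def morse(r, depth=1.0, a=2.0, r_eq=1.0):
    """Toy Morse potential used as a stand-in for computed interaction energies."""
    return depth * (1.0 - np.exp(-a * (r - r_eq))) ** 2 - depth


def rmse(pred, true):
    return float(np.sqrt(np.mean((pred - true) ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lo, hi, budget = 0.8, 3.0, 12
    pool_r = np.linspace(lo, hi, 500)[:, None]  # large pre-existing data set
    pool_e = morse(pool_r[:, 0])

    idx = active_select(pool_r, pool_e, budget, n_seed=2, rng=rng)
    mean_al, _ = gp_predict(pool_r[idx], pool_e[idx], pool_r)

    lhc_r = lo + (hi - lo) * latin_hypercube(budget, 1, rng)
    mean_lhc, _ = gp_predict(lhc_r, morse(lhc_r[:, 0]), pool_r)

    print("active learning RMSE:", rmse(mean_al, pool_e))
    print("LHC RMSE:            ", rmse(mean_lhc, pool_e))
```

Because the hyperparameters are held fixed here, the predictive variance depends only on where the training points lie and not on the energies, so this particular criterion behaves like a sequential space-filling rule; schemes that refit hyperparameters as points are added, or that use other error estimates, can adapt to the shape of the surface.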
