The Alexandria library, a quantum-chemical database of molecular properties for force field development

Data quality and library size are both crucial for force field development. To predict molecular properties across a large chemical space, the data on which force fields are built must cover a wide variety of chemical compounds, and the tabulated physicochemical properties must be accurate. Because the data used to develop existing force fields are often not transparently documented, their quality is hard to establish and their reusability is low. This paper presents the Alexandria library, an open and freely accessible database of optimized molecular geometries, vibrational frequencies, electrostatic moments up to the hexadecupole, electrostatic potentials, polarizabilities, and thermochemistry, obtained from quantum chemistry calculations for 2704 compounds. Values are tabulated and, where available, compared with experimental data. This library can support the systematic development and training of empirical force fields for a broad range of molecules.
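Computed values are compared with experimental data only where experimental data exist. The following minimal Python sketch shows how such a comparison could be summarized. It assumes each property is stored as a hypothetical `(computed, experimental)` pair, with `None` marking a missing experimental value. The function name `error_stats` and this data layout are illustrative assumptions, not the library's actual file format or API.

```python
import math
from typing import Optional


def error_stats(pairs: list[tuple[float, Optional[float]]]) -> tuple[int, float, float]:
    """Return (n, mean signed error, RMSD) of computed minus experimental values.

    Hypothetical helper: pairs whose experimental value is None
    (no experimental data available) are skipped.
    """
    # Keep only entries that have an experimental reference value.
    diffs = [calc - exp for calc, exp in pairs if exp is not None]
    if not diffs:
        raise ValueError("no experimental reference values available")
    n = len(diffs)
    mse = sum(diffs) / n                              # mean signed error (bias)
    rmsd = math.sqrt(sum(d * d for d in diffs) / n)   # root-mean-square deviation
    return n, mse, rmsd


# Illustrative placeholder numbers, not Alexandria data.
n, mse, rmsd = error_stats([(1.5, 1.0), (2.0, None), (3.0, 4.0)])
print(n, mse, round(rmsd, 4))
```

The mean signed error reveals a systematic bias of the computational method. The RMSD measures the overall scatter. Reporting both makes it easier to judge whether a property is suitable as a force field training target.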
