GaussDal: An open source database management system for quantum chemical computations

Abstract

GaussDal, an open source software system for managing results from quantum chemical computations, is presented. Chemical data contained in output files from different quantum chemical programs are automatically extracted and incorporated into a relational database (PostgreSQL). The Structured Query Language (SQL) is used to extract combinations of chemical properties (e.g., molecules, orbitals, thermochemical properties, basis sets) into data tables for further analysis, processing and visualization. This type of data management is particularly suited to projects involving a large number of molecules. The current version of GaussDal supports parsers for Gaussian and Dalton output files; future versions may also include parsers for other quantum chemical programs. For visualization and analysis of the data tables generated by GaussDal we have used the locally developed open source software SciCraft.

Program summary

Title of program: GaussDal
Catalogue identifier: ADVT
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADVT
Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
Computers: Any
Operating system under which the system has been tested: Linux
Programming language used: Python
Memory required to execute with typical data: 256 MB
No. of bits in word: 32 or 64
No. of processors used: 1
Has the code been vectorized or parallelized?: No
No. of lines in distributed program, including test data, etc.: 543 531
No. of bytes in distributed program, including test data, etc.: 7 718 121
Distribution format: tar.gzip file
Nature of physical problem: Handling of large amounts of data from quantum chemistry computations.
Method of solution: Use of an SQL-based database and parsers specific to each quantum chemistry program.
Restriction on the complexity of the problem: The program is currently limited to Gaussian and Dalton output, but can be extended to other formats.
Generates subsets of multiple data tables from output files.
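
The workflow described in the abstract (parse output files, store the values in a relational database, then use SQL to pull selected properties into a data table) can be illustrated with a minimal sketch. This is not GaussDal's actual code or schema: the table names `molecule` and `calculation`, the helper functions, and the sample energies are all hypothetical, and Python's built-in `sqlite3` module stands in for PostgreSQL so that the example runs without a database server. The only format detail assumed from a real program is the general shape of the `SCF Done:` line printed in Gaussian log files.

```python
import re
import sqlite3

# Hypothetical, heavily shortened text in the style of a Gaussian log file.
# The energy values are illustrative, not computed results.
SAMPLE_OUTPUTS = {
    "water": " SCF Done:  E(RHF) =  -76.0107465155     A.U. after   10 cycles\n",
    "ammonia": " SCF Done:  E(RHF) =  -56.1843563810     A.U. after    9 cycles\n",
}

# Captures the method label and the energy from an "SCF Done" line.
SCF_LINE = re.compile(r"SCF Done:\s+E\((\w+)\)\s+=\s+(-?\d+\.\d+)")


def parse_scf(text):
    """Return (method, energy in hartree) from the last SCF line, or None."""
    matches = SCF_LINE.findall(text)
    if not matches:
        return None
    method, energy = matches[-1]
    return method, float(energy)


def build_database(outputs):
    """Parse each output text and store the results in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical two-table schema: one row per molecule, one per calculation.
    conn.executescript(
        """
        CREATE TABLE molecule (
            id   INTEGER PRIMARY KEY,
            name TEXT UNIQUE
        );
        CREATE TABLE calculation (
            id          INTEGER PRIMARY KEY,
            molecule_id INTEGER REFERENCES molecule(id),
            method      TEXT,
            scf_energy  REAL
        );
        """
    )
    for name, text in outputs.items():
        parsed = parse_scf(text)
        if parsed is None:
            continue  # skip files without a recognizable SCF line
        method, energy = parsed
        cur = conn.execute("INSERT INTO molecule (name) VALUES (?)", (name,))
        conn.execute(
            "INSERT INTO calculation (molecule_id, method, scf_energy) "
            "VALUES (?, ?, ?)",
            (cur.lastrowid, method, energy),
        )
    conn.commit()
    return conn


def energy_table(conn):
    """Join the two tables into a flat data table, ordered by molecule name."""
    query = """
        SELECT m.name, c.method, c.scf_energy
        FROM molecule AS m
        JOIN calculation AS c ON c.molecule_id = m.id
        ORDER BY m.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = build_database(SAMPLE_OUTPUTS)
    for row in energy_table(connection):
        print(row)
```

Keeping molecules and calculations in separate tables means one molecule can carry many calculations (different methods or basis sets), and a single SQL join then selects whichever combination of properties is needed for later analysis.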
