Towards efficient data exchange and sharing for big-data driven materials science: metadata and data formats

In big-data driven materials research, the new paradigm of materials science, the sharing and wide accessibility of data are becoming crucial. A prerequisite for data exchange and big-data analytics is standardization, which means using consistent and unique conventions for, e.g., units, zero baselines, and file formats. There are two main strategies to achieve this goal. The first accepts the heterogeneous nature of the community, which comprises scientists from physics, chemistry, biophysics, and materials science: it works with the diverse ecosystem of computer codes as it is and develops “converters” for the input and output files of all important codes. These converters then translate the data of each code into a standardized, code-independent format. The second strategy is to provide standardized open libraries that code developers can adopt to write their input, output, and restart files directly in the same code-independent format. In this perspective paper, we present both strategies and argue that they can and should be regarded as complementary, if not synergistic. The format and conventions presented here were agreed upon by two teams, the Electronic Structure Library (ESL) of the European Center for Atomic and Molecular Computations (CECAM) and the NOvel MAterials Discovery (NOMAD) Laboratory, a European Centre of Excellence (CoE). A key element of this work is the definition of hierarchical metadata describing state-of-the-art electronic-structure calculations.
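The converter strategy can be illustrated with a minimal sketch. It assumes two hypothetical code outputs that report the same quantity in different units, and it maps both onto one hierarchical, code-independent record in SI units. All field names (`run`, `system`, `calculation`, `energy_total`, and so on) and the input layout are assumptions made for illustration; they are not the metadata names or formats agreed upon by ESL and NOMAD.

```python
import json

# Unit conversion factors to SI (CODATA 2018 values).
ENERGY_TO_JOULE = {
    "hartree": 4.3597447222071e-18,
    "eV": 1.602176634e-19,
}
LENGTH_TO_METER = {
    "bohr": 5.29177210903e-11,
    "angstrom": 1.0e-10,
}


def convert(raw):
    """Translate a hypothetical code-specific record into a hierarchical,
    code-independent record with SI units.

    The input keys and the output hierarchy are illustrative assumptions.
    """
    energy_factor = ENERGY_TO_JOULE[raw["energy_unit"]]
    length_factor = LENGTH_TO_METER[raw["length_unit"]]
    return {
        "run": {
            "program_name": raw["code"],
            "system": {
                "atom_labels": list(raw["species"]),
                "atom_positions": [
                    [x * length_factor for x in position]
                    for position in raw["positions"]
                ],
            },
            "calculation": {
                "energy_total": raw["energy"] * energy_factor,
            },
        }
    }


# Two hypothetical outputs describing the same system in different units.
output_a = {
    "code": "code_a",
    "energy_unit": "hartree",
    "length_unit": "bohr",
    "species": ["H", "H"],
    "positions": [[0.0, 0.0, 0.0], [0.0, 0.0, 1.4]],
    "energy": -1.17,
}
output_b = {
    "code": "code_b",
    "energy_unit": "eV",
    "length_unit": "angstrom",
    "species": ["H", "H"],
    "positions": [[0.0, 0.0, 0.0], [0.0, 0.0, 0.74]],
    "energy": -31.84,
}

if __name__ == "__main__":
    for raw in (output_a, output_b):
        print(json.dumps(convert(raw), indent=2))
```

Once both records share the same hierarchy and units, downstream analysis can compare them without knowing which code produced them. A library following the second strategy would let a code write such a record directly instead of relying on a separate converter.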
