Materials Data Science: Current Status and Future Outlook

The field of materials science and engineering is on the cusp of a digital data revolution. After reviewing the nature of data science and Big Data, we discuss the features of materials data that distinguish them from data in other fields. We introduce the concept of process-structure-property (PSP) linkages and show why establishing these linkages is one of the main objectives of materials data science. We then review a selection of materials databases, along with key aspects of materials data management such as storage hardware, archiving strategies, and data access strategies. Next, we introduce the emerging field of materials data analytics, which uses data-driven approaches to extract and curate materials knowledge from available data sets. We highlight the critical need for materials e-collaboration platforms and conclude with a number of suggestions for the near-term future of materials data science.
