Towards Simulation-Data Science – A Case Study on Material Failures

Simulations let scientists study the properties of complex systems. At first sight, data mining is a natural choice for evaluating large numbers of simulations. However, it is currently unclear whether there are general principles that could guide the application of data-mining methods to simulation data. In other words, is it worthwhile to pursue simulation-data science as a distinct subdiscipline of data science? To identify a corresponding research agenda and to structure the research questions, we conduct a case study from the domain of materials science. One insight is that simulation data may differ from other data in its structure and quality, which entails focal points different from those of conventional data-analysis projects. It also turns out that interpretability and usability are important notions in our context. More attention is needed to collect the various meanings of these terms and to align them with the needs and priorities of domain scientists. Finally, we propose extensions to our case study that we deem necessary to generalize our insights towards the guidelines envisioned for simulation-data science.
