Materials Informatics: The Materials “Gene” and Big Data

Materials informatics provides the foundations for a new paradigm of materials discovery. It shifts the emphasis from solely searching large volumes of data generated by experiment or computation to targeted materials discovery: high-throughput identification of the key factors (i.e., “genes”), followed by the use of statistical learning methods to integrate these factors quantitatively into design rules (i.e., “gene sequencing”) that govern targeted materials functionality. A critical challenge in discovering these materials genes, however, is unraveling the complexity of the data, which stems from numerous factors including noise, uncertainty, and the sheer diversity of data that one needs to consider (i.e., Big Data). In this article, we explore one aspect of materials informatics, namely how one can efficiently search for new knowledge in regimes of structure-property space, especially when no reasonable selection pathways based on theory or clear...
