Generating information-rich high-throughput experimental materials genomes using functional clustering via multitree genetic programming and information theory.

High-throughput experimental methodologies can synthesize, screen and characterize vast arrays of combinatorial materials libraries at a very rapid rate. These methodologies strategically employ tiered screening, in which the number of compositions screened decreases as the complexity of the screening experiment, and very often the scientific information it yields, increases. The algorithm used to down-select samples from a higher-throughput screening experiment for a lower-throughput one is vital to achieving information-rich experimental materials genomes. The fundamental science of materials discovery lies in establishing composition-structure-property relationships. This motivates the development of advanced down-selection algorithms that consider the information value of the selected compositions, rather than simply selecting the best-performing compositions from a high-throughput experiment. Identifying property fields (composition regions with distinct composition-property relationships) in high-throughput data enables down-selection algorithms to employ advanced selection strategies. Examples include selecting representative compositions from each field, or selecting compositions that span the composition space of the highest-performing field. Such strategies would greatly enhance the generation of data-driven discoveries. We introduce an informatics-based clustering of composition-property functional relationships that combines information theory and multitree genetic programming concepts to identify property fields in a composition library. We demonstrate our approach using a complex synthetic composition-property map for a 5 at.% step ternary library consisting of four distinct property fields. Finally, we explore the application of this methodology for capturing relationships between composition and catalytic activity for the oxygen evolution reaction for 5429 catalyst compositions in a (Ni-Fe-Co-Ce)Ox library.
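A 5 at.% step over a ternary composition space gives 21 × 22 / 2 = 231 compositions. The sketch below enumerates that grid. It also illustrates one of the down-selection strategies mentioned above: picking a representative composition from each property field.

It assumes field labels are already available. The labeling rule used here (whichever element is most abundant) is a hypothetical stand-in. It is not the synthetic four-field map or the multitree genetic programming clustering described in this work. The helper names `ternary_grid` and `representatives`, and the rule of choosing the member nearest the field centroid, are likewise illustrative assumptions.

```python
import math

STEP = 5  # composition step in at.%, as stated for the ternary library


def ternary_grid(step=STEP):
    """Return all (a, b, c) compositions in at.% with a + b + c = 100 on a `step` grid."""
    n = 100 // step
    return [
        (i * step, j * step, 100 - (i + j) * step)
        for i in range(n + 1)
        for j in range(n + 1 - i)
    ]


def representatives(compositions, labels):
    """Hypothetical down-selection: for each field label, return the member
    composition closest (Euclidean distance) to that field's centroid."""
    fields = {}
    for comp, lab in zip(compositions, labels):
        fields.setdefault(lab, []).append(comp)
    reps = {}
    for lab, members in fields.items():
        centroid = [sum(c[k] for c in members) / len(members) for k in range(3)]
        reps[lab] = min(members, key=lambda c: math.dist(c, centroid))
    return reps


grid = ternary_grid()
print(len(grid))  # 231
# Toy stand-in for identified property fields: index of the most abundant element
# (ties resolve to the lowest index).
labels = [max(range(3), key=lambda k: c[k]) for c in grid]
print(representatives(grid, labels))
```

In practice, the labels would come from the clustering step. The representative rule could be swapped for other strategies, such as sampling points that span the best-performing field.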
