Active Learning and Mapping: A Survey and Conception of a New Stochastic Methodology for High Throughput Materials Discovery

Data mining technology is increasingly employed in new industrial processes, which require automatic analysis of data and related results in order to reach conclusions quickly. However, for some applications, full automation may not be appropriate. Unlike traditional data mining contexts, which deal with voluminous amounts of data, some domains are characterized by a scarcity of data, owing to the cost and time involved in conducting simulations or setting up experimental apparatus for data collection. In such domains, it is prudent to balance the speed gained through automation against the utility of the generated data. The authors review the active learning methodology and present a new one that successively generates new samples in order to improve the final estimation of the entire search space under investigation, using the knowledge accumulated iteratively through sample selection and the corresponding results. The methodology is shown to be of great interest for applications such as high-throughput materials science, and especially heterogeneous catalysis, where chemists lack prior knowledge to direct and guide the exploration. DOI: 10.4018/978-1-4666-2455-9.ch004
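The iterative idea summarized above (select a sample, obtain its result, use the accumulated knowledge to choose the next sample) can be illustrated with a minimal pool-based loop. This is only a sketch under stated assumptions and not the methodology presented in the chapter: the `oracle` function is a hypothetical stand-in for a costly experiment, and the `score` heuristic (distance to the nearest evaluated sample, doubled where the two nearest results disagree) is an illustrative choice, as are all names and parameter values.

```python
import random

def oracle(x):
    # Hypothetical stand-in for an expensive experiment:
    # the "interesting" class 1 occupies a narrow band of the search space.
    return 1 if 0.6 <= x <= 0.8 else 0

def score(x, labeled):
    # Illustrative selection criterion (an assumption, not the chapter's):
    # favor candidates far from every evaluated sample, and double the score
    # when the two nearest evaluated samples have different outcomes.
    nearest = sorted(labeled, key=lambda s: abs(s[0] - x))[:2]
    dist = abs(nearest[0][0] - x)
    disagree = len({y for _, y in nearest}) > 1
    return dist * (2.0 if disagree else 1.0)

def active_learning(pool, budget, seed=0):
    rng = random.Random(seed)
    pool = list(pool)
    labeled = []
    # Start from two randomly chosen experiments.
    for x in rng.sample(pool, 2):
        pool.remove(x)
        labeled.append((x, oracle(x)))
    # Spend the remaining budget one experiment at a time,
    # re-scoring the pool after every new result.
    while len(labeled) < budget and pool:
        x = max(pool, key=lambda c: score(c, labeled))
        pool.remove(x)
        labeled.append((x, oracle(x)))
    return labeled

pool = [i / 100 for i in range(101)]   # candidate compositions (toy 1-D space)
samples = active_learning(pool, budget=15)
print(sorted(samples))
```

With only 15 of the 101 candidates evaluated, the loop first spreads samples across the space and then concentrates them where neighboring results differ, which is the trade-off between coverage and informativeness that scarce-data domains call for.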
