Graphical Data Mining for Computational Estimation in Materials Science Applications

In domains such as Materials Science, experimental results are often plotted as two-dimensional graphs of a dependent versus an independent variable to aid visual analysis. Performing laboratory experiments with specified input conditions and plotting such graphs consumes significant time and resources, which motivates the need for computational estimation. The goals are to estimate the graph obtained in an experiment given its input conditions, and to estimate the conditions needed to obtain a desired graph. State-of-the-art estimation approaches have not been found suitable for the targeted applications. In this dissertation, an estimation approach called AutoDomainMine is proposed. In AutoDomainMine, graphs from existing experiments are clustered, and decision tree classification is used to learn the conditions characterizing these clusters in order to build a representative pair of input conditions and graph per cluster. This forms the knowledge discovered from existing experiments. Given the conditions of a new experiment, the relevant decision tree path is traced to estimate its cluster, and the representative graph of that cluster is the estimated graph. Alternatively, given a desired
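The cluster-then-classify estimation step described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the AutoDomainMine implementation: each graph is assumed to be sampled at the same x positions and compared with plain Euclidean distance (the abstract does not specify the distance metric), k-means stands in for the clustering algorithm, ID3-style entropy splits over categorical input conditions stand in for the decision tree, and each cluster's representative pair is assumed to be its centroid graph together with the most frequent value of each input condition. All function names and the toy quenching data are hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    # Assumed distance between two graphs sampled at identical x positions.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(graphs, k, iters=20):
    """Plain k-means; the first k graphs seed the centroids (an assumption)."""
    centroids = [list(g) for g in graphs[:k]]
    labels = [0] * len(graphs)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: euclidean(g, centroids[c])) for g in graphs]
        for c in range(k):
            members = [g for g, lab in zip(graphs, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def entropy(labels):
    n = len(labels)
    return -sum((m / n) * math.log2(m / n) for m in Counter(labels).values())

def id3(rows, labels, attrs):
    """ID3-style tree: leaves are cluster ids, inner nodes are (attr, branches, majority)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority

    def split_entropy(attr):
        total = 0.0
        for v in set(r[attr] for r in rows):
            sub = [lab for r, lab in zip(rows, labels) if r[attr] == v]
            total += len(sub) / len(labels) * entropy(sub)
        return total

    best = min(attrs, key=split_entropy)  # lowest entropy = highest information gain
    branches = {}
    for v in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == v]
        branches[v] = id3([rows[i] for i in idx], [labels[i] for i in idx],
                          [a for a in attrs if a != best])
    return (best, branches, majority)

def classify(tree, cond):
    """Trace the decision tree path for a new experiment's input conditions."""
    while isinstance(tree, tuple):
        attr, branches, majority = tree
        if cond.get(attr) not in branches:
            return majority  # unseen value: fall back to the node's majority cluster
        tree = branches[cond[attr]]
    return tree

def representatives(conditions, labels, centroids):
    """Per cluster: modal input conditions paired with the centroid graph (assumed design)."""
    reps = {}
    for c, centroid in enumerate(centroids):
        members = [cond for cond, lab in zip(conditions, labels) if lab == c]
        if members:
            modal = {a: Counter(m[a] for m in members).most_common(1)[0][0] for a in members[0]}
            reps[c] = (modal, centroid)
    return reps

def build_estimator(conditions, graphs, k):
    """Cluster existing graphs, then learn the conditions that characterize each cluster."""
    labels, centroids = kmeans(graphs, k)
    tree = id3(conditions, labels, sorted(conditions[0]))
    return tree, representatives(conditions, labels, centroids)

def estimate_graph(tree, reps, new_conditions):
    """The estimated graph is the representative graph of the predicted cluster."""
    return reps[classify(tree, new_conditions)][1]

# Hypothetical toy data: categorical quenching conditions and cooling curves
# (temperature sampled at four fixed time points).
CONDITIONS = [
    {"quenchant": "water", "agitation": "high"},
    {"quenchant": "oil", "agitation": "low"},
    {"quenchant": "water", "agitation": "low"},
    {"quenchant": "oil", "agitation": "high"},
]
GRAPHS = [[850, 400, 150, 60], [850, 700, 550, 400],
          [850, 420, 170, 70], [850, 680, 530, 380]]

tree, reps = build_estimator(CONDITIONS, GRAPHS, k=2)
print(estimate_graph(tree, reps, {"quenchant": "oil", "agitation": "low"}))
# prints [850.0, 690.0, 540.0, 390.0]
```

In this toy run the tree splits only on the quenchant, because that attribute alone separates the two clusters of cooling curves. Using centroids as representative graphs is a simplifying choice; a medoid (an actual experimental graph) would be an equally plausible alternative.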
