Entropy-based associative classification algorithm for mining manufacturing data

This paper presents a new associative classification algorithm for data mining. The algorithm uses elementary set concepts, information entropy and database manipulation techniques to discover useful relationships between the input and output attributes of large databases. These relationships (knowledge) are represented as IF–THEN association rules. The IF portion of a rule contains a set of input attribute features, and the THEN portion contains a set of output attributes that represent the decision outcome. The application of the algorithm is illustrated with a thermal spray process control case study. Thermal spray is a process that forms a desired shape of material by spraying molten metal onto a ceramic mould. The goal of the study is to identify spray process input parameters that can be used to control the process effectively and thereby obtain better characteristics in the sprayed material. A detailed discussion of the source and characteristics of the data sets is also presented.
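The abstract names three ingredients (elementary sets, information entropy and IF–THEN rules) without showing how they might fit together. What follows is a minimal Python sketch under stated assumptions. It is not the authors' algorithm. It treats an elementary set as the group of records that share identical values on the chosen input attributes, which is the rough-set notion. It scores attributes by the conditional entropy of the decision given those elementary sets. It then emits a rule only for elementary sets whose records all share one outcome. The attribute names (`current`, `gas_flow`, `quality`), the toy records and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2

# Hypothetical toy records: two input attributes plus one output (decision) attribute.
# Names and values are illustrative only; they are not data from the thermal spray study.
RECORDS = [
    {"current": "high", "gas_flow": "low",  "quality": "good"},
    {"current": "high", "gas_flow": "high", "quality": "good"},
    {"current": "low",  "gas_flow": "low",  "quality": "poor"},
    {"current": "low",  "gas_flow": "high", "quality": "poor"},
    {"current": "high", "gas_flow": "low",  "quality": "good"},
]


def elementary_sets(records, attrs):
    """Group record indices by their value tuple on attrs (records indiscernible on attrs)."""
    groups = defaultdict(list)
    for i, rec in enumerate(records):
        groups[tuple(rec[a] for a in attrs)].append(i)
    return dict(groups)


def entropy(labels):
    """Shannon entropy, in bits, of a list of decision values."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def conditional_entropy(records, attrs, decision):
    """H(decision | attrs): decision entropy inside each elementary set, weighted by set size."""
    n = len(records)
    return sum(
        len(idx) / n * entropy([records[i][decision] for i in idx])
        for idx in elementary_sets(records, attrs).values()
    )


def if_then_rules(records, attrs, decision):
    """Emit one IF-THEN rule per elementary set whose records all share a single decision."""
    rules = []
    for key, idx in elementary_sets(records, attrs).items():
        outcomes = {records[i][decision] for i in idx}
        if len(outcomes) == 1:  # zero entropy: the set is consistent
            cond = " AND ".join(f"{a} = {v}" for a, v in zip(attrs, key))
            rules.append(f"IF {cond} THEN {decision} = {outcomes.pop()}  (support {len(idx)})")
    return rules


if __name__ == "__main__":
    inputs = ["current", "gas_flow"]
    # Rank single input attributes by how much uncertainty about the decision they leave.
    ranked = sorted(inputs, key=lambda a: conditional_entropy(RECORDS, [a], "quality"))
    for rule in if_then_rules(RECORDS, [ranked[0]], "quality"):
        print(rule)
    # prints:
    # IF current = high THEN quality = good  (support 3)
    # IF current = low THEN quality = poor  (support 2)
```

Ranking attributes by conditional entropy is one plausible way to let entropy guide which inputs appear in the IF portion. On these toy records, `current` leaves no uncertainty about `quality`, while `gas_flow` leaves about 0.95 bits. The paper's actual selection criterion, thresholds and database manipulation steps are not given in this section and may differ.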
