Classification of type-2 diabetic patients by using Apriori and predictive Apriori

In this study, a new approach to generating association rules on numeric data is proposed. Equal binning techniques are not always useful for converting numerical data into categorical data, particularly in medical data. The proposed approach utilises a modified equal-width binning interval technique to discretise continuous-valued attributes into nominal ones, based on the opinion of medical experts. The approximate width of the desired intervals is chosen on the experts' advice and is given as an input to the model. The Apriori algorithm, usually used for market basket analysis, is used to generate rules on the Pima Indian diabetes data. The study compares the quality of different association rule mining approaches for classification. The proposed approach utilises the standard Apriori and predictive Apriori algorithms to generate association rules and highlights the importance of the often neglected pre-processing steps in the data mining process. The proposed approach can help doctors explore their data more effectively.
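The pipeline described above (discretise a numeric attribute using an approximate interval width, then mine class association rules with Apriori) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the binning is modified, so the sketch assumes the approximate width is only used to pick the number of equal-width bins. The function names, the toy glucose values, and the thresholds are hypothetical, and predictive Apriori is not covered.

```python
def equal_width_bins(values, approx_width):
    """Assign each value to one of n equal-width bins, where n is chosen
    so that the bin width is close to approx_width (an assumption about
    how the expert-supplied width is used)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    n_bins = max(1, round((hi - lo) / approx_width))
    width = (hi - lo) / n_bins
    # The maximum value falls on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def apriori(transactions, min_support):
    """Return {itemset: support} for all frequent itemsets, level by level."""
    n = len(transactions)
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        current = {a | b for a in level for b in level if len(a | b) == k}
    return frequent


def class_rules(frequent, class_items, min_confidence):
    """Keep rules of the form antecedent -> single class label."""
    rules = []
    for itemset, supp in frequent.items():
        consequent = itemset & class_items
        antecedent = itemset - class_items
        if len(consequent) == 1 and antecedent:
            confidence = supp / frequent[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, supp, confidence))
    return rules


# Hypothetical toy data, not taken from the Pima Indian diabetes data set.
glucose = [85, 90, 150, 160, 170, 95]
outcome = ["neg", "neg", "pos", "pos", "pos", "neg"]

bins = equal_width_bins(glucose, approx_width=40)
transactions = [
    frozenset([f"glucose=bin{b}", f"class={c}"]) for b, c in zip(bins, outcome)
]
frequent = apriori(transactions, min_support=0.3)
rules = class_rules(
    frequent, frozenset(["class=neg", "class=pos"]), min_confidence=0.8
)
for antecedent, consequent, supp, conf in rules:
    print(sorted(antecedent), "->", sorted(consequent), supp, conf)
```

Restricting the consequent to the class attribute is what turns general association rules into rules usable for classification. In practice each attribute would get its own approximate width, since a width that is sensible for glucose is unlikely to be sensible for age or body mass index.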
