Associative Classification Using Automata with Structure-Based Merging

Associative Classification (AC), a combination of two important but distinct fields (classification and association rule mining), aims to build accurate and interpretable classifiers from association rules. Association rule generation is inherently exponential, so AC researchers have focused on reducing redundant rules through rule pruning and rule ranking techniques. These techniques play an important role in improving efficiency; however, pruning may reduce accuracy by discarding interesting rules. Furthermore, these techniques are time-consuming to process and require domain-specific knowledge to select the best ranking and pruning strategy. To overcome these limitations, this research proposes an automata-based solution that improves the classifier's accuracy while replacing ranking and pruning. A new merging concept is introduced that uses structure-based similarity to merge the association rules. Merging not only helps reduce the classifier size but also minimizes the loss of information by avoiding pruning. Extensive experiments showed that the proposed algorithm outperforms AC, Naive Bayesian, and rule- and tree-based classifiers in terms of accuracy, space, and speed. Merging exploits the repetition in the rule set and keeps the classifier as small as possible.
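The abstract does not say how rules are encoded or how structure-based similarity is measured, so the following is only an illustrative sketch under one simple assumption: each rule's antecedent is a set of (attribute, value) items, items are placed in a canonical sorted order, and rules are inserted into a prefix-tree automaton so that rules sharing a leading sequence of items share states instead of being stored separately. Classification follows every path whose items all occur in the instance and votes with the class labels stored at the states it reaches. The class `RuleAutomaton` and its methods are hypothetical names, not the authors' algorithm.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class State:
    # Outgoing transitions, keyed by an (attribute, value) item.
    children: dict = field(default_factory=dict)
    # Class labels of rules whose antecedent ends at this state.
    labels: Counter = field(default_factory=Counter)


class RuleAutomaton:
    """Hypothetical prefix-tree automaton: rules whose sorted antecedents
    share a prefix share the states for that prefix (an assumed form of
    structure-based merging, not the paper's exact method)."""

    def __init__(self):
        self.root = State()
        self.num_states = 1

    def add_rule(self, antecedent: dict, label: str) -> None:
        state = self.root
        # Canonical item order so that rules with common items share prefixes.
        for item in sorted(antecedent.items()):
            if item not in state.children:
                state.children[item] = State()
                self.num_states += 1
            state = state.children[item]
        # Duplicate antecedents are merged into one state; only counts grow.
        state.labels[label] += 1

    def classify(self, instance: dict, default=None):
        """Visit every path whose items are all present in the instance and
        vote with the labels stored at each state reached."""
        items = set(instance.items())
        votes = Counter()
        stack = [self.root]
        while stack:
            state = stack.pop()
            votes.update(state.labels)
            for item, child in state.children.items():
                if item in items:
                    stack.append(child)
        return votes.most_common(1)[0][0] if votes else default


if __name__ == "__main__":
    automaton = RuleAutomaton()
    automaton.add_rule({"humidity": "high", "outlook": "sunny"}, "no")
    automaton.add_rule({"humidity": "high", "wind": "strong"}, "no")
    automaton.add_rule({"outlook": "overcast"}, "yes")
    # 5 states instead of 6: the item ("humidity", "high") is shared.
    print(automaton.num_states)
    print(automaton.classify({"outlook": "sunny", "humidity": "high", "wind": "weak"}))
```

The first two rules share the state for `("humidity", "high")`, which is the kind of repetition the merging is meant to exploit. A real implementation would also need an item ordering that maximizes sharing and a principled tie-breaking or weighting scheme for the votes; both are left out here.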
