Integrating clonal selection and deterministic sampling for efficient associative classification

Traditional Associative Classification (AC) algorithms typically search for all possible association rules and then select a representative subset of them. Because the search space of such rules can grow exponentially as the support threshold decreases, the rule discovery process can be computationally expensive. One effective way to tackle this problem is to directly find a set of high-stakes association rules that can potentially build a highly accurate classifier. This paper introduces AC-CS, an AC algorithm that integrates the clonal selection principle of the immune system with deterministic data sampling. After picking a representative sample of the original data, it proceeds in an evolutionary fashion to populate only rules that are likely to yield good classification accuracy. Empirical results on several real datasets show that the approach generates dramatically fewer rules than traditional AC algorithms. In addition, the proposed approach is significantly more efficient than traditional AC algorithms while achieving competitive accuracy.
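To make the evolutionary idea concrete, the following is a minimal sketch of a clonal-selection loop over class association rules. It is not the AC-CS algorithm itself: the toy data, the function names (`support_confidence`, `clonal_selection`), the thresholds, and the choice of confidence as the affinity measure are all assumptions made for illustration. The deterministic sampling step is omitted, so the loop runs on the full toy data.

```python
import random

# Hypothetical toy transactions: (set of items, class label).
DATA = [
    ({"a", "b"}, "yes"),
    ({"a", "c"}, "yes"),
    ({"a", "b", "c"}, "yes"),
    ({"b", "c"}, "no"),
    ({"c"}, "no"),
    ({"b"}, "no"),
]


def support_confidence(antecedent, label, data):
    """Return (support, confidence) of the rule antecedent -> label."""
    covered = [cls for items, cls in data if antecedent <= items]
    hits = sum(1 for cls in covered if cls == label)
    support = hits / len(data)
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence


def clonal_selection(data, label, generations=5, clone_factor=2,
                     min_support=0.3, min_confidence=0.8, seed=0):
    """Evolve rule antecedents for one class; parameters are illustrative."""
    rng = random.Random(seed)
    all_items = sorted(set().union(*(items for items, _ in data)))
    # Start from single-item antecedents.
    population = {frozenset([item]) for item in all_items}
    memory = {}  # rules that met both thresholds
    for _generation in range(generations):
        scored = []
        # Sort for a reproducible iteration order.
        for rule in sorted(population, key=sorted):
            sup, conf = support_confidence(rule, label, data)
            if sup >= min_support:  # selection: drop infrequent rules
                scored.append((conf, rule))
                if conf >= min_confidence:
                    memory[rule] = (sup, conf)
        # Cloning: higher-affinity (higher-confidence) rules get more clones.
        # Mutation: each clone is extended by one randomly chosen item.
        next_population = set()
        for conf, rule in scored:
            n_clones = max(1, round(clone_factor * conf))
            candidates = [i for i in all_items if i not in rule]
            for _clone in range(n_clones):
                if candidates:
                    next_population.add(rule | {rng.choice(candidates)})
        if not next_population:
            break
        population = next_population
    return memory


if __name__ == "__main__":
    for rule, (sup, conf) in clonal_selection(DATA, "yes").items():
        print(sorted(rule), "-> yes", f"support={sup:.2f}", f"confidence={conf:.2f}")
```

Because only rules that survive the support filter are cloned, the loop never enumerates the full rule space, which is the efficiency argument the abstract makes.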
