Effective techniques for association rule mining and associative classification

Association rule mining is one of the major data mining techniques and perhaps the most common form of local-pattern discovery in unsupervised learning systems. Conventionally, an association rule is evaluated mainly by its support and confidence, which measure the usefulness and certainty of the rule, respectively. However, the number of rules mined under the support-confidence framework is often too large, and many of them are uninteresting. Moreover, the mining process is computationally expensive because it requires an exhaustive search of the huge space of possible rules.

Apart from pattern discovery, association rules can also be used for data classification. Associative classification uses association rule mining to discover a set of rules that can accurately generalize a dataset, and it can achieve higher accuracy than other classification approaches. However, as in association rule mining, the rule discovery process in associative classification is often complex and computationally expensive. In addition, the set of rules mined under the association rule mining framework may not be effective when used directly for a classification task.

This research investigates effective techniques for enhancing association rule mining, in both rule representation and mining algorithms, and for discovering association rules for associative classification. As a result, we have developed a number of new techniques for association rule mining and associative classification. The contributions of this research are as follows:

• A new type of constraint for association rules, called the category-based constraint, is proposed, together with an Apriori-based algorithm, the Category-based Apriori algorithm (AprioriCB), for mining association rules under this constraint. The category-based constraint and AprioriCB can be used to mine interesting rules from datasets using category patterns. AprioriCB is efficient because it considerably reduces the search space of rules when mining interesting rules.

ATTENTION: The Singapore Copyright Act applies to the use of this document. Nanyang Technological University Library
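The support and confidence measures referred to in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions: the support of an itemset is the fraction of transactions that contain it, and the confidence of a rule A → B is support(A ∪ B) / support(A). The function names and the toy transactions are hypothetical and chosen only for illustration. The sketch does not implement the category-based constraint or the AprioriCB algorithm, which the abstract does not specify in detail.

```python
from typing import FrozenSet, Iterable, Sequence

Transaction = FrozenSet[str]


def support(itemset: Iterable[str], transactions: Sequence[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(
    antecedent: Iterable[str],
    consequent: Iterable[str],
    transactions: Sequence[Transaction],
) -> float:
    """Confidence of the rule antecedent -> consequent (standard definition)."""
    a = frozenset(antecedent)
    sup_a = support(a, transactions)
    if sup_a == 0:
        # The antecedent never occurs, so the rule has no evidence; report 0.0.
        return 0.0
    return support(a | frozenset(consequent), transactions) / sup_a


# Hypothetical toy data: four market-basket transactions.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

# Rule {bread} -> {milk}: two of four transactions contain both items,
# and three of four contain bread.
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
```

Under the support-confidence framework, a rule is retained only if both values meet user-chosen minimum thresholds. Because even a small dataset yields many itemsets that pass such thresholds, additional constraints of the kind proposed in this work are one way to narrow the search.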
