Data Mining for Association Rules and Sequential Patterns

1. Introduction
2. Search Space Partition-Based Rule Mining
  2.1 Problem Statement
    2.1.1 Canonical Attribute Sequences (cas)
    2.1.2 Database
    2.1.3 Support
    2.1.4 Association Rule
    2.1.5 Problem Statement
  2.2 Search Space
  2.3 Splitting Procedure
  2.4 Enumerating ?-Frequent Attribute Sets (cass)
  2.5 Sequential Enumeration Procedure
  2.6 Parallel Enumeration Procedure
    2.6.1 Initial Load Balancing
    2.6.2 Computing the Starting Sets
    2.6.3 Enumeration Procedure
    2.6.4 Dynamic Load Balancing
  2.7 Generating the Association Rules
    2.7.1 Sequential Generation
    2.7.2 Parallel Generation
3. Apriori and Other Algorithms
  3.1 Early Algorithms
    3.1.1 AIS
    3.1.2 SETM
  3.2 The Apriori Algorithms
    3.2.1 Apriori
    3.2.2 AprioriTid
  3.3 Direct Hashing and Pruning
    3.3.1 Filtering Candidates
    3.3.2 Database Trimming
    3.3.3 The DHP Algorithm
  3.4 Dynamic Set Counting
4. Mining for Rules over Attribute Taxonomies
  4.1 Association Rules over Taxonomies
  4.2 Problem Statement and Algorithms
  4.3 Pruning Uninteresting Rules
    4.3.1 Measure of Interest
    4.3.2 Rule Pruning Algorithm
    4.3.3 Attribute Presence-Based Pruning
5. Constraint-Based Rule Mining
  5.1 Boolean Constraints
    5.1.1 Syntax
    5.1.2 Semantics
    5.1.3 Propagation of Boolean Constraints
  5.2 Prime Implicants
  5.3 Problem Statement and Algorithms
6. Data Partition-Based Rule Mining
  6.1 Data Partitioning
    6.1.1 Building a Probabilistic Model
    6.1.2 Bounding Large Deviations for One cas (Chernoff Bounds)
    6.1.3 Bounding Large Deviations for Sets of cass
  6.2 cas Enumeration with Partitioned Data
    6.2.1 Data Partitioning
    6.2.2 Local ?-Frequent cas Generation
    6.2.3 Global ?-Frequent cas Generation
7. Mining for Rules with Categorical and Metric Attributes
  7.1 Interval Systems and Quantitative Rules
  7.2 k-Partial Completeness
  7.3 Pruning Uninteresting Rules
    7.3.1 Measure of Interest
    7.3.2 Attribute Presence-Based Pruning
  7.4 Enumeration Algorithms
8. Optimizing Rules with Quantitative Attributes
  8.1 Solving 1-1-Type Rule Optimization Problems
    8.1.1 Problem Statement
    8.1.2 MC\S Problem
    8.1.3 MS\C Problem
    8.1.4 MG Problem
  8.2 Solving d-1-Type Rule Optimization Problems
  8.3 Solving 1-q-Type Rule Optimization Problems
    8.3.1 Problem Statement
    8.3.2 MS\C Problem
    8.3.3 MG Problem
  8.4 Solving d-q-Type Rule Optimization Problems
    8.4.1 Problem Statement
    8.4.2 Basic Enumeration
    8.4.3 Enumeration with Pruning
    8.4.4 Pruning the Instantiation Set
9. Beyond Support-Confidence Framework
  9.1 A Criticism of the Support-Confidence Framework
  9.2 Conviction
  9.3 Pruning Conviction-Based Rules
    9.3.1 Analyzing Conviction
    9.3.2 Transitivity-Based Pruning
    9.3.3 Improvement-Based Pruning
  9.4 One-Step Association Rule Mining
    9.4.1 Building a Procedure for One-Step Mining
    9.4.2 Building a Procedure for Improvement-Based Pruning
  9.5 Correlated Attribute-Set Mining
    9.5.1 Collective Strength
    9.5.2 Correlated Attribute-Set Enumeration
  9.6 Refining Conviction: Association Rule Intensity
    9.6.1 Measure Construction
    9.6.2 Properties
    9.6.3 Relating ?-int(s ⇒ u) to conv(s ⇒ u)
    9.6.4 Mining with the Intensity Measure
    9.6.5 ?-Intensity Versus Intensity as Defined in [G96]
10. Search Space Partition-Based Sequential Pattern Mining
  10.1 Problem Statement
    10.1.1 Sequences of cass
    10.1.2 Database
    10.1.3 Support
    10.1.4 Problem Statement
  10.2 Search Space
  10.3 Splitting the Search Space
  10.4 Splitting Procedure
  10.5 Sequence Enumeration
    10.5.1 Extending the Support Set Notion
    10.5.2 Join Operations
    10.5.3 Sequential Enumeration Procedure
    10.5.4 Parallel Enumeration Procedure
Appendix 1. Chernoff Bounds
Appendix 2. Partitioning in Figure 10.5: Beyond 3rd Power
Appendix 3. Partitioning in Figure 10.6: Beyond 3rd Power
References
