1. Introduction
2. Search Space Partition-Based Rule Mining
  2.1 Problem Statement
    2.1.1 Canonical Attribute Sequences (cas)
    2.1.2 Database
    2.1.3 Support
    2.1.4 Association Rule
    2.1.5 Problem Statement
  2.2 Search Space
  2.3 Splitting Procedure
  2.4 Enumerating ?-Frequent Attribute Sets (cass)
  2.5 Sequential Enumeration Procedure
  2.6 Parallel Enumeration Procedure
    2.6.1 Initial Load Balancing
    2.6.2 Computing the Starting Sets
    2.6.3 Enumeration Procedure
    2.6.4 Dynamic Load Balancing
  2.7 Generating the Association Rules
    2.7.1 Sequential Generation
    2.7.2 Parallel Generation
3. Apriori and Other Algorithms
  3.1 Early Algorithms
    3.1.1 AIS
    3.1.2 SETM
  3.2 The Apriori Algorithms
    3.2.1 Apriori
    3.2.2 AprioriTid
  3.3 Direct Hashing and Pruning
    3.3.1 Filtering Candidates
    3.3.2 Database Trimming
    3.3.3 The DHP Algorithm
  3.4 Dynamic Set Counting
4. Mining for Rules over Attribute Taxonomies
  4.1 Association Rules over Taxonomies
  4.2 Problem Statement and Algorithms
  4.3 Pruning Uninteresting Rules
    4.3.1 Measure of Interest
    4.3.2 Rule Pruning Algorithm
    4.3.3 Attribute Presence-Based Pruning
5. Constraint-Based Rule Mining
  5.1 Boolean Constraints
    5.1.1 Syntax
    5.1.2 Semantics
    5.1.3 Propagation of Boolean Constraints
  5.2 Prime Implicants
  5.3 Problem Statement and Algorithms
6. Data Partition-Based Rule Mining
  6.1 Data Partitioning
    6.1.1 Building a Probabilistic Model
    6.1.2 Bounding Large Deviations for One cas (Chernoff bounds)
    6.1.3 Bounding Large Deviations for Sets of cass
  6.2 cas Enumeration with Partitioned Data
    6.2.1 Data Partitioning
    6.2.2 Local ?-Frequent cas Generation
    6.2.3 Global ?-Frequent cas Generation
7. Mining for Rules with Categorical and Metric Attributes
  7.1 Interval Systems and Quantitative Rules
  7.2 k-Partial Completeness
  7.3 Pruning Uninteresting Rules
    7.3.1 Measure of Interest
    7.3.2 Attribute Presence-Based Pruning
  7.4 Enumeration Algorithms
8. Optimizing Rules with Quantitative Attributes
  8.1 Solving 1-1-Type Rule Optimization Problems
    8.1.1 Problem Statement
    8.1.2 MC\S Problem
    8.1.3 MS\C Problem
    8.1.4 MG Problem
  8.2 Solving d-1-Type Rule Optimization Problems
  8.3 Solving 1-q-Type Rule Optimization Problems
    8.3.1 Problem Statement
    8.3.2 MS\C Problem
    8.3.3 MG Problem
  8.4 Solving d-q-Type Rule Optimization Problems
    8.4.1 Problem Statement
    8.4.2 Basic Enumeration
    8.4.3 Enumeration with Pruning
    8.4.4 Pruning the Instantiation Set
9. Beyond Support-Confidence Framework
  9.1 A Criticism of the Support-Confidence Framework
  9.2 Conviction
  9.3 Pruning Conviction-Based Rules
    9.3.1 Analyzing Conviction
    9.3.2 Transitivity-Based Pruning
    9.3.3 Improvement-Based Pruning
  9.4 One-Step Association Rule Mining
    9.4.1 Building a Procedure for One-Step Mining
    9.4.2 Building a Procedure for Improvement-Based Pruning
  9.5 Correlated Attribute-Set Mining
    9.5.1 Collective Strength
    9.5.2 Correlated Attribute-Set Enumeration
  9.6 Refining Conviction: Association Rule Intensity
    9.6.1 Measure Construction
    9.6.2 Properties
    9.6.3 Relating ?-int(s ? u) to conv(s ? u)
    9.6.4 Mining with the Intensity Measure
    9.6.5 ?-Intensity Versus Intensity as Defined in [G96]
10. Search Space Partition-Based Sequential Pattern Mining
  10.1 Problem Statement
    10.1.1 Sequences of cass
    10.1.2 Database
    10.1.3 Support
    10.1.4 Problem Statement
  10.2 Search Space
  10.3 Splitting the Search Space
  10.4 Splitting Procedure
  10.5 Sequence Enumeration
    10.5.1 Extending the Support Set Notion
    10.5.2 Join Operations
    10.5.3 Sequential Enumeration Procedure
    10.5.4 Parallel Enumeration Procedure
Appendix 1. Chernoff Bounds
Appendix 2. Partitioning in Figure 10.5: Beyond 3rd Power
Appendix 3. Partitioning in Figure 10.6: Beyond 3rd Power
References