Association mining in time-varying domains

The input of a classical application of association mining is a large set of transactions, each consisting of a list of items a customer has paid for at a supermarket checkout desk. The goal is to identify groups of items ("itemsets") that frequently co-occur in the same shopping carts. This paper focuses on an aspect that has so far received relatively little attention: the composition of the list of frequent itemsets may change over time as purchasing habits are affected by fashion, season, and the introduction of new products. We investigate (1) heuristics for detecting such changes in time-ordered databases and (2) techniques that update the set of frequent itemsets when a change is detected. As the main performance criterion, we use the accuracy with which our program maintains the current list of frequent itemsets in a time-varying environment.
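
To make the two tasks concrete, the following is a minimal sketch and not the paper's own heuristics, which the abstract does not specify. It assumes that transactions arrive in time order, that they are processed in consecutive fixed-size blocks, and that a change is flagged when the Jaccard distance between the stored and the freshly mined frequent itemsets exceeds a threshold. The names `frequent_itemsets`, `jaccard_distance`, and `track`, and the parameters `window`, `min_support`, `threshold`, and `max_size`, are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return the itemsets (as frozensets) whose relative support in
    `transactions` is at least `min_support`.

    Brute-force counting of all itemsets up to `max_size` items; this is
    for illustration only and does not scale to large databases.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    n = len(transactions)
    return {s for s, c in counts.items() if n and c / n >= min_support}


def jaccard_distance(a, b):
    """Distance between two collections of itemsets (0 = identical)."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def track(transactions, window, min_support, threshold):
    """Scan time-ordered transactions in consecutive blocks of `window`.

    (1) Detection (assumed heuristic): a change is flagged when the
        frequent itemsets of the newest block differ from the stored
        ones by more than `threshold` in Jaccard distance.
    (2) Update (assumed policy): on a flagged change, the stored set is
        replaced by the itemsets mined from the newest block.

    Returns a list of (block_start, current_itemsets, changed) tuples.
    """
    current = None
    history = []
    for start in range(0, len(transactions) - window + 1, window):
        block = transactions[start:start + window]
        fresh = frequent_itemsets(block, min_support)
        changed = current is not None and jaccard_distance(current, fresh) > threshold
        if current is None or changed:
            current = fresh
        history.append((start, current, changed))
    return history


# Toy stream: purchasing habits switch abruptly after four transactions.
stream = [["bread", "milk"]] * 4 + [["beer", "chips"]] * 4
history = track(stream, window=4, min_support=0.5, threshold=0.5)
```

In this toy stream the second block shares no frequent itemsets with the first, so the sketch flags a change there and replaces the stored list. Gradual drift would call for a softer criterion, such as overlapping windows or a statistical test on itemset supports.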
