Discovering frequent behaviors: time is an essential element of the context

One of the most popular problems in usage mining is the discovery of frequent behaviors, which relies on extracting frequent itemsets from usage databases. These databases are usually treated as a whole, so itemsets are extracted over the entire set of records. We claim that the data may contain hidden subsets that hold relevant itemsets of their own. These subsets, and the itemsets they contain, depend on the context, and time is an essential element of that context: users’ intents differ from one period to another. Behaviors observed over Christmas, for instance, will differ from those extracted during the summer. Unfortunately, such periods may be lost when the data is divided arbitrarily. The goal of our work is to find itemsets that are frequent over a specific period but that traditional methods would not extract, because their support over the whole dataset is very low. We introduce the definition of solid itemsets, which represent coherent and compact behaviors over specific periods, and we propose Sim, an algorithm for their extraction.
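
To make the motivation concrete, the following minimal sketch compares the support of one itemset over a whole usage log with its support over a single period. It is not the Sim algorithm and does not implement the definition of solid itemsets, neither of which is specified here; the `support` helper, the toy log, and the 0.5 threshold are all assumptions chosen for illustration.

```python
from datetime import date

def support(transactions, itemset, start=None, end=None):
    """Fraction of the transactions dated within [start, end] that contain
    every item of `itemset`. With no bounds, the whole log is used."""
    itemset = frozenset(itemset)
    in_period = [
        items for day, items in transactions
        if (start is None or day >= start) and (end is None or day <= end)
    ]
    if not in_period:
        return 0.0
    hits = sum(1 for items in in_period if itemset <= items)
    return hits / len(in_period)

# Hypothetical usage log: one ordinary transaction per month from January
# to November, then four transactions shortly before Christmas.
log = [(date(2023, month, 15), {"news", "weather"}) for month in range(1, 12)]
log += [
    (date(2023, 12, 20), {"gifts", "wrapping"}),
    (date(2023, 12, 21), {"gifts", "wrapping", "news"}),
    (date(2023, 12, 22), {"gifts", "wrapping"}),
    (date(2023, 12, 23), {"news"}),
]

MIN_SUPPORT = 0.5  # illustrative threshold, not taken from the source
candidate = {"gifts", "wrapping"}

global_support = support(log, candidate)
december_support = support(log, candidate, date(2023, 12, 1), date(2023, 12, 31))

print(f"whole log: {global_support:.2f}, frequent: {global_support >= MIN_SUPPORT}")
print(f"December:  {december_support:.2f}, frequent: {december_support >= MIN_SUPPORT}")
```

In this toy log the itemset appears in 3 of 15 transactions overall but in 3 of 4 December transactions, so a single global threshold would discard it while a period-aware view retains it. The December boundaries are supplied by hand here; finding such periods without fixing them in advance is the problem the abstract describes.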
