Discovering Transitional Patterns and Their Significant Milestones in Transaction Databases

A transaction database usually consists of a set of time-stamped transactions. Mining frequent patterns in transaction databases has been studied extensively in data mining research. However, most existing frequent pattern mining algorithms (such as Apriori and FP-growth) do not consider the time stamps associated with the transactions. In this paper, we extend the existing frequent pattern mining framework to take into account the time stamp of each transaction and to discover patterns whose frequency changes dramatically over time. We define a new type of pattern, called a transitional pattern, to capture the dynamic behavior of frequent patterns in a transaction database. Transitional patterns are either positive or negative: the frequency of a positive transitional pattern increases dramatically at some time point in the database, while that of a negative transitional pattern decreases dramatically. We also introduce the concept of significant milestones for a transitional pattern, which are the time points at which the frequency of the pattern changes most significantly. Moreover, we develop an algorithm that mines the set of transitional patterns, along with their significant milestones, from a transaction database. Our experimental studies on real-world databases show that mining positive and negative transitional patterns is a practical and useful approach for discovering novel and interesting knowledge from large databases.
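
To make the idea of a milestone concrete, the following is a minimal sketch and not the algorithm developed in the paper. It assumes the transactions are already sorted by time stamp, and it scores each split point by comparing the pattern's support before and after that point. The scoring formula (the difference of the two supports divided by the larger one) and the names `transition_scores` and `best_milestones` are assumptions made for illustration only; the abstract does not specify them.

```python
from typing import FrozenSet, Iterable, List, Tuple

# (split position k, support before, support after, transition score)
Score = Tuple[int, float, float, float]


def transition_scores(transactions: List[FrozenSet[str]],
                      pattern: Iterable[str]) -> List[Score]:
    """Score every split position k = 1..n-1 of a time-ordered database.

    Assumed scoring (illustrative): (sup_after - sup_before) / max(...),
    which lies in [-1, 1]; positive means the pattern became more frequent.
    """
    items = frozenset(pattern)
    hits = [items <= t for t in transactions]  # True where pattern occurs
    n, total = len(hits), sum(hits)
    scores: List[Score] = []
    before = 0
    for k in range(1, n):
        before += hits[k - 1]                  # occurrences in first k
        sup_before = before / k
        sup_after = (total - before) / (n - k)
        denom = max(sup_before, sup_after)
        ratio = 0.0 if denom == 0 else (sup_after - sup_before) / denom
        scores.append((k, sup_before, sup_after, ratio))
    return scores


def best_milestones(scores: List[Score]) -> Tuple[Score, Score]:
    """Return the split with the largest rise and the one with the largest drop."""
    rising = max(scores, key=lambda s: s[3])
    falling = min(scores, key=lambda s: s[3])
    return rising, falling


# Toy database in time order: {a, b} is rare early and constant later.
db = [frozenset(t) for t in (
    {"a", "b"}, {"a"}, {"b", "c"}, {"c"},
    {"a", "b"}, {"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"},
)]
rising, falling = best_milestones(transition_scores(db, {"a", "b"}))
print("largest rise after transaction", rising[0], "score", rising[3])
print("largest drop after transaction", falling[0], "score", falling[3])
```

In this toy database the largest rise is found after the fourth transaction, where the support of {a, b} goes from 0.25 to 1.0. A practical version would likely also require a minimum support on each side of the split and restrict the range of candidate split points, so that a score is not dominated by a handful of transactions at either end.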
