Mining sequential patterns including time intervals

We introduce the problem of mining sequential patterns among items in a large database of sequences. For example, consider a database that records which areas storms strike and when. An example of the patterns we are interested in is: '10% of storms go through area C 3 days after they strike areas A and B.' Previous research has considered comparable patterns, but such work uses only 'after' (an ordering in time) and omits '3 days after' (an interval). Such patterns are very useful because they tell us when actions should be taken. To address this issue, we are studying an algorithm for discovering ordered lists of itemsets (sets of items), together with the time intervals between neighboring itemsets, that occur in a sufficient number of sequences of transactions; we call these patterns 'delta patterns.' The algorithm clusters the time intervals between two neighboring itemsets using the CF-tree method while it scans the database and counts the occurrences of each candidate pattern. Extensive simulations are being conducted to evaluate the discovered patterns and to assess the power and performance of the algorithm. The algorithm scales up very well in execution time with respect to the number of data-sequences.
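The abstract does not give the algorithm in detail. The following is a minimal sketch for the two-itemset case only. Each time interval between an occurrence of the first itemset and a later occurrence of the second is clustered with clustering features (N, LS, SS) of the kind used in a CF-tree. A sequence supports a cluster if at least one of its intervals falls into it. Several things here are hypothetical simplifications, not the authors' method: the names `IntervalCluster`, `insert_interval`, and `delta_patterns`, the radius `threshold`, the flat list of CF entries in place of a full CF-tree, and the storm data.

```python
import math
from dataclasses import dataclass, field


@dataclass
class IntervalCluster:
    """Hypothetical CF entry for 1-D time intervals: count N, linear sum LS,
    squared sum SS, plus the ids of sequences that contributed an interval."""
    n: int = 0
    ls: float = 0.0
    ss: float = 0.0
    seq_ids: set = field(default_factory=set)

    def centroid(self) -> float:
        return self.ls / self.n

    def radius_with(self, x: float) -> float:
        # Radius the cluster would have if x were absorbed: sqrt(SS/N - (LS/N)^2).
        n, ls, ss = self.n + 1, self.ls + x, self.ss + x * x
        mean = ls / n
        return math.sqrt(max(ss / n - mean * mean, 0.0))  # clamp rounding noise

    def add(self, x: float, seq_id: int) -> None:
        self.n += 1
        self.ls += x
        self.ss += x * x
        self.seq_ids.add(seq_id)


def insert_interval(clusters, x, seq_id, threshold):
    """Absorb x into the nearest cluster if its radius stays within threshold,
    otherwise open a new cluster (a flat stand-in for CF-tree leaf entries)."""
    if clusters:
        nearest = min(clusters, key=lambda c: abs(c.centroid() - x))
        if nearest.radius_with(x) <= threshold:
            nearest.add(x, seq_id)
            return
    cluster = IntervalCluster()
    cluster.add(x, seq_id)
    clusters.append(cluster)


def delta_patterns(sequences, first, second, threshold, min_support):
    """For the pattern <first, (interval), second>, return (centroid interval,
    support) for every interval cluster with support >= min_support.
    sequences maps a sequence id to a list of (time, itemset) transactions."""
    clusters = []
    for seq_id, transactions in sequences.items():
        for t1, items1 in transactions:
            if not first <= items1:  # first itemset must be contained in this transaction
                continue
            for t2, items2 in transactions:
                if t2 > t1 and second <= items2:
                    insert_interval(clusters, t2 - t1, seq_id, threshold)
    total = len(sequences)
    result = []
    for c in clusters:
        support = len(c.seq_ids) / total  # each sequence counts once per cluster
        if support >= min_support:
            result.append((c.centroid(), support))
    return result


# Hypothetical storm data: sequence id -> [(day, areas struck that day)].
storms = {
    1: [(0, {"A", "B"}), (3, {"C"})],
    2: [(1, {"A", "B"}), (4, {"C"})],
    3: [(0, {"A"}), (2, {"B"}), (5, {"C"})],  # A and B not struck together
    4: [(2, {"A", "B", "D"}), (12, {"C"})],
}
print(delta_patterns(storms, {"A", "B"}, {"C"}, threshold=1.0, min_support=0.3))
# -> [(3.0, 0.5)]
```

In this toy data, sequences 1 and 2 yield a 3-day interval and sequence 4 a 10-day interval. These fall into separate clusters because absorbing 10 would push the radius well past 1.0. Only the 3-day cluster reaches the 30% support threshold, so the pattern is read as "50% of storms go through C about 3 days after striking A and B."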