Discovery of interesting episodes in sequence data

There is a considerable body of work on sequence mining of transactional data. Most of the related work operates on point data (not significant intervals) and makes several passes over the entire dataset in order to discover frequently occurring (sequential) patterns. Hybrid apriori, proposed in this paper, is, as the name implies, an apriori-class mining algorithm implemented in SQL, but it takes a different approach. Significant intervals for each event (or device) are computed first and are then used for detecting frequent event patterns. This approach has two advantages. First, reducing the data set to its significant intervals compresses it, which shrinks the input to the mining step. Second, each event/device is processed individually, so the significant intervals of different events can be computed in parallel. The hybrid apriori algorithm then works on the significant intervals using an apriori-style algorithm adapted to intervals. Compared with traditional mining algorithms, our approach offers significant advantages in efficiency, scalability, and storage requirements.
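
The two-phase idea described above can be illustrated with a minimal Python sketch. It is not the paper's SQL implementation, and the abstract does not define a significant interval precisely, so several things here are assumptions made only for illustration: an interval is taken to be a fixed-width time-of-day window in which the event occurs on at least a `min_support` fraction of days, and the names `significant_intervals`, `frequent_pairs`, `width`, `max_gap`, and the sample devices are all hypothetical.

```python
from collections import defaultdict
from itertools import product


def significant_intervals(points, num_days, width=30, min_support=0.6):
    """Phase 1 (assumed definition): compress one event's point data.

    points: iterable of (day, minute_of_day) occurrences of ONE event/device.
    Returns {(start, end): set_of_days} for the fixed-width windows whose
    fraction of days with at least one occurrence reaches min_support.
    """
    days_by_window = defaultdict(set)
    for day, minute in points:
        start = (minute // width) * width
        days_by_window[(start, start + width)].add(day)
    return {
        window: days
        for window, days in days_by_window.items()
        if len(days) / num_days >= min_support
    }


def frequent_pairs(intervals_by_event, num_days, max_gap=60, min_support=0.6):
    """Phase 2 (assumed join rule): one apriori-style level over intervals.

    Joins a significant interval of event a with one of a different event b
    when b's window starts no earlier than a's and at most max_gap minutes
    after a's window ends, keeping pairs whose shared days reach min_support.
    """
    pairs = {}
    for a, b in product(intervals_by_event, repeat=2):
        if a == b:
            continue
        candidates = product(
            intervals_by_event[a].items(), intervals_by_event[b].items()
        )
        for (wa, days_a), (wb, days_b) in candidates:
            if wa[0] <= wb[0] and wb[0] - wa[1] <= max_gap:
                support = len(days_a & days_b) / num_days
                if support >= min_support:
                    pairs[((a, wa), (b, wb))] = support
    return pairs


# Hypothetical point data over 5 days: (day, minute of day).
raw = {
    "lamp": [(0, 425), (1, 430), (2, 428), (3, 440), (4, 700)],
    "coffee": [(0, 455), (1, 460), (2, 470), (3, 458), (4, 465)],
}
NUM_DAYS = 5

# Each event is compressed independently, so this step could run in parallel.
intervals = {
    event: significant_intervals(points, NUM_DAYS)
    for event, points in raw.items()
}
pairs = frequent_pairs(intervals, NUM_DAYS)
print(intervals)
print(pairs)
```

In this sketch the pattern step only ever sees the surviving windows rather than the raw points, which is the compression benefit the abstract refers to. Longer patterns would be built by joining frequent pairs level by level in the usual apriori manner.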
