A Regression-Based Temporal Pattern Mining Scheme for Data Streams

We devise in this paper a regression-based algorithm, called algorithm FTP-DS (Frequent Temporal Patterns of Data Streams), to mine frequent temporal patterns for data streams. While providing a general framework of pattern frequency counting, algorithm FTP-DS has two major features, namely one data scan for online statistics collection and regression-based compact pattern representation.To attain the feature of one data scan, the data segmentation and the pattern growth scenarios are explored for the frequency counting purpose. Algorithm FTP-DS scans online transaction flows and generates candidate frequent patterns in real time. The second important feature of algorithm FTP-DS is on the regression-based compact pattern representation. Specifically, to meet the space constraint, we devise for pattern representation a compact ATF (standing for Accumulated Time and Frequency) form to aggregately comprise all the information required for regression analysis. In addition, we develop the techniques of the segmentation tuning and segment relaxation to enhance the functions of FTP-DS. With these features, algorithm FTP-DS is able to not only conduct mining with variable time intervals but also perform trend detection effectively. Synthetic data and a real dataset which contains net-Permission work alarm logs from a major telecommunication company are utilized to verify the feasibility of algorithm FTP-DS.

[1]  Geoff Hulten,et al.  Mining time-changing data streams , 2001, KDD '01.

[2]  Philip S. Yu,et al.  Mining Relationship between Triggering and Consequential Events in a Short Transaction Database , 2002, SDM.

[3]  Ramakrishnan Srikant,et al.  Mining sequential patterns , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[4]  Piotr Indyk,et al.  Identifying Representative Trends in Massive Time Series Data Sets Using Sketches , 2000, VLDB.

[5]  Eamonn J. Keogh,et al.  An online algorithm for segmenting time series , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[6]  Srinivasan Parthasarathy,et al.  New Algorithms for Fast Discovery of Association Rules , 1997, KDD.

[7]  Rajeev Rastogi,et al.  Processing complex aggregate queries over data streams , 2002, SIGMOD '02.

[8]  Jiawei Han,et al.  Maintenance of discovered association rules in large databases: an incremental updating technique , 1996, Proceedings of the Twelfth International Conference on Data Engineering.

[9]  Piotr Indyk,et al.  Maintaining stream statistics over sliding windows: (extended abstract) , 2002, SODA '02.

[10]  Divesh Srivastava,et al.  On computing correlated aggregates over continual data streams , 2001, SIGMOD '01.

[11]  Michael J. Franklin,et al.  Streaming Queries over Streaming Data , 2002, VLDB.

[12]  Yixin Chen,et al.  Multi-Dimensional Regression Analysis of Time-Series Data Streams , 2002, VLDB.

[13]  Geoff Hulten,et al.  Mining high-speed data streams , 2000, KDD '00.

[14]  Piotr Indyk,et al.  Maintaining Stream Statistics over Sliding Windows , 2002, SIAM J. Comput..

[15]  Rajeev Motwani,et al.  Approximate Frequency Counts over Data Streams , 2012, VLDB.

[16]  Heikki Mannila,et al.  Rule Discovery from Time Series , 1998, KDD.

[17]  Sudipto Guha,et al.  Clustering Data Streams , 2000, FOCS.

[18]  Johannes Gehrke,et al.  Querying and mining data streams: you only get one look a tutorial , 2002, SIGMOD '02.

[19]  Hongjun Lu,et al.  Stock movement prediction and N-dimensional inter-transaction association rules , 1998, SIGMOD 1998.

[20]  Philip S. Yu,et al.  Mining long sequential patterns in a noisy environment , 2002, SIGMOD '02.

[21]  Ming-Syan Chen,et al.  Sliding-window filtering: an efficient algorithm for incremental mining , 2001, CIKM '01.

[22]  Johannes Gehrke,et al.  Mining data streams under block evolution , 2002, SKDD.

[23]  Heikki Mannila,et al.  Discovery of Frequent Episodes in Event Sequences , 1997, Data Mining and Knowledge Discovery.

[24]  Sunita Sarawagi,et al.  Mining Surprising Patterns Using Temporal Description Length , 1998, VLDB.

[25]  Dimitrios Gunopulos,et al.  Temporal Aggregation over Data Streams Using Multiple Granularities , 2002, EDBT.