Mining frequent arrangements of temporal intervals

The problem of discovering frequent arrangements of temporal intervals is studied. The database is assumed to consist of sequences of events, where each event occurs over a time interval. The goal is to mine temporal arrangements of event intervals that appear frequently in the database. The motivation is the observation that, in practice, most events are not instantaneous but occur over a period of time, and different events may occur concurrently. Many practical applications therefore require mining such temporal correlations between intervals, including the linguistic analysis of annotated American Sign Language data as well as the analysis of network and biological data. Three efficient methods for finding frequent arrangements of temporal intervals are described: the first two are tree-based and use breadth-first and depth-first search to mine the set of frequent arrangements, whereas the third is prefix-based. These methods apply efficient pruning techniques, including a set of constraints that add user-controlled focus to the mining process. Moreover, a standard association rule mining method is applied to the extracted patterns, using different interestingness measures to evaluate the significance of the discovered patterns and rules. The performance of the proposed algorithms is evaluated and compared with other approaches on real datasets (American Sign Language annotations and network data) and on large synthetic datasets.
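To make the notion of a frequent arrangement concrete, the following is a minimal sketch that counts how many sequences contain each pairwise arrangement of labeled intervals. It is not one of the three methods described above: it enumerates pairs by brute force, and the relation names ("before", "meets", "overlaps", "contains", "equals") are a simplified, Allen-style set assumed here for illustration. The names `EventInterval`, `relation`, `pairwise_arrangements`, and `frequent_pairs` are hypothetical, and intervals are assumed to satisfy start < end.

```python
from collections import Counter
from itertools import combinations
from typing import NamedTuple


class EventInterval(NamedTuple):
    label: str
    start: int
    end: int  # assumed: start < end


def relation(a: EventInterval, b: EventInterval) -> str:
    """Coarse relation of a to b (illustrative set, not the paper's definition).

    Assumes a precedes b in (start, -end) order, so a.start <= b.start and,
    when the starts coincide, a is at least as long as b.
    """
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.end < b.start:
        return "before"
    if a.end == b.start:
        return "meets"
    if a.end >= b.end:
        return "contains"
    return "overlaps"  # a.start < b.start < a.end < b.end


def pairwise_arrangements(sequence):
    """Return the set of (label, relation, label) triples found in one sequence."""
    ordered = sorted(sequence, key=lambda e: (e.start, -e.end, e.label))
    return {(a.label, relation(a, b), b.label) for a, b in combinations(ordered, 2)}


def frequent_pairs(database, min_support):
    """Keep the triples that occur in at least min_support sequences."""
    counts = Counter()
    for sequence in database:
        # Each sequence contributes at most once per triple.
        counts.update(pairwise_arrangements(sequence))
    return {triple: n for triple, n in counts.items() if n >= min_support}


database = [
    [EventInterval("A", 0, 5), EventInterval("B", 3, 8)],
    [EventInterval("A", 1, 4), EventInterval("B", 2, 6), EventInterval("C", 6, 9)],
    [EventInterval("A", 0, 10), EventInterval("B", 2, 5)],
]
result = frequent_pairs(database, min_support=2)
print(result)
```

With a support threshold of 2, only the arrangement "A overlaps B" survives in this toy database, since it appears in the first two sequences. Larger arrangements would combine several such pairwise relations, which is where tree-based or prefix-based search and pruning become necessary.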
