Paper information - A New Constraint for Mining Sets in Sequences

A New Constraint for Mining Sets in Sequences

Discovering interesting patterns in event sequences is a popular task in data mining. Most existing methods rely on a measure of cohesion to decide whether a pattern occurs, and on a frequency threshold to decide whether it occurs often enough. We introduce a new constraint based on a new interestingness measure that combines the cohesion and the frequency of a pattern. For a dataset consisting of a single sequence, cohesion is measured as the average length of the smallest intervals containing the pattern, taken over each occurrence of its events, and frequency is measured as the probability of observing an event of that pattern. We present a similar constraint for datasets consisting of multiple sequences. We also present algorithms that efficiently identify the interesting patterns defined in this way, given a dataset and a user-defined threshold. After applying our method to both synthetic and real-life data, we conclude that it gives intuitive results in a number of applications.
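The two single-sequence quantities described in the abstract can be illustrated with a small brute-force sketch. It assumes the sequence is a list of event labels with one event per position, and that the length of an interval is the number of positions it spans. The function names are hypothetical, and the code is neither the authors' algorithm nor an efficient one. The abstract does not state how the two quantities are combined into the final measure, so the sketch returns them separately.

```python
from typing import Sequence, Set, Tuple


def min_window_length(seq: Sequence[str], pattern: Set[str], t: int) -> int:
    """Length of the shortest interval [a, b] with a <= t <= b
    that contains every event of `pattern` (brute force)."""
    best = None
    for a in range(t, -1, -1):          # candidate left ends, moving leftwards from t
        needed = set(pattern)
        for b in range(a, len(seq)):    # extend the right end until the pattern is covered
            needed.discard(seq[b])
            if b >= t and not needed:
                length = b - a + 1      # assumption: length counts positions
                if best is None or length < best:
                    best = length
                break                   # a larger b only makes this interval longer
    if best is None:
        raise ValueError("pattern does not occur completely in the sequence")
    return best


def cohesion_and_frequency(seq: Sequence[str], pattern: Set[str]) -> Tuple[float, float]:
    """Return (average minimal interval length, frequency) for `pattern` in `seq`."""
    # Positions at which some event of the pattern occurs.
    positions = [t for t, event in enumerate(seq) if event in pattern]
    if not positions:
        raise ValueError("no event of the pattern occurs in the sequence")
    # Frequency: probability of observing an event of the pattern.
    frequency = len(positions) / len(seq)
    # Cohesion ingredient: average minimal interval length over those occurrences.
    avg_length = sum(min_window_length(seq, pattern, t) for t in positions) / len(positions)
    return avg_length, frequency


if __name__ == "__main__":
    print(cohesion_and_frequency(list("abcab"), {"a", "b"}))
    print(cohesion_and_frequency(list("axxb"), {"a", "b"}))
```

In the first sequence every occurrence of `a` or `b` has its partner in an adjacent position, so the average interval length is small; in the second the two events are far apart, so the same pattern yields a larger average and would count as less cohesive.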
