Frequent Patterns that Compress

One of the major problems in frequent pattern mining is the explosion in the number of results, which makes it difficult to identify the interesting frequent patterns. In a recent paper [14] we have shown that an approach based on the Minimum Description Length (MDL) principle dramatically reduces the number of frequent item sets to consider. Here we show that MDL gives similarly good reductions for frequent patterns on other types of data, viz., sequences and trees. Reductions of two to three orders of magnitude are easily attained on data sets from the web mining field.
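The abstract does not describe the encoding, so the following is only a minimal sketch of how an MDL filter can shrink a set of candidate patterns, shown for the item-set case. Everything in it is an assumption made here for illustration: a two-part code L(CT) + L(D | CT), a greedy cover of each transaction by a code table, Shannon code lengths derived from pattern usage, and a single greedy pass over a given candidate list. The functions `support`, `cover`, `cover_order`, `standard_lengths`, `total_length`, and `compress` are hypothetical names, and the encodings the paper uses for sequences and trees would necessarily differ.

```python
from collections import Counter
from math import log2


def support(pattern, db):
    """Number of transactions that contain every item of the pattern."""
    return sum(1 for t in db if pattern <= t)


def cover_order(pattern, db):
    """Assumed code-table order: longer first, then higher support, then items.
    Items must be mutually comparable (e.g. all strings) for the tiebreak."""
    return (-len(pattern), -support(pattern, db), sorted(pattern))


def cover(transaction, code_table):
    """Greedy cover: take each pattern, in code-table order, if all of its
    items are still uncovered. Singletons in the table make the cover complete."""
    remaining = set(transaction)
    used = []
    for pattern in code_table:
        if pattern <= remaining:
            used.append(pattern)
            remaining -= pattern
            if not remaining:
                break
    return used


def standard_lengths(db):
    """Code length in bits of each single item, from its relative frequency."""
    counts = Counter(item for t in db for item in t)
    total = sum(counts.values())
    return {item: -log2(c / total) for item, c in counts.items()}


def total_length(db, code_table, st):
    """L(CT) + L(D | CT) in bits under an assumed, simplified two-part code."""
    usage = Counter()
    for t in db:
        usage.update(cover(t, code_table))
    total = sum(usage.values())
    data_bits = model_bits = 0.0
    for pattern, u in usage.items():  # unused patterns cost nothing
        code = -log2(u / total)       # Shannon-optimal code length from usage
        data_bits += u * code
        # Assumed model cost: the pattern's code plus its items spelled
        # out with the single-item (standard) code lengths.
        model_bits += code + sum(st[i] for i in pattern)
    return model_bits + data_bits


def compress(db, candidates):
    """Greedy MDL filter: keep a candidate only if it lowers the total length."""
    db = [frozenset(t) for t in db]
    st = standard_lengths(db)
    singletons = [frozenset([i]) for i in st]

    def table(patterns):
        # Singletons have length 1, so they sort to the end of the table.
        return sorted(patterns + singletons, key=lambda p: cover_order(p, db))

    chosen = []
    best = total_length(db, table(chosen), st)
    # Try candidates by descending support, then descending size.
    for cand in sorted(candidates, key=lambda p: (-support(p, db), -len(p), sorted(p))):
        if len(cand) < 2:
            continue  # singletons are always in the table already
        trial = total_length(db, table(chosen + [cand]), st)
        if trial < best:
            chosen.append(cand)
            best = trial
    return chosen, best


if __name__ == "__main__":
    db = [{"a", "b", "c"}] * 8 + [{"a", "d"}, {"b", "e"}]
    candidates = [frozenset("abc"), frozenset("ab"), frozenset("ad")]
    chosen, bits = compress(db, candidates)
    print(sorted("".join(sorted(p)) for p in chosen), round(bits, 1))
```

On this toy database, {a, b, c} is kept because coding its eight occurrences with one short code is cheaper than using three singleton codes each time, while {a, b} is never used once {a, b, c} is in the table and so cannot lower the total. The reduction in the number of patterns comes from exactly this effect: a candidate survives only if it pays for itself in bits.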

[1] Peter Grünwald, et al. A tutorial introduction to the minimum description length principle, 2004, ArXiv.

[2] Mohammed J. Zaki, et al. Theoretical Foundations of Association Rules, 2007.

[3] Hiroki Arimura, et al. Optimized Substructure Discovery for Semi-structured Data, 2002, PKDD.

[4] Tomasz Imielinski, et al. Mining association rules between sets of items in large databases, 1993, SIGMOD Conference.

[5] Diane J. Cook, et al. Mining Temporal Sequences to Discover Interesting Patterns, 2004.

[6] J. D. Knijf. Monotone Constraints in Frequent Tree Mining, 2022.

[7] Jilles Vreeken, et al. Item Sets that Compress, 2006, SDM.

[8] C. S. Wallace, et al. Statistical and Inductive Inference by Minimum Message Length (Information Science and Statistics), 2005.

[9] A. Akhmetova. Discovery of Frequent Episodes in Event Sequences, 2006.

[10] Yun Chi, et al. Frequent Subtree Mining: An Overview, 2004, Fundam. Informaticae.

[11] Mohammed J. Zaki. Efficiently mining frequent trees in a forest: algorithms and applications, 2005, IEEE Transactions on Knowledge and Data Engineering.

[12] Lawrence B. Holder, et al. Structure Discovery from Sequential Data, 2004, FLAIRS.

[13] Roberto J. Bayardo, et al. Efficiently mining long patterns from databases, 1998, SIGMOD '98.

[14] Cédric Chauve, et al. Tree Pattern Matching for Linear Static Terms, 2002, SPIRE.

[15] Ming Li, et al. An Introduction to Kolmogorov Complexity and Its Applications, 1997, Texts in Computer Science.

[16] Carla E. Brodley, et al. KDD-Cup 2000 organizers' report: peeling the onion, 2000, SKDD.

[17] Rakesh Agrawal, et al. Mining association rules between sets of items in large databases, 1993.