FPG-Grow: A Graph Based Pattern Grow Algorithm for Application Level IO Pattern Mining

Previous studies of pattern discovery in storage systems focus on sequential pattern (SP) mining in lower-level traces, but these approaches do not scale well to the application level. Patterns at the application level are mostly Contiguous Item Sequential Patterns (CISPs), which are much simpler than SPs, so mining them with heavyweight SP mining algorithms is inefficient. We propose a novel algorithm, FPG-Grow, that is better suited to mining application-level IO patterns. FPG-Grow scans the original sequence only once to construct a Frequent Pattern Graph (FPG), from which CISPs can be extracted by fetching the frequent sub-graphs at linear cost. Verification is also efficient because it does not require rescanning the original sequence. Furthermore, the grow method avoids the information loss that sequence cutting introduces in C-Miner. Experimental results show that FPG-Grow significantly outperforms C-Miner when mining real application IO traces, and simulation results also confirm the effectiveness of CISPs in application IO optimization.
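
The idea of a single scan followed by graph-based growth can be illustrated with a small sketch. This is not the FPG-Grow algorithm itself, whose details the abstract does not give; it is a simplified stand-in that assumes the graph stores, for every adjacent item pair, the positions where that pair occurs, and that a pattern is grown one item at a time by intersecting those position sets. The function names `build_graph` and `mine_contiguous` and the `min_support` parameter are hypothetical, and occurrences are allowed to overlap.

```python
from collections import defaultdict


def build_graph(seq):
    """Single pass over seq: map each adjacent pair (a, b) to its start indices."""
    edges = defaultdict(list)
    for i in range(len(seq) - 1):
        edges[(seq[i], seq[i + 1])].append(i)
    return edges


def mine_contiguous(seq, min_support):
    """Return {pattern: support} for contiguous patterns of length >= 2.

    Assumption for illustration: support is the number of (possibly
    overlapping) occurrences of the pattern in seq.
    """
    edges = build_graph(seq)
    # Keep only the frequent edges; these form the graph that is grown from.
    frequent = {e: set(pos) for e, pos in edges.items() if len(pos) >= min_support}

    # Adjacency list over frequent edges: item -> possible next items.
    succ = defaultdict(list)
    for a, b in frequent:
        succ[a].append(b)

    results = {}
    # Each stack entry is (pattern, start indices of its occurrences).
    stack = [(edge, starts) for edge, starts in frequent.items()]
    while stack:
        pattern, starts = stack.pop()
        results[pattern] = len(starts)
        last, k = pattern[-1], len(pattern)
        for nxt in succ[last]:
            edge_pos = frequent[(last, nxt)]
            # Verify the extension from stored positions, without rescanning
            # seq: the final edge must begin at offset k - 1 in an occurrence.
            new_starts = {s for s in starts if s + k - 1 in edge_pos}
            if len(new_starts) >= min_support:
                stack.append((pattern + (nxt,), new_starts))
    return results
```

For example, `mine_contiguous(list("abcxabcyabc"), 3)` reports `('a', 'b')`, `('b', 'c')`, and `('a', 'b', 'c')`, each with support 3. The position check matters because two frequent edges that share an item do not necessarily form a frequent pattern: in `"abxbcabybc"` both `ab` and `bc` occur twice, yet `abc` never occurs.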