DPSP: Distributed Progressive Sequential Pattern Mining on the Cloud

The progressive sequential pattern mining problem has been discussed in previous research works. With the increasing amount of data, single processors struggle to scale up, and traditional algorithms running on a single machine may run into scalability trouble. Mining progressive sequential patterns therefore intrinsically suffers from the scalability problem. In view of this, we design a distributed mining algorithm to address the scalability problem of mining progressive sequential patterns. The proposed algorithm, DPSP (Distributed Progressive Sequential Pattern mining algorithm), is implemented on top of the Hadoop platform, which realizes the cloud computing environment. We propose Map/Reduce jobs in DPSP to delete obsolete itemsets, update current candidate sequential patterns, and report up-to-date frequent sequential patterns within each period of interest (POI). The experimental results show that DPSP scales well and consequently increases the performance and practicability of mining algorithms.
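The three tasks named above (deleting obsolete itemsets, updating candidate patterns, and reporting frequent patterns within a POI) can be illustrated with a small single-process sketch in the map/reduce style. This is not the DPSP implementation and does not use Hadoop. The function names `map_customer`, `reduce_patterns`, and `mine`, the data layout, the window convention `(now - poi, now]`, and the restriction to patterns of one or two single items are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def map_customer(customer_id, elements, now, poi):
    """Map step (hypothetical): drop obsolete elements, emit (pattern, customer_id).

    `elements` is a list of (timestamp, itemset) pairs for one customer.
    """
    # Assumed window convention: keep timestamps in (now - poi, now].
    recent = sorted(
        ((t, items) for t, items in elements if now - poi < t <= now),
        key=lambda element: element[0],
    )
    patterns = set()
    # Length-1 candidates: every item that still lies inside the window.
    for _, items in recent:
        for item in items:
            patterns.add((item,))
    # Length-2 candidates: item a strictly before item b in time.
    for (t1, items1), (t2, items2) in combinations(recent, 2):
        if t1 < t2:
            for a in items1:
                for b in items2:
                    patterns.add((a, b))
    return [(pattern, customer_id) for pattern in patterns]


def reduce_patterns(pairs, min_support):
    """Reduce step (hypothetical): count distinct supporting customers."""
    supporters = defaultdict(set)
    for pattern, customer_id in pairs:
        supporters[pattern].add(customer_id)
    return {
        pattern: len(customers)
        for pattern, customers in supporters.items()
        if len(customers) >= min_support
    }


def mine(database, now, poi, min_support):
    """Run the map step per customer, then one reduce over all emitted pairs."""
    pairs = []
    for customer_id, elements in database.items():
        pairs.extend(map_customer(customer_id, elements, now, poi))
    return reduce_patterns(pairs, min_support)
```

In a real Hadoop deployment the map calls would run in parallel over partitions of the sequence database and the framework would group the emitted pairs by pattern before the reduce step; the loop in `mine` only stands in for that shuffle.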
