Sequential Pattern Mining (SPM) for user-inputted data sets: an empirical framework using bitwise operations

Sequential pattern mining is used to discover temporal relationships between itemsets within a large data set. The downside of existing approaches is their computation time and memory requirement, which increase exponentially with the size of the data set. We propose a new algorithm for sequential pattern mining based on Apriori-style frequent itemset mining. In this work, each whole transaction is represented as a binary number. The main advantage of the proposed method is that it eliminates the need to rescan the whole data set for every new set of transactions, which is a limitation of existing sequential pattern mining algorithms. Analysis of the results shows that the proposed algorithm supports large data set analysis with respect to both execution time and memory usage. We also outline a pilot approach showing how the proposed algorithm would work in a parallel environment.
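To make the binary representation concrete, the following is a minimal sketch in Python, not the algorithm proposed in this work. It assumes each item is assigned a fixed bit position, so a transaction becomes one integer and an itemset is contained in a transaction when a bitwise AND returns the itemset unchanged. The names `encode`, `support`, and `frequent_itemsets` are hypothetical, and the sketch covers only a level-wise (Apriori-style) frequent itemset search; it does not model the ordering of transactions over time, the incremental update, or the parallel variant.

```python
from itertools import combinations


def encode(transaction, item_index):
    """Pack a transaction (an iterable of items) into an integer bitmask."""
    mask = 0
    for item in transaction:
        mask |= 1 << item_index[item]
    return mask


def support(itemset_mask, encoded_transactions):
    """Count the transactions that contain every item of the itemset."""
    return sum(1 for t in encoded_transactions if t & itemset_mask == itemset_mask)


def frequent_itemsets(encoded_transactions, n_items, min_support):
    """Level-wise search over bitmask-encoded itemsets.

    Returns a dict mapping each frequent itemset mask to its support count.
    Simplification: candidates are built by joining pairs of frequent
    k-itemsets, without the subset-based pruning step of full Apriori.
    """
    frequent = {}
    level = [1 << i for i in range(n_items)]  # all 1-itemsets
    while level:
        k = bin(level[0]).count("1")  # size of the itemsets at this level
        survivors = {}
        for mask in level:
            count = support(mask, encoded_transactions)
            if count >= min_support:
                survivors[mask] = count
        frequent.update(survivors)
        # Join step: keep unions of two survivors that have exactly k + 1 items.
        next_level = set()
        for a, b in combinations(survivors, 2):
            union = a | b
            if bin(union).count("1") == k + 1:
                next_level.add(union)
        level = sorted(next_level)
    return frequent


# Illustrative data only: four items and four transactions.
item_index = {"a": 0, "b": 1, "c": 2, "d": 3}
transactions = [{"a", "b", "c"}, {"a", "c"}, {"a", "d"}, {"b", "c"}]
encoded = [encode(t, item_index) for t in transactions]
result = frequent_itemsets(encoded, n_items=len(item_index), min_support=2)
```

With this toy data and a minimum support of 2, `result` contains the single items a, b, and c together with the pairs {a, c} and {b, c}. Encoding the data once means every later support check is a pass over integers rather than over the original records, which is the kind of saving the binary representation is meant to provide.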
