Mining of high utility-probability sequential patterns from uncertain databases

High-utility sequential pattern mining (HUSPM) has become an important topic in data mining. Several HUSPM algorithms have been designed to mine high-utility sequential patterns (HUSPs), and they have been applied in real-life settings such as consumer behavior analysis and event detection in sensor networks. Nonetheless, most studies on HUSPM have focused on mining HUSPs in precise data. In real-life applications, however, uncertainty is an important factor because data is collected by various types of sensors that are more or less accurate. Hence, data collected in a real-life database can be annotated with existential probabilities. This paper presents a novel pattern mining framework called high utility-probability sequential pattern mining (HUPSPM) for mining high utility-probability sequential patterns (HUPSPs) in uncertain sequence databases. A baseline algorithm with three optional pruning strategies is presented to mine HUPSPs. Moreover, to speed up the mining process, a projection mechanism is designed to create a database projection for each processed sequence; each projection is smaller than the original database. Thus, both the number of unpromising candidates and the execution time for mining HUPSPs can be greatly reduced. Extensive experiments on both real-life and synthetic datasets show that the designed algorithm performs well in terms of runtime, number of candidates, memory usage, and scalability for different minimum utility and minimum probability thresholds.
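To make the idea of a database projection concrete, the following is a minimal sketch in Python of a generic prefix-style projection over a toy uncertain sequence database. It is an illustration under stated assumptions, not the algorithm of this paper: the data model (a list of `(item, utility)` events plus one probability per sequence), the function name `project`, and the sample values are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical data model (an assumption, not the paper's notation):
# an uncertain sequence is a list of (item, utility) events plus one
# probability attached to the whole sequence.
Event = Tuple[str, int]
UncertainSequence = Tuple[List[Event], float]


def project(database: List[UncertainSequence], item: str) -> List[UncertainSequence]:
    """For each sequence containing `item`, keep only the events that
    follow its first occurrence, together with the sequence probability."""
    projected = []
    for events, probability in database:
        for position, (current, _utility) in enumerate(events):
            if current == item:
                suffix = events[position + 1:]
                if suffix:  # an empty suffix cannot extend the pattern
                    projected.append((suffix, probability))
                break  # only the first occurrence is used in this sketch
    return projected


# Toy database with made-up utilities and probabilities.
database = [
    ([("a", 2), ("b", 5), ("c", 1)], 0.9),
    ([("b", 3), ("a", 4), ("c", 6)], 0.6),
    ([("c", 2), ("b", 1)], 0.8),
]

projected_on_a = project(database, "a")
```

Here the original database holds eight events, while the projection on `a` holds three, so any extension of the pattern starting with `a` only needs to scan the smaller projected database.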
