Mining top-k frequent patterns from uncertain databases

Mining uncertain frequent patterns (UFPs) from uncertain databases is a recently introduced problem, and various approaches to it have been proposed over the last decade. However, traditional approaches often discover too many UFPs, so systems must spend considerable time and resources ranking them to find the most promising patterns. This paper therefore introduces the task of mining top-k UFPs from uncertain databases and proposes an efficient method for it, named TUFP (mining Top-k UFPs). Effective threshold-raising strategies help the proposed algorithm reduce the number of generated candidates, which improves both runtime and memory usage. Finally, experiments compare TUFP with two state-of-the-art approaches (CUFP-mine and LUNA) on the number of generated candidates, mining time, memory usage and scalability. The results show that TUFP is efficient in terms of mining time, memory usage and scalability for mining top-k UFPs.
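
To make the idea of threshold raising concrete, the following is a minimal sketch and not the TUFP algorithm itself. It assumes the common expected-support model, in which each item in a transaction carries an existential probability and the expected support of an itemset is the sum, over transactions, of the product of its items' probabilities. The function names (`expected_support`, `top_k_patterns`), the dictionary-based database layout, and the plain depth-first enumeration are all illustrative assumptions, not taken from the paper.

```python
import heapq
from math import prod


def expected_support(itemset, database):
    """Expected support under the assumed model: for each transaction that
    contains every item, multiply the item probabilities, then sum."""
    total = 0.0
    for transaction in database:
        if all(item in transaction for item in itemset):
            total += prod(transaction[item] for item in itemset)
    return total


def top_k_patterns(database, k):
    """Return up to k (expected support, itemset) pairs, highest support first.

    Illustrative only: a min-heap holds the best k patterns seen so far, and
    its smallest support becomes the pruning threshold once the heap is full.
    """
    items = sorted({item for transaction in database for item in transaction})
    heap = []        # min-heap of (support, itemset), never more than k entries
    threshold = 0.0  # raised as soon as k patterns have been collected

    def offer(itemset, support):
        nonlocal threshold
        if len(heap) < k:
            heapq.heappush(heap, (support, itemset))
        elif support > heap[0][0]:
            heapq.heapreplace(heap, (support, itemset))
        if len(heap) == k:
            threshold = heap[0][0]

    def extend(prefix, start):
        for i in range(start, len(items)):
            candidate = prefix + (items[i],)
            support = expected_support(candidate, database)
            # Probabilities are at most 1, so no superset of the candidate can
            # have a higher expected support; skip the whole branch.
            if support <= threshold:
                continue
            offer(candidate, support)
            extend(candidate, i + 1)

    extend((), 0)
    return sorted(heap, key=lambda entry: (-entry[0], entry[1]))
```

With a toy database such as `[{"a": 0.9, "b": 0.8}, {"a": 0.6, "c": 0.5}, {"a": 0.5, "b": 0.4, "c": 1.0}]` and `k = 3`, the threshold climbs as better patterns are found, and the branch under `("b", "c")` is skipped without being stored. The fewer candidates that survive the threshold, the less time and memory the search needs, which is the effect the abstract attributes to threshold raising.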
