TUSQ: Targeted High-Utility Sequence Querying

Significant effort has been devoted to the research and development of database management systems (DBMSs), which have a wide range of applications for managing enormous collections of multisource, heterogeneous, complex, or growing data. Besides their primary functions (i.e., create, delete, and update), practical DBMSs can interact with users through information selection, that is, querying with user-specified targets. Previous querying algorithms, such as frequent itemset querying and sequential pattern querying (SPQ), have focused on frequency and do not consider utility, a measure that helps users discover more informative patterns. To extend querying technology to a wider range of applications, we incorporate utility into target-oriented SPQ and formulate the task of targeted utility-oriented sequence querying. To address this problem, we develop a novel algorithm, namely targeted high-utility sequence querying (TUSQ), based on two novel upper bounds, suffix remain utility and terminated descendants utility, as well as a vertical Last Instance Table structure. For further efficiency, TUSQ relies on a projection technique that uses a compact data structure called the targeted chain. An extensive experimental study on several real and synthetic datasets shows that the proposed algorithm outperforms the designed baseline algorithm in terms of runtime, memory consumption, and candidate filtering.
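
To make the querying task concrete, here is a minimal brute-force sketch of the two checks a candidate pattern must pass. It assumes USpan-style definitions: the utility of a pattern in one q-sequence is the maximum over all of its instances, and its database utility is the sum over sequences. It also assumes that a pattern answers the query when it contains the target sequence as a subsequence and its utility reaches a minimum threshold. The data layout and the names `utility_in_sequence`, `pattern_utility`, `contains_target`, and `qualifies` are hypothetical. The sketch does not model TUSQ's upper bounds, the Last Instance Table, or the targeted chain.

```python
from functools import lru_cache
from typing import Dict, FrozenSet, List, Sequence

# Assumed layout (hypothetical): a q-sequence is a list of elements; each element
# maps an item to its utility there (e.g., quantity x unit profit, precomputed).
QSequence = List[Dict[str, int]]
Pattern = Sequence[FrozenSet[str]]


def utility_in_sequence(pattern: Pattern, seq: QSequence) -> int:
    """Maximum utility over all instances of `pattern` in `seq` (0 if absent)."""
    neg_inf = float("-inf")

    @lru_cache(maxsize=None)
    def best(i: int, j: int) -> float:
        # Best utility for matching pattern[i:] into seq[j:], or -inf if impossible.
        if i == len(pattern):
            return 0
        if j == len(seq):
            return neg_inf
        skip = best(i, j + 1)  # leave element j unused
        if all(item in seq[j] for item in pattern[i]):
            gain = sum(seq[j][item] for item in pattern[i])
            return max(skip, gain + best(i + 1, j + 1))
        return skip

    result = best(0, 0)
    return 0 if result == neg_inf else int(result)


def pattern_utility(pattern: Pattern, db: List[QSequence]) -> int:
    """Utility of `pattern` in the database: sum of per-sequence maxima."""
    return sum(utility_in_sequence(pattern, seq) for seq in db)


def contains_target(pattern: Pattern, target: Pattern) -> bool:
    """True if `target` is a subsequence of `pattern` (itemset inclusion, order kept)."""
    k = 0
    for itemset in pattern:
        # Greedily match the next target itemset to the earliest superset.
        if k < len(target) and target[k] <= itemset:
            k += 1
    return k == len(target)


def qualifies(pattern: Pattern, target: Pattern, db: List[QSequence], min_util: int) -> bool:
    """Hypothetical query predicate: contains the target and is high-utility."""
    return contains_target(pattern, target) and pattern_utility(pattern, db) >= min_util


if __name__ == "__main__":
    db = [
        [{"a": 2, "b": 3}, {"c": 4}, {"a": 1, "c": 6}],
        [{"a": 5}, {"c": 2}],
        [{"b": 1}, {"d": 7}],
    ]
    pattern = [frozenset({"a"}), frozenset({"c"})]
    target = [frozenset({"c"})]
    print(pattern_utility(pattern, db))        # 8 + 7 + 0 = 15
    print(qualifies(pattern, target, db, 15))  # True
```

This sketch evaluates one candidate at a time with an O(|p| x |s|) dynamic program per sequence. According to the abstract, TUSQ's efficiency comes from its upper bounds, the Last Instance Table, and targeted-chain projection, none of which appear here.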
