Mining cost-effective patterns in event logs

Abstract: High utility pattern mining is a popular data analysis task that consists of discovering patterns of high importance in databases. A common application is identifying high utility (profitable) patterns in customer transaction data. Although such analysis helps in understanding data, it does not consider the cost (e.g. effort, resources, money or time) required to obtain the utility (benefits). In this paper, we argue that discovering interesting patterns in event sequences requires considering both a utility model and a cost model. For example, to identify cost-effective ways of treating patients from medical pathway data, it is desirable to consider not only the ability of treatments to inhibit symptoms or cure a disease (utility) but also the resources consumed and the time spent (cost) in providing these treatments. Based on this perspective, this paper defines a novel task of discovering cost-effective event sequences in event logs. In this task, cost is modeled as numeric values, while utility is represented as either binary or numeric values. Measures are proposed to evaluate the trade-off and the correlation between the cost and utility of patterns, in order to identify cost-effective patterns (patterns that have a low cost but provide a high utility). Three algorithms, called CEPB, corCEPB and CEPN, are designed to extract these patterns. They rely on a tight lower bound on the cost and a memory buffering technique to find patterns efficiently. Experiments show that the proposed algorithms are efficient, that the proposed optimizations further improve efficiency, and that insightful cost-effective patterns are found in real-life e-learning data.
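The abstract does not spell out the measures, so the following is only a minimal sketch of one plausible reading of a cost/utility trade-off for the numeric-utility case. It assumes that each event carries a numeric cost, that each sequence carries a single numeric utility, that a pattern is matched as an ordered subsequence at its first (greedy) occurrence, and that the trade-off is the average cost divided by the average utility over the sequences containing the pattern. The names `match_cost` and `trade_off` and the toy e-learning log are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence, Tuple

Event = Tuple[str, float]          # (activity name, cost of that event)
Trace = Tuple[List[Event], float]  # (ordered events, utility of the whole sequence)


def match_cost(pattern: Sequence[str], events: List[Event]) -> Optional[float]:
    """Return the summed cost of the first greedy occurrence of `pattern`
    as an ordered subsequence of `events`, or None if it does not occur.
    (Assumption: cost of a pattern = sum of the costs of its matched events.)"""
    total = 0.0
    i = 0
    for name, cost in events:
        if i < len(pattern) and name == pattern[i]:
            total += cost
            i += 1
    return total if i == len(pattern) else None


def trade_off(pattern: Sequence[str], log: List[Trace]) -> Optional[float]:
    """Average cost / average utility over the sequences containing `pattern`.
    Lower values suggest a more cost-effective pattern (assumed definition).
    Returns None if no sequence contains the pattern or the utility is zero."""
    costs: List[float] = []
    utilities: List[float] = []
    for events, utility in log:
        cost = match_cost(pattern, events)
        if cost is not None:
            costs.append(cost)
            utilities.append(utility)
    if not costs or sum(utilities) == 0:
        return None
    avg_cost = sum(costs) / len(costs)
    avg_utility = sum(utilities) / len(utilities)
    return avg_cost / avg_utility


# Hypothetical toy log: cost = minutes spent, utility = final score.
example_log: List[Trace] = [
    ([("video", 10), ("quiz", 5), ("exam", 20)], 80),
    ([("video", 10), ("exam", 20)], 60),
    ([("reading", 30), ("quiz", 5)], 50),
]

if __name__ == "__main__":
    for pattern in [("video", "exam"), ("quiz",), ("exam", "video")]:
        print(pattern, trade_off(pattern, example_log))
```

Under these assumptions, a pattern with a smaller ratio reaches a comparable utility at a lower cost. A mining algorithm would additionally need pruning (for instance, the lower bound on cost mentioned in the abstract) rather than scoring candidate patterns one by one as this sketch does.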
