An interdisciplinary comparison of sequence modeling methods for next-element prediction

Sequential data arise in many application domains, for example as textual data, DNA sequences, and software execution traces. Different research disciplines have developed methods to learn sequence models from such datasets: (i) in machine learning, methods such as (hidden) Markov models and recurrent neural networks have been developed and successfully applied to a wide range of tasks; (ii) in process mining, process discovery methods aim to generate human-interpretable descriptive models; and (iii) in grammar inference, the focus is on finding descriptive models in the form of formal grammars. Despite their different focuses, these fields share a common goal: learning a model that accurately captures the sequential behavior in the underlying data. These sequence models are generative, i.e., they can predict which elements are likely to occur after a given incomplete sequence. So far, these fields have developed mainly in isolation from each other, and no comparison exists. This paper presents an interdisciplinary experimental evaluation that compares sequence modeling methods on the task of next-element prediction on four real-life sequence datasets. The results indicate that machine learning methods, which generally do not aim at model interpretability, tend to outperform methods from the process mining and grammar inference fields in terms of accuracy.
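To make the task of next-element prediction concrete, the following is a minimal sketch of a first-order Markov model, one of the simplest members of the machine learning family mentioned above. It is an illustration only and is not the implementation evaluated in the paper; the function names `fit_markov` and `predict_next` and the toy sequences are hypothetical.

```python
from collections import Counter, defaultdict


def fit_markov(sequences):
    """Count first-order transitions (previous element -> next element)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, prefix):
    """Return a probability distribution over the element following `prefix`.

    Only the last element of the prefix is used (first-order assumption).
    An empty dict is returned when the prefix is empty or its last element
    was never observed with a successor in the training data.
    """
    if not prefix or prefix[-1] not in counts:
        return {}
    followers = counts[prefix[-1]]
    total = sum(followers.values())
    return {elem: n / total for elem, n in followers.items()}


# Hypothetical toy dataset of three sequences (assumption, not from the paper).
toy_log = [["a", "b", "c"], ["a", "b", "d"], ["a", "c", "d"]]
model = fit_markov(toy_log)

# Distribution over the next element after the incomplete sequence <a, b>.
dist = predict_next(model, ["a", "b"])
print(dist)
```

Given an incomplete sequence, the model returns a probability for each candidate next element, which is the kind of output on which methods from the three fields can be compared.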
