Rule Discovery in Labeled Sequential Data: Application to Game Analytics. (Découverte de règles pour séquences labélisées: application à l'analyse de données de jeux vidéos)

Labeled datasets are useful not only for learning models and performing predictive analytics but also for improving our understanding of a domain and its target classes. The subgroup discovery task, which has been studied for more than two decades, concerns the discovery of rules covering sets of objects that have interesting properties, e.g., they characterize a given target class. Though many subgroup discovery algorithms have been proposed for both transactional and numerical data, discovering rules within labeled sequential data has been much less studied. In that context, exhaustive exploration strategies cannot be used for real-life applications, and we have to look for heuristic approaches. In this thesis, we propose to apply bandit models and Monte Carlo Tree Search to explore the search space of possible rules using an exploration-exploitation trade-off, on different data types such as sequences of itemsets or time series. For a given budget, our algorithms find a collection of top-k best rules in the search space w.r.t. a chosen quality measure. They require only light configuration and are independent of the quality measure used for pattern scoring. To the best of our knowledge, this is the first time that the Monte Carlo Tree Search framework has been exploited in a sequential data mining setting. We have conducted thorough evaluations of our algorithms on several datasets to illustrate their added value, and we discuss their qualitative and quantitative results. To assess the added value of one of our algorithms, we propose a game analytics use case, more precisely Rocket League match analysis. Discovering interesting rules in sequences of actions performed by players and using them in a supervised classification model shows the efficiency and relevance of our approach in the difficult and realistic context of high-dimensional data.
This approach supports the automatic discovery of skills and can be used to create new game modes, to improve the ranking system, to help e-sport commentators, or to better analyse opponent teams, for example.
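
To make the idea of a budgeted, bandit-driven rule search more concrete, here is a minimal Python sketch. It is not the thesis's algorithm: it assumes sequences of single items rather than itemsets, uses UCB1 as the bandit policy, generalizes a chosen sequence by randomly dropping items, and scores patterns with Weighted Relative Accuracy (WRAcc). The names `is_subsequence`, `wracc`, `bandit_rule_search`, and `TOY_DATA` are hypothetical and introduced only for illustration.

```python
import math
import random
from typing import Callable, List, Tuple

Seq = Tuple[str, ...]
Dataset = List[Tuple[Seq, str]]  # (sequence, class label) pairs


def is_subsequence(pattern: Seq, sequence: Seq) -> bool:
    """True if the items of `pattern` occur in `sequence` in the same order."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def wracc(pattern: Seq, data: Dataset, target: str) -> float:
    """Weighted Relative Accuracy of the rule `pattern -> target`."""
    n = len(data)
    covered = [label for seq, label in data if is_subsequence(pattern, seq)]
    if not covered:
        return 0.0
    p_target = sum(1 for _, label in data if label == target) / n
    p_target_cov = sum(1 for label in covered if label == target) / len(covered)
    return (len(covered) / n) * (p_target_cov - p_target)


def bandit_rule_search(
    data: Dataset,
    target: str,
    quality: Callable[[Seq, Dataset, str], float] = wracc,
    budget: int = 200,
    k: int = 3,
    seed: int = 0,
) -> List[Tuple[Seq, float]]:
    """Return up to k best patterns found within `budget` quality evaluations."""
    rng = random.Random(seed)
    # Assumption: each non-empty sequence of the target class is one arm.
    arms = [seq for seq, label in data if label == target and seq]
    if not arms:
        return []
    pulls = [0] * len(arms)
    means = [0.0] * len(arms)
    found = {}
    for t in range(1, budget + 1):
        untried = [i for i, count in enumerate(pulls) if count == 0]
        if untried:
            i = untried[0]  # play every arm once first
        else:
            # UCB1: empirical mean plus an exploration bonus
            i = max(
                range(len(arms)),
                key=lambda j: means[j] + math.sqrt(2 * math.log(t) / pulls[j]),
            )
        # Generalize the chosen sequence by keeping each item with probability 0.5.
        pattern = tuple(x for x in arms[i] if rng.random() < 0.5)
        if not pattern:
            pattern = (rng.choice(arms[i]),)
        reward = quality(pattern, data, target)
        pulls[i] += 1
        means[i] += (reward - means[i]) / pulls[i]  # incremental mean update
        found[pattern] = reward
    return sorted(found.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Toy labeled sequences (invented for this sketch).
TOY_DATA: Dataset = [
    (("a", "b", "c"), "win"),
    (("a", "c"), "win"),
    (("b", "a", "c"), "win"),
    (("c", "a"), "lose"),
    (("b",), "lose"),
    (("c", "b"), "lose"),
]
```

Calling `bandit_rule_search(TOY_DATA, "win")` returns a ranked list of `(pattern, quality)` pairs. Because the quality function is passed as a parameter, another measure can be swapped in without touching the search loop, which mirrors the independence from the quality measure described above. Note that UCB1 is usually analysed for rewards in [0, 1], whereas WRAcc lies in [-0.25, 0.25]; the sketch ignores this rescaling.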
