Mining Subgroups with Exceptional Transition Behavior

We present a new method for detecting interpretable subgroups with exceptional transition behavior in sequential data. Identifying such patterns has many potential applications, e.g., studying human mobility or analyzing the behavior of internet users. To tackle this task, we employ exceptional model mining, a general approach for identifying interpretable data subsets that exhibit unusual interactions among a set of target attributes with respect to a certain model class. Although exceptional model mining provides a well-suited framework for our problem, previously investigated model classes cannot capture transition behavior. To address this gap, we introduce first-order Markov chains as a novel model class for exceptional model mining and present a new interestingness measure that quantifies the exceptionality of transition subgroups. The measure compares the distance between the Markov transition matrix of a subgroup and that of the entire dataset with the corresponding distances obtained for random samples of the dataset. In addition, our method can be adapted to find subgroups that match or contradict given transition hypotheses. We demonstrate that our method consistently recovers subgroups with exceptional transition models from synthetic data and illustrate its potential in two application examples. Our work is relevant for researchers and practitioners interested in detecting exceptional transition behavior in sequential data.
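The idea behind the interestingness measure can be illustrated with a minimal Python sketch. The abstract does not specify the distance function, the sampling scheme, or how the comparison is normalized, so the choices below are assumptions made only for illustration: an unweighted row-wise total variation distance, random samples of the same size as the subgroup, and a z-score against the sampled distances. The function names `transition_matrix`, `matrix_distance`, and `exceptionality` are hypothetical.

```python
import random
from collections import Counter


def transition_matrix(transitions, states):
    """Row-normalized first-order transition matrix as nested dicts.

    `transitions` is a list of (source_state, target_state) pairs.
    Rows without any observed transition are left as all zeros.
    """
    counts = Counter(transitions)
    matrix = {}
    for s in states:
        row_total = sum(counts[(s, t)] for t in states)
        matrix[s] = {
            t: (counts[(s, t)] / row_total if row_total else 0.0)
            for t in states
        }
    return matrix


def matrix_distance(a, b, states):
    """Sum of row-wise total variation distances (an assumed distance)."""
    return sum(
        0.5 * sum(abs(a[s][t] - b[s][t]) for t in states) for s in states
    )


def exceptionality(subgroup, data, states, n_samples=200, seed=0):
    """Z-score of the subgroup's distance against random-sample distances.

    Assumption: random samples have the same number of transitions as the
    subgroup and are drawn without replacement from the whole dataset.
    """
    rng = random.Random(seed)
    full = transition_matrix(data, states)
    observed = matrix_distance(transition_matrix(subgroup, states), full, states)

    sampled = []
    for _ in range(n_samples):
        sample = rng.sample(data, len(subgroup))
        sampled.append(
            matrix_distance(transition_matrix(sample, states), full, states)
        )

    mean = sum(sampled) / n_samples
    var = sum((d - mean) ** 2 for d in sampled) / (n_samples - 1)
    std = var ** 0.5
    return (observed - mean) / std if std > 0 else 0.0


# Toy data: most transitions alternate between A and B, while a small
# subgroup consists only of self-loops.
states = ["A", "B"]
regular = [("A", "B")] * 100 + [("B", "A")] * 100
subgroup = [("A", "A")] * 20 + [("B", "B")] * 20
data = regular + subgroup

full = transition_matrix(data, states)
score = exceptionality(subgroup, data, states)
```

Comparing against random samples, rather than using the raw distance alone, accounts for the fact that small subsets deviate from the overall transition matrix by chance; a large positive `score` suggests a deviation beyond what the subgroup's size would explain.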
