When is a Network a Network? Multi-Order Graphical Model Selection in Pathways and Temporal Networks

We introduce a framework for modeling sequential data that capture pathways of varying lengths observed in a network. Such data are important, e.g., when studying click streams on the Web, travel patterns in transportation systems, information cascades in social networks, biological pathways, or time-stamped social interactions. While it is common to apply graph analytics and network analysis to such data, recent work has shown that temporal correlations can invalidate the results of such methods. This raises a fundamental question: When is a network abstraction of sequential data justified? Addressing this open question, we propose a framework that combines Markov chains of multiple, higher orders into a multi-layer graphical model that captures temporal correlations in pathways at multiple length scales simultaneously. We develop a model selection technique to infer the optimal number of layers of such a model and show that it outperforms baseline Markov order detection techniques. An application to eight real-world data sets on pathways and temporal networks shows that the technique infers graphical models that capture both topological and temporal characteristics of such data. Our work highlights fallacies of network abstractions and provides a principled answer to the open question of when they are justified. By generalizing network representations to multi-order graphical models, it opens perspectives for new data mining and knowledge discovery algorithms.
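To make the role of temporal correlations concrete, the following minimal sketch fits maximum-likelihood Markov chains of order 1 and order 2 to a toy set of pathways and compares their log-likelihoods on the same transitions. It is not the multi-order model or the model selection technique proposed in this work; the function name `markov_log_likelihood`, its `start` parameter, and the toy paths are assumptions made purely for illustration.

```python
from collections import Counter
from math import log

def markov_log_likelihood(paths, order, start):
    """Log-likelihood of the transitions at positions >= start in each path
    under a maximum-likelihood Markov chain of the given order.

    `start` must be >= `order` so that every scored transition has a full
    context. Using the same `start` for different orders scores the same
    transitions, which keeps the likelihoods comparable.
    (Hypothetical helper, not taken from the paper.)
    """
    if start < order:
        raise ValueError("start must be at least as large as order")
    context_counts = Counter()
    transition_counts = Counter()
    for path in paths:
        for i in range(start, len(path)):
            context = tuple(path[i - order:i])
            transition_counts[(context, path[i])] += 1
            context_counts[context] += 1
    # Maximum-likelihood transition probability is count / context count.
    return sum(
        n * log(n / context_counts[context])
        for (context, _), n in transition_counts.items()
    )

# Toy pathways with memory: walkers arriving at c from a always continue
# to d, while walkers arriving from b always continue to e.
correlated = [("a", "c", "d")] * 10 + [("b", "c", "e")] * 10

# Toy pathways without memory: the step after c is independent of the
# node visited before c.
uncorrelated = (
    [("a", "c", "d")] * 5 + [("a", "c", "e")] * 5
    + [("b", "c", "d")] * 5 + [("b", "c", "e")] * 5
)

ll1_corr = markov_log_likelihood(correlated, order=1, start=2)
ll2_corr = markov_log_likelihood(correlated, order=2, start=2)
ll1_unc = markov_log_likelihood(uncorrelated, order=1, start=2)
ll2_unc = markov_log_likelihood(uncorrelated, order=2, start=2)

print("correlated:  ", ll1_corr, ll2_corr)
print("uncorrelated:", ll1_unc, ll2_unc)
```

In the correlated toy data, the first-order chain (which corresponds to a plain network of links) cannot tell the two continuations after `c` apart, so its likelihood is lower than that of the second-order chain. In the uncorrelated data both orders fit equally well. Because a higher-order chain never fits worse than a lower-order one on the same transitions, a raw likelihood comparison is not sufficient in practice; a principled decision must also account for the additional parameters, for example through a likelihood ratio test or an information criterion.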
