CatchTartan: Representing and Summarizing Dynamic Multicontextual Behaviors

Representing and summarizing human behaviors with rich contexts facilitates behavioral sciences and user-oriented services. Traditional behavioral modeling represents a behavior as a tuple in which each element is one contextual factor of one type, and the tensor-based summaries look for high-order dense blocks by clustering the values (including timestamps) in each dimension. However, the human behaviors are multicontextual and dynamic: (1) each behavior takes place within multiple contexts in a few dimensions, which requires the representation to enable non-value and set-values for each dimension; (2) many behavior collections, such as tweets or papers, evolve over time. In this paper, we represent the behavioral data as a two-level matrix (temporal-behaviors by dimensional-values) and propose a novel representation for behavioral summary called Tartan that includes a set of dimensions, the values in each dimension, a list of consecutive time slices and the behaviors in each slice. We further develop a propagation method CatchTartan to catch the dynamic multicontextual patterns from the temporal multidimensional data in a principled and scalable way: it determines the meaningfulness of updating every element in the Tartan by minimizing the encoding cost in a compression manner. CatchTartan outperforms the baselines on both the accuracy and speed. We apply CatchTartan to four Twitter datasets up to 10 million tweets and the DBLP data, providing comprehensive summaries for the events, human life and scientific development.

[1]  Jimeng Sun,et al.  Beyond streams and graphs: dynamic tensor analysis , 2006, KDD '06.

[2]  Jilles Vreeken,et al.  Krimp: mining itemsets that compress , 2011, Data Mining and Knowledge Discovery.

[3]  J. Rissanen A UNIVERSAL PRIOR FOR INTEGERS AND ESTIMATION BY MINIMUM DESCRIPTION LENGTH , 1983 .

[4]  Matthew Huras,et al.  Multi-dimensional clustering: a new data layout scheme in DB2 , 2003, SIGMOD '03.

[5]  Fei Wang,et al.  FEMA: flexible evolutionary multi-faceted analysis for dynamic behavioral pattern discovery , 2014, KDD.

[6]  Tamara G. Kolda,et al.  Tensor Decompositions and Applications , 2009, SIAM Rev..

[7]  Fei Wang,et al.  Community discovery using nonnegative matrix factorization , 2011, Data Mining and Knowledge Discovery.

[8]  J. Rissanen,et al.  Modeling By Shortest Data Description* , 1978, Autom..

[9]  Ming Li,et al.  An Introduction to Kolmogorov Complexity and Its Applications , 2019, Texts in Computer Science.

[10]  Christos Faloutsos,et al.  EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs , 2009, 2009 IEEE International Conference on Data Mining Workshops.

[11]  Christos Faloutsos,et al.  A General Suspiciousness Metric for Dense Blocks in Multimodal Data , 2015, 2015 IEEE International Conference on Data Mining.

[12]  Jian Pei,et al.  Multidimensional mining of large-scale search logs: a topic-concept cube approach , 2011, WSDM '11.

[13]  Hans-Peter Kriegel,et al.  Density-Connected Subspace Clustering for High-Dimensional Data , 2004, SDM.

[14]  George Karypis,et al.  An efficient algorithm for discovering frequent subgraphs , 2004, IEEE Transactions on Knowledge and Data Engineering.

[15]  Chengqi Zhang,et al.  Interval-Based Similarity for Classifying Conserved RNA Secondary Structures , 2016, IEEE Intelligent Systems.

[16]  Fang Zhou,et al.  Compression of weighted graphs , 2011, KDD.

[17]  Jiawei Han,et al.  Graph cube: on warehousing and OLAP multidimensional networks , 2011, SIGMOD '11.

[18]  Danai Koutra,et al.  VOG: Summarizing and Understanding Large Graphs , 2014, SDM.

[19]  Shiqiang Yang,et al.  A Multiscale Survival Process for Modeling Human Activity Patterns , 2016, PloS one.

[20]  Ananthram Swami,et al.  Com2: Fast Automatic Discovery of Temporal ('Comet') Communities , 2014, PAKDD.

[21]  Christos Faloutsos,et al.  On data mining, compression, and Kolmogorov complexity , 2007, Data Mining and Knowledge Discovery.

[22]  David S. Johnson,et al.  Compressing Large Boolean Matrices using Reordering Techniques , 2004, VLDB.

[23]  Christos Faloutsos,et al.  Beyond 'Caveman Communities': Hubs and Spokes for Graph Compression and Mining , 2011, 2011 IEEE 11th International Conference on Data Mining.

[24]  Hans-Peter Kriegel,et al.  A generic framework for efficient subspace clustering of high-dimensional data , 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[25]  Dimitrios Gunopulos,et al.  Automatic Subspace Clustering of High Dimensional Data , 2005, Data Mining and Knowledge Discovery.

[26]  Heng Ji,et al.  EventCube: multi-dimensional search and mining of structured and text data , 2013, KDD.

[27]  Christos Faloutsos,et al.  Suspicious Behavior Detection: Current Trends and Future Directions , 2016, IEEE Intelligent Systems.

[28]  Jiawei Han,et al.  Frequent pattern mining: current status and future directions , 2007, Data Mining and Knowledge Discovery.

[29]  Fei Wang,et al.  Scalable Recommendation with Social Contextual Information , 2014, IEEE Transactions on Knowledge and Data Engineering.

[30]  Danai Koutra,et al.  TimeCrunch: Interpretable Dynamic Graph Summarization , 2015, KDD.

[31]  Christos Faloutsos,et al.  Clustering very large multi-dimensional datasets with MapReduce , 2011, KDD.

[32]  Philip S. Yu,et al.  GraphScope: parameter-free mining of large time-evolving graphs , 2007, KDD '07.

[33]  Joos Vandewalle,et al.  A Multilinear Singular Value Decomposition , 2000, SIAM J. Matrix Anal. Appl..

[34]  Huan Liu,et al.  Subspace clustering for high dimensional data: a review , 2004, SKDD.

[35]  Paul M. B. Vitányi,et al.  Clustering by compression , 2003, IEEE Transactions on Information Theory.