Unsupervised Exceptional Attributed Sub-Graph Mining in Urban Data

Geo-located social media provide a wealth of information that describes urban areas through user descriptions and comments. Such data make it possible to identify meaningful city neighborhoods on the basis of the footprints left by the large and diverse population that uses this type of media. In this paper, we present methods that reveal the predominant activities and their associated urban areas in order to describe a whole city automatically. Based on a suitable attributed graph model, our approach identifies neighborhoods with homogeneous and exceptional characteristics. We introduce the novel problem of exceptional sub-graph mining in attributed graphs and propose a complete algorithm that takes advantage of new upper bounds and pruning properties. We also propose an approach that samples the space of exceptional sub-graphs within a given time budget. Experiments on 10 real datasets demonstrate the relevance and the limits of both approaches.
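To make the notion of an exceptional sub-graph concrete, here is a minimal sketch in Python. It is not the paper's algorithm or quality measure, which the abstract does not specify. It assumes a toy attributed graph in which vertices are city areas, edges link adjacent areas, and each vertex carries venue counts per category. The names `EDGES`, `COUNTS`, `is_connected`, `share`, and `deviation` are hypothetical, and the score used (the difference between an attribute's share inside a connected sub-graph and its share in the whole graph) is an illustrative assumption.

```python
from collections import deque

# Hypothetical toy data: areas "a".."e", adjacency between areas,
# and per-area venue counts for two categories.
EDGES = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
COUNTS = {
    "a": {"food": 8, "art": 1},
    "b": {"food": 9, "art": 1},
    "c": {"food": 7, "art": 3},
    "d": {"food": 1, "art": 9},
    "e": {"food": 1, "art": 10},
}


def is_connected(vertices, edges):
    """Return True if the given vertices induce a connected sub-graph."""
    vertices = set(vertices)
    if not vertices:
        return False
    start = next(iter(vertices))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        # Only follow edges that stay inside the candidate vertex set.
        for w in edges[v] & vertices:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == vertices


def share(vertices, attribute, counts):
    """Fraction of all venue counts in `vertices` that belong to `attribute`."""
    total = sum(sum(counts[v].values()) for v in vertices)
    return sum(counts[v][attribute] for v in vertices) / total


def deviation(vertices, attribute, edges, counts):
    """Assumed exceptionality score: local share minus whole-graph share."""
    if not is_connected(vertices, edges):
        raise ValueError("candidate must induce a connected sub-graph")
    return share(vertices, attribute, counts) - share(counts, attribute, counts)


if __name__ == "__main__":
    print(deviation({"a", "b", "c"}, "food", EDGES, COUNTS))
    print(deviation({"d", "e"}, "art", EDGES, COUNTS))
```

Under these assumptions, a positive score marks an attribute that is over-represented in the candidate neighborhood and a negative score marks one that is under-represented. A complete miner would enumerate connected sub-graphs and prune them with upper bounds on such a score, whereas a sampler would draw candidates until the time budget runs out.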
