Efficient community discovery with user engagement and similarity

In this paper, we investigate the (k,r)-core problem, which aims to find cohesive subgraphs on social networks from the perspectives of both user engagement and user similarity. In particular, we adopt the popular concept of k-core to guarantee the engagement of the users (vertices) in a group (subgraph): each vertex in a (k,r)-core connects to at least k other vertices. Meanwhile, we consider the pairwise similarity among users based on their attributes. We propose efficient algorithms to enumerate all maximal (k,r)-cores and to find the maximum (k,r)-core; both problems are shown to be NP-hard. Effective pruning techniques substantially reduce the search space of both algorithms. A novel (k,k′)-core based upper bound on the (k,r)-core size enhances the performance of the maximum (k,r)-core computation. We also devise effective search orders for the two algorithms, with different search priorities for vertices.
Besides, we study the diversified (k,r)-core search problem to find l maximal (k,r)-cores which cover the most vertices in total. These maximal (k,r)-cores are distinctive and informationally rich. An efficient algorithm is proposed with a guaranteed approximation ratio.
We design a tight upper bound to prune unpromising partial (k,r)-cores. A new search order is designed to speed up the search. Large initial candidates are generated to further enhance the pruning power. Comprehensive experiments on real-life data demonstrate that the maximal (k,r)-cores enable us to find interesting cohesive subgraphs, and that the performance of the three mining algorithms is effectively improved by all the proposed techniques.
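
To make the two constraints concrete, the following is a minimal sketch of a check for whether a given vertex set satisfies them. It is not the enumeration or maximum-search algorithm of the paper. Several details are assumptions made for illustration only: the names `is_kr_core`, `jaccard`, `attrs`, `edges` and `sim` are hypothetical; similarity is taken to be Jaccard similarity over attribute sets, with r acting as a lower threshold on every pair; and the induced subgraph is additionally required to be connected.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two attribute sets (assumed metric, for illustration)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_kr_core(vertices, edges, similarity, k, r):
    """Return True if `vertices` induces a subgraph meeting both constraints.

    Assumptions (not taken from the abstract): `similarity(u, v) >= r` must
    hold for every pair, and the induced subgraph must be connected.
    """
    vs = set(vertices)
    if not vs:
        return False

    # Build adjacency lists restricted to the induced subgraph.
    adj = {v: set() for v in vs}
    for u, v in edges:
        if u in vs and v in vs and u != v:
            adj[u].add(v)
            adj[v].add(u)

    # Engagement constraint: every vertex has at least k neighbours inside.
    if any(len(adj[v]) < k for v in vs):
        return False

    # Similarity constraint: every pair of vertices is similar enough.
    if any(similarity(u, v) < r for u, v in combinations(vs, 2)):
        return False

    # Connectivity (assumed): depth-first search from an arbitrary vertex.
    start = next(iter(vs))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == vs


# Toy data: users 1-3 share interests and form a triangle; user 4 does not.
attrs = {1: {"a", "b"}, 2: {"a", "b"}, 3: {"a", "b", "c"}, 4: {"x"}}
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]


def sim(u, v):
    return jaccard(attrs[u], attrs[v])


print(is_kr_core({1, 2, 3}, edges, sim, k=2, r=0.5))     # True
print(is_kr_core({1, 2, 3, 4}, edges, sim, k=2, r=0.5))  # False
```

On this toy input, users 1 to 3 each have two neighbours in the group and pairwise similarity of at least 2/3, so the set passes; adding user 4 fails the engagement constraint because that user has only one neighbour. A check of this kind only verifies a candidate; finding all maximal or the maximum (k,r)-cores is NP-hard, which is why pruning and search orders matter.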
