Digger

People participate in multiple online social networks, e.g., Facebook, Twitter, and LinkedIn. Such networks, which differ in their social content and user relationships, are called heterogeneous social networks. Group structure is widespread in heterogeneous social networks and reveals the evolution of human cooperation. Detecting similar groups across heterogeneous networks is valuable for many applications that draw on the wealth of group information, such as recommendation systems and spammer detection. The problem is promising but poses several technical challenges, including incomplete data, high time complexity, and the lack of ground truth. To address the research gap and these challenges, we model the problem with a ratio-cut optimization function, using the linear mixed-effects method and graph spectral theory. Based on this model, we propose an efficient algorithm called Digger to detect similar groups in large graphs. Digger consists of three steps: measuring user similarity, constructing a matching graph, and detecting similar groups. We adopt several strategies to lower the computational cost, and we detail the basis for labeling the ground truth. We evaluate the effectiveness and efficiency of our algorithm on five different types of online social networks. Extensive experiments show that our method achieves 0.693, 0.783, and 0.735 in precision, recall, and F1-measure, surpassing the state of the art by 24.4%, 15.3%, and 20.7%, respectively. The results demonstrate that our proposal can effectively detect similar groups in heterogeneous networks.
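
The abstract does not spell out how the ratio-cut objective is optimized, so the following is only a minimal sketch of the standard spectral relaxation of ratio cut for a two-way split, not Digger itself. The function names, the toy graph, and the use of NumPy are assumptions made for illustration. The ratio-cut value is computed here as cut/|A| + cut/|B|; other conventions differ by a constant factor.

```python
import numpy as np

def ratio_cut_bipartition(adjacency):
    """Two-way split from the relaxed ratio-cut objective (illustrative helper)."""
    A = np.asarray(adjacency, dtype=float)
    degree = A.sum(axis=1)
    laplacian = np.diag(degree) - A  # unnormalized graph Laplacian L = D - W
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigenvectors = np.linalg.eigh(laplacian)
    fiedler = eigenvectors[:, 1]  # eigenvector of the second-smallest eigenvalue
    # Thresholding at zero turns the relaxed solution into a discrete partition.
    return fiedler >= 0

def ratio_cut_value(adjacency, labels):
    """Ratio cut of a two-way partition: cut/|A| + cut/|B|."""
    A = np.asarray(adjacency, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    cut = A[labels][:, ~labels].sum()  # total weight of edges crossing the split
    return cut / labels.sum() + cut / (~labels).sum()

# Toy graph (hypothetical): two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = np.zeros((6, 6))
for u, v in edges:
    adjacency[u, v] = adjacency[v, u] = 1.0

labels = ratio_cut_bipartition(adjacency)
value = ratio_cut_value(adjacency, labels)
print(labels, value)
```

On this toy graph the split separates the two triangles; which side is labeled `True` depends on the sign of the eigenvector, which is arbitrary. In a pipeline like the one the abstract describes, the input would presumably be the matching graph built from user similarities rather than a raw friendship graph, but that step is not specified here.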
