Mining diversity on social media networks

The rapid development of multimedia technology and the increasing availability of network bandwidth have produced an abundance of network data in recent years, driven by booming social media and social websites such as Flickr, YouTube, MySpace, and Facebook. Social network analysis has therefore become a critical problem that attracts interest from both academia and industry. However, an important measure that captures a participant's diversity in the network has been largely neglected in previous studies. Diversity characterizes how diversely a given node connects with its peers. In this paper, we give a comprehensive study of this concept. We first lay out two criteria that capture the semantic meaning of diversity, and then propose a definition that satisfies both criteria while remaining simple. With this definition, we can measure not only a user's sociality and interest diversity but also the user diversity of a social media site. We develop an efficient top-k diversity ranking algorithm for computation on dynamic networks. Experiments on both synthetic and real social media datasets show that the individual nodes identified as highly diverse agree with intuition.
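
To make the notion of node diversity concrete, the following is a minimal illustrative sketch, not the definition proposed in the paper, which the abstract does not state. It assumes that every node carries a group label (for example a community or an interest category) and scores a node by the Shannon entropy of the labels among its neighbors. The names `label_entropy`, `top_k_diverse`, `graph`, and `group` are hypothetical, and the ranking is recomputed from scratch rather than maintained incrementally on a dynamic network.

```python
import heapq
import math
from collections import Counter


def label_entropy(graph, group, node):
    """Shannon entropy (in bits) of the group labels among a node's neighbors.

    This is an assumed stand-in for a diversity score, not the paper's measure.
    """
    counts = Counter(group[neighbor] for neighbor in graph[node])
    total = sum(counts.values())
    if total == 0:
        return 0.0  # an isolated node has no neighbors to be diverse over
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def top_k_diverse(graph, group, k):
    """Return the k nodes with the highest neighbor-label entropy."""
    return heapq.nlargest(k, graph, key=lambda n: label_entropy(graph, group, n))


# Toy undirected graph as an adjacency list (hypothetical data).
graph = {
    "a": ["b", "c", "d", "e", "f"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a", "e"],
    "e": ["a", "d"],
    "f": ["a"],
}
# Hypothetical group label for each node.
group = {"a": 0, "b": 1, "c": 1, "d": 2, "e": 2, "f": 3}

most_diverse = top_k_diverse(graph, group, 1)
```

In this toy graph, node `a` links to three different groups and therefore ranks first, while node `f` has a single neighbor and scores zero. `heapq.nlargest` avoids sorting all nodes when k is small.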
