Computational Social Network Analysis of Authority in the Blogosphere

Social media have gained increasing importance in many areas of our daily lives. Weblogs were among the first media of this kind; they allow anyone to publish content online with little effort. For weblogs, the reliable algorithmic detection of importance based on social reputation is still an open issue. In this thesis we attempt to measure this authority with algorithms from the field of Social Network Analysis, which must be scalable, transparent and thoroughly evaluated. Social scientists have identified very specific characteristics of the elite group of influential top bloggers, and these characteristics are well represented by the network core/periphery model of Borgatti & Everett. We approximate this model with a scalable algorithm based on the concept of k-cores introduced by Seidman. For evaluation, we collect datasets of thousands of top blogs in six different languages in order to compare and cross-check the results. These results are also compared against random networks to show the significance of the findings. Remaining detection problems are addressed with anomaly detection and network filtering algorithms; according to our evaluations, these lead to an overall reliable detection process. In a second step, this thesis transfers these insights to a practical problem. We develop and evaluate a complete mining and analysis methodology for monitoring specific entities in the blogosphere. It consists of a search for relevant blog articles, which proves to be highly effective, and an authority measurement of these articles for potential end users in business scenarios, whose soundness is validated. The resulting tool, the “Social Media Miner”, integrates this methodology with text processing methods into an extensive analysis process, and it received very good feedback.
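The k-core concept of Seidman [26], on which the scalable approximation of the core/periphery model rests, can be illustrated with the bucket-based core decomposition of Batagelj et al. [24], which runs in O(n + m) time. The core number of a vertex is the largest k such that the vertex belongs to the k-core, the maximal subgraph in which every vertex has at least k neighbors. The following is a minimal sketch, not the thesis's own implementation: it assumes that blog links have already been collected into an undirected simple graph stored as a symmetric adjacency dictionary, and the function name `core_numbers` and the toy graph are hypothetical.

```python
from typing import Dict, Hashable, List, Optional, Set


def core_numbers(adj: Dict[Hashable, Set[Hashable]]) -> Dict[Hashable, int]:
    """Return the core number of every vertex of an undirected simple graph.

    Assumes `adj` is symmetric (u in adj[v] iff v in adj[u]) with no
    self-loops. Vertices are kept in an array sorted by current degree; the
    vertex with the smallest remaining degree is peeled off repeatedly, and
    the degree it has at that moment is its core number.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    max_deg = max(degree.values(), default=0)

    # Counting sort: bins[d] becomes the start index of degree-d vertices.
    bins = [0] * (max_deg + 1)
    for d in degree.values():
        bins[d] += 1
    start = 0
    for d in range(max_deg + 1):
        bins[d], start = start, start + bins[d]

    order: List[Optional[Hashable]] = [None] * len(adj)  # vertices sorted by degree
    pos: Dict[Hashable, int] = {}                        # vertex -> index in order
    for v, d in degree.items():
        pos[v] = bins[d]
        order[bins[d]] = v
        bins[d] += 1
    # Placement advanced each bin start by the bin size; shift back to restore.
    for d in range(max_deg, 0, -1):
        bins[d] = bins[d - 1]
    bins[0] = 0

    for i in range(len(order)):
        v = order[i]
        for u in adj[v]:
            du = degree[u]
            if du > degree[v]:
                # Swap u to the front of its bin, then advance the bin start,
                # which moves u into bin du - 1 without re-sorting.
                pu, pw = pos[u], bins[du]
                w = order[pw]
                if u != w:
                    order[pu], order[pw] = w, u
                    pos[u], pos[w] = pw, pu
                bins[du] += 1
                degree[u] = du - 1
    return degree  # final degrees are the core numbers


if __name__ == "__main__":
    # Hypothetical toy graph: a triangle a-b-c with a pendant vertex d on a.
    toy = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
    print(core_numbers(toy))  # {'a': 2, 'b': 2, 'c': 2, 'd': 1}
```

Given the core numbers, the vertex set of the k-core is `{v for v, c in core_numbers(adj).items() if c >= k}`. The innermost (maximum-k) core is one natural candidate for the core side of a core/periphery split; the anomaly detection and network filtering steps mentioned above are not reproduced in this sketch.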

[1] Rebecca Blood, et al. The Weblog Handbook: Practical Advice on Creating and Maintaining Your Blog. 2002.

[2] Andreas Dengel, et al. Automatic Sentiment Monitoring of Specific Topics in the Blogosphere. 2010, NyNaK.

[3] M. Newman, et al. Finding community structure in networks using the eigenvectors of matrices. 2006, Physical Review E: Statistical, Nonlinear, and Soft Matter Physics.

[4] Aaron Delwiche, et al. Agenda-setting, opinion leadership, and the world of Web logs. 2005, First Monday.

[5] Jure Leskovec, et al. Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters. 2008, Internet Math.

[6] Stephan Baumann, et al. Identifying and Analysing Germany's Top Blogs. 2008, KI.

[7] Ulrik Brandes, et al. Efficient generation of large random networks. 2005, Physical Review E: Statistical, Nonlinear, and Soft Matter Physics.

[8] Ravi Kumar, et al. On the Bursty Evolution of Blogspace. 2003, WWW '03.

[9] S. H. Strogatz, et al. Random graph models of social networks. 2002, Proceedings of the National Academy of Sciences of the United States of America.

[10] Christos Faloutsos, et al. Modeling Blog Dynamics. 2009, ICWSM.

[11] Matthieu Latapy, et al. Efficient and simple generation of random simple connected graphs with prescribed degree sequence. 2005, J. Complex Networks.

[12] Lada A. Adamic, et al. The political blogosphere and the 2004 U.S. election: divided they blog. 2005, LinkKDD '05.

[13] Andreas Dengel, et al. Mining shared social media links to support clustering of blog articles. 2011, 2011 International Conference on Computational Aspects of Social Networks (CASoN).

[14] Jennifer Jie Xu, et al. Mining communities and their relationships in blogs: A study of online hate groups. 2007, Int. J. Hum. Comput. Stud.

[15] Ying Zhou, et al. Community discovery and analysis in blogspace. 2006, WWW '06.

[16] Ajay Mehra. The Development of Social Network Analysis: A Study in the Sociology of Science. 2005.

[17] Duncan J. Watts, et al. Collective dynamics of ‘small-world’ networks. 1998, Nature.

[18] Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. 1999, Science.

[19] Mark E. J. Newman, et al. The Structure and Function of Complex Networks. 2003, SIAM Rev.

[20] U. Alon. Network motifs: theory and experimental approaches. 2007, Nature Reviews Genetics.

[21] Lois Ann Scheidt, et al. Bridging the gap: a genre analysis of Weblogs. 2004, Proceedings of the 37th Annual Hawaii International Conference on System Sciences.

[22] Albert-László Barabási, et al. Linked: How Everything Is Connected to Everything Else and What It Means for Business, Science, and Everyday Life. 2003.

[23] Mark H. Chignell, et al. A social hypertext model for finding community in blogs. 2006, HYPERTEXT '06.

[24] Vladimir Batagelj, et al. An O(m) Algorithm for Cores Decomposition of Networks. 2003, ArXiv.

[25] Martin G. Everett, et al. Models of core/periphery structures. 2000, Soc. Networks.

[26] Stephen B. Seidman, et al. Network structure and minimum degree. 1983.

[27] Patrick Doreian, et al. Defining and locating cores and boundaries of social networks. 1994.

[28] Andreas Dengel, et al. A social network analysis and mining methodology for the monitoring of specific domains in the blogosphere. 2010, 2010 International Conference on Advances in Social Networks Analysis and Mining.

[29] T. Snijders. Enumeration and simulation methods for 0–1 matrices with given marginals. 1991.

[30] Michalis Faloutsos, et al. On power-law relationships of the Internet topology. 1999, SIGCOMM '99.

[31] P. Doreian, et al. Fixed list versus snowball selection of social networks. 1992.

[32] Cameron A. Marlow. Audience, structure and authority in the weblog community. 2004.

[33] Stephan Baumann, et al. A Journey to the Core of the Blogosphere. 2009, 2009 International Conference on Advances in Social Network Analysis and Mining.

[34] Andreas Dengel, et al. Core/periphery structure versus clustering in international weblogs. 2011, 2011 International Conference on Computational Aspects of Social Networks (CASoN).

[35] John Scott. Social Network Analysis. 1988.

[36] Kenneth Baclawski, et al. New Metrics for Newsblog Credibility. 2007, ICWSM.

[37] B. Skyrms, et al. A dynamic model of social network formation. 2000, Proceedings of the National Academy of Sciences of the United States of America.

[38] Rafael Schirru, et al. Domain-Specific Identification of Topics and Trends in the Blogosphere. 2010, ICDM.

[39] Tanya Y. Berger-Wolf, et al. A framework for analysis of dynamic social networks. 2006, KDD '06.

[40] Rajeev Motwani, et al. The PageRank Citation Ranking: Bringing Order to the Web. 1999, WWW 1999.

[41] Clay Shirky. Power Laws, Weblogs, and Inequality. 2013.

[42] M. Newman, et al. On the uniform generation of random graphs with prescribed degree sequences. 2003, arXiv cond-mat/0312028.

[43] Andreas Dengel, et al. Community Identification in International Weblogs. 2010.

[44] Ravi Kumar, et al. Structure and evolution of blogspace. 2004, CACM.

[45] Santo Fortunato, et al. Community detection in graphs. 2009, ArXiv.

[46] Béla Bollobás, et al. Random Graphs. 1985.

[47] S. Wasserman, et al. Models and Methods in Social Network Analysis: An Introduction to Random Graphs, Dependence Graphs, and p*. 2005.

[48] Bruce A. Reed, et al. The Size of the Giant Component of a Random Graph with a Given Degree Sequence. 1998, Combinatorics, Probability and Computing.

[49] Eytan Adar, et al. Implicit Structure and the Dynamics of Blogspace. 2004.

[50] Peter Wortmann. Topic-Based Blog Article Search for Trend Detection. 2009.

[51] Ramanathan V. Guha, et al. Information diffusion through blogspace. 2004, WWW '04.

[52] Robin I. M. Dunbar. Coevolution of neocortical size, group size and language in humans. 1993, Behavioral and Brain Sciences.

[53] Tim O'Reilly, et al. What is Web 2.0: Design Patterns and Business Models for the Next Generation of Software. 2007.

[54] Jean-Loup Guillaume, et al. Fast unfolding of communities in large networks. 2008, arXiv:0803.0476.

[55] Inna Kouper, et al. Conversations in the Blogosphere: An Analysis "From the Bottom Up". 2005, Proceedings of the 38th Annual Hawaii International Conference on System Sciences.

[56] Nick Koudas, et al. Searching the Blogosphere. 2007, WebDB.