Using social network analysis to enhance information retrieval systems

It is an ongoing trend that people increasingly reveal very personal information on social network sites in particular and on the World Wide Web in general. As this information becomes more publicly available from these social network sites and the web, the social relationships between people can be identified, which in turn enables the automatic extraction of social networks. This trend is further driven and reinforced by recent initiatives such as Facebook Connect, MySpace's Data Availability and Google Friend Connect, which make their social network data available to anyone. Furthermore, the current development of the World Wide Web, termed "Web 2.0" by O'Reilly, enables more and more people to publish information without profound technical knowledge. Blogs, for example, have gained a lot of attention in recent years: the blogosphere as a whole, comprising more than 70 million blogs, forms a substantial body of information and knowledge. Additionally, hypertext links between blogs have been described as conversation, affiliation, or readership, implying a form of implicit social structure. This means that publicly available information is increasingly annotated with author information, which also allows social networks to be extracted. These developments, together with increasing computing power and a growing amount of freely available scientific publication data in diverse databases, have led to a dramatic growth of interest in social network analysis (SNA) and in network analysis in general. However, little attention has been paid to the application of SNA in information retrieval systems. Recent studies suggest that a person's social network has a significant impact on his or her information acquisition. Additionally, SNA offers methods for identifying important persons within social networks, who could have a significant influence on the importance of certain information.
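The extraction of a social network from author-annotated publications, mentioned above, can be illustrated with a minimal Python sketch. It assumes a hypothetical input format in which each publication is a list of author names that have already been disambiguated (in practice, author name disambiguation is a separate, non-trivial step). The illustrative function `build_coauthor_network` returns an undirected co-authorship graph as a dict of dicts, where each edge weight counts the number of joint publications.

```python
from itertools import combinations


def build_coauthor_network(publications):
    """Build an undirected, weighted co-authorship graph.

    `publications` is an iterable of author-name lists, one list per paper
    (hypothetical input format; names are assumed to be disambiguated).
    Returns {author: {coauthor: number_of_joint_papers}}.
    """
    graph = {}
    for authors in publications:
        # Deduplicate and sort so each unordered pair is counted once per paper.
        unique = sorted(set(authors))
        # Register every author, so single-author papers still yield a node.
        for author in unique:
            graph.setdefault(author, {})
        # Every pair of co-authors on the same paper gets (or strengthens) an edge.
        for a, b in combinations(unique, 2):
            graph[a][b] = graph[a].get(b, 0) + 1
            graph[b][a] = graph[b].get(a, 0) + 1
    return graph


if __name__ == "__main__":
    papers = [["Alice", "Bob"], ["Alice", "Bob", "Carol"], ["Dave"]]
    network = build_coauthor_network(papers)
    print(network["Alice"])  # {'Bob': 2, 'Carol': 1}
```

Storing counts as edge weights keeps the information about how often two authors collaborated, which weighted measures of tie strength could use later; unweighted measures can simply ignore it.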
Therefore, this paper proposes the use of available social network data in the context of information retrieval systems. It outlines a research design for exploring meaningful sources for social network extraction and for assessing the impact of meaningful SNA methods and measures in the context of information retrieval systems. These methods and measures are evaluated on ScientificCommons.org, a search platform for open access publications with more than 21 million publications and 8.5 million extracted authors, together with their co-authorship network. The contribution of this paper is twofold: an analysis of online information sources in terms of their usability for the extraction of social networks, and a research framework for the analysis and application of social network methods in information retrieval systems. The research framework was applied to the co-authorship network of scientific publications. The co-authorship network was used to compute different centrality measures for the authors, which in turn were used to refine the relevance ranking of publications within the information retrieval system. The performance of the rankings based on the different centrality measures was evaluated by measuring their click-through performance in the search results.
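The centrality-based re-ranking step can be illustrated with a minimal Python sketch. The abstract does not state which centrality measures or which combination rule were used, so the choices below are assumptions: degree centrality and betweenness centrality (computed with Brandes' algorithm for unweighted graphs, ignoring edge weights) serve as examples, and the hypothetical `rerank` function linearly blends a text relevance score, assumed to lie in [0, 1], with the mean max-normalized centrality of a publication's authors, using a hypothetical weight `alpha`. The graph is assumed to be a symmetric dict of dicts, `{author: {coauthor: weight}}`.

```python
from collections import deque


def degree_centrality(graph):
    """Normalized degree: distinct co-authors divided by (n - 1)."""
    n = len(graph)
    if n <= 1:
        return {v: 0.0 for v in graph}
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}


def betweenness_centrality(graph):
    """Unnormalized betweenness via Brandes' algorithm (unweighted, undirected)."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        dist = dict.fromkeys(graph, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of non-increasing distance from s.
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair (s, t) is visited from both ends.
    return {v: c / 2 for v, c in bc.items()}


def rerank(results, authors_of, centrality, alpha=0.3):
    """Blend text relevance with the mean normalized centrality of the authors.

    results: list of (doc_id, relevance) with relevance assumed in [0, 1];
    authors_of: doc_id -> list of authors; alpha: hypothetical blend weight.
    """
    top = max(centrality.values(), default=0.0) or 1.0

    def blended(item):
        doc_id, relevance = item
        authors = authors_of.get(doc_id, [])
        if authors:
            social = sum(centrality.get(a, 0.0) for a in authors) / len(authors) / top
        else:
            social = 0.0
        return (1 - alpha) * relevance + alpha * social

    return sorted(results, key=blended, reverse=True)


if __name__ == "__main__":
    # Tiny path graph A - B - C in the {author: {coauthor: weight}} format.
    g = {"A": {"B": 1}, "B": {"A": 1, "C": 1}, "C": {"B": 1}}
    bc = betweenness_centrality(g)
    print(bc)  # {'A': 0.0, 'B': 1.0, 'C': 0.0}
    hits = [("p1", 0.80), ("p2", 0.75)]
    print(rerank(hits, {"p1": ["A"], "p2": ["B"]}, bc))  # [('p2', 0.75), ('p1', 0.8)]
```

Swapping the `centrality` argument yields one ranking variant per measure, and each variant could then be compared by click-through performance as described above. Exact betweenness costs O(nm) time on an unweighted graph, so for a network with millions of authors an approximation would likely be needed.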

[1] Piotr Indyk et al. Fast estimation of diameter and shortest paths (without matrix multiplication), 1996, SODA '96.

[2] Danah Boyd et al. Friendster and publicly articulated social networking, 2004, CHI EA '04.

[3] C. Lee Giles et al. Two supervised learning approaches for name disambiguation in author citations, 2004, Proceedings of the 2004 Joint ACM/IEEE Conference on Digital Libraries, 2004.

[4] Susan Joe. Self-disclosure in Computer-Mediated Communication, 2006.

[5] Inna Kouper et al. Language Networks on LiveJournal, 2007, 2007 40th Annual Hawaii International Conference on System Sciences (HICSS'07).

[6] J. Walther. Anticipated Ongoing Interaction Versus Channel Effects on Relational Communication in Computer-Mediated Interaction, 1994.

[7] W. Glänzel et al. Analysing Scientific Networks Through Co-Authorship, 2004.

[8] Hildrun Kretschmer et al. The evolution of a citation network topology: The development of the journal Scientometrics, 2006.

[9] Danah Boyd et al. Social Network Sites: Definition, History, and Scholarship, 2007, J. Comput. Mediat. Commun.

[10] Caroline Haythornthwaite et al. Studying Online Social Networks, 2006, J. Comput. Mediat. Commun.

[11] S. Redner. How popular is your paper? An empirical study of the citation distribution, 1998, cond-mat/9804163.

[12] P. Kollock et al. Virtual communities as communities, 2002.

[13] Mike Thelwall et al. Hyperlink Analyses of the World Wide Web: A Review, 2006, J. Comput. Mediat. Commun.

[14] Sergey Brin et al. The Anatomy of a Large-Scale Hypertextual Web Search Engine, 1998, Comput. Networks.

[15] M. Newman. Coauthorship networks and patterns of scientific collaboration, 2004, Proceedings of the National Academy of Sciences of the United States of America.

[16] Ronald E. Rice et al. Network Analysis and Computer-Mediated Communication Systems, 1994.

[17] Bart Selman et al. The Hidden Web, 1997, AI Mag.

[18] Barry Wellman et al. Does citation reflect social structure?: Longitudinal evidence from the Globenet interdisciplinary research group, 2004, J. Assoc. Inf. Sci. Technol.

[19] Li Ding et al. Social Networking on the Semantic Web, 2005.

[20] L. Freeman. The Impact of Computer Based Communication on the Social Structure of an Emerging Scientific Specialty, 1984.

[21] David Eppstein et al. Fast approximation of centrality, 2000, SODA '01.

[22] Ronald Rousseau et al. Social network analysis: a powerful strategy, also for the information sciences, 2002, J. Inf. Sci.

[23] Andrew Parker et al. Beyond answers: dimensions of the advice network, 2001, Soc. Networks.

[24] Tamás Nepusz et al. Measuring tie-strength in virtual social networks, 2006.

[25] A. Barabasi et al. Evolution of the social network of scientific collaborations, 2001, cond-mat/0104162.

[26] Ronald S. Burt et al. Relation contents in multiple networks, 1985.

[27] B. Wellman. Physical Place and Cyberplace: The Rise of Personalized Networking, 2001.

[28] T. Vicsek et al. Uncovering the overlapping community structure of complex networks in nature and society, 2005, Nature.

[29] David Shallcross et al. Practical Issues and Algorithms for Analyzing Terrorist Networks, 2002.

[30] A. Vázquez. Statistics of citation networks, 2001, cond-mat/0105031.

[31] Marie-Claude Boily et al. Dynamical systems to define centrality in social networks, 2000, Soc. Networks.

[32] Padhraic Smyth et al. Algorithms for estimating relative importance in networks, 2003, KDD '03.

[33] C. Lee Giles et al. Efficient identification of Web communities, 2000, KDD '00.

[34] J. Walther. Relational Aspects of Computer-Mediated Communication: Experimental Observations over Time, 1995.

[35] Ulrik Brandes et al. Network Analysis: Methodological Foundations (Lecture Notes in Computer Science), 2005.

[36] Jon M. Kleinberg et al. Inferring Web communities from link topology, 1998, HYPERTEXT '98.

[37] Tim O'Reilly et al. What is Web 2.0: Design Patterns and Business Models for the Next Generation of Software, 2007.

[38] Peter Mika. Ontologies Are Us: A Unified Model of Social Networks and Semantics, 2005, International Semantic Web Conference.

[39] Inna Kouper et al. Conversations in the Blogosphere: An Analysis "From the Bottom Up", 2005, Proceedings of the 38th Annual Hawaii International Conference on System Sciences.

[40] Frada Burstein et al. Chapter 8 – System development in information systems research, 2002.

[41] Ulrik Brandes et al. Network Analysis: Methodological Foundations, 2010.

[42] Mark Craven et al. Hierarchical Hidden Markov Models for Information Extraction, 2003, IJCAI.

[43] Hangwoo Lee et al. Privacy, Publicity, and Accountability of Self-Presentation in an On-Line Discussion Group, 2006.

[44] Atsuhiro Takasu et al. Bibliographic attribute extraction from erroneous references based on a statistical model, 2003, 2003 Joint Conference on Digital Libraries, 2003. Proceedings.

[45] Hans-Dieter Daniel et al. Data sources for performing citation analysis: an overview, 2008, J. Documentation.

[46] Ravi Kumar et al. Structure and evolution of blogspace, 2004, CACM.

[47] A. Joinson. Self-disclosure in computer-mediated communication: The role of self-awareness and visual anonymity, 2001.

[48] Charles Oppenheim et al. How socially connected are citers to those that they cite?, 2007, J. Documentation.

[49] Lada A. Adamic et al. Friends and neighbors on the Web, 2003, Soc. Networks.

[50] S. Borgatti et al. The Network Paradigm in Organizational Research: A Review and Typology, 2003.

[51] Andrea Spoto et al. Social Network Analysis: A brief theoretical review and further perspectives in the study of Information Technology, 2006, PsychNology J.

[52] Alessandro Acquisti et al. Information revelation and privacy in online social networks, 2005, WPES '05.

[53] Judith Donath et al. Public Displays of Connection, 2004.

[54] Stanley Wasserman et al. Social Network Analysis: Methods and Applications, 1994, Structural analysis in the social sciences.

[55] David De Roure et al. Co-Presence Communities: Using Pervasive Computing to Support Weak Social Networks, 2006, 15th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE'06).

[56] Martin Everett et al. Ego network betweenness, 2005, Soc. Networks.

[57] Michele H. Jackson. Assessing the Structure of Communication on the World Wide Web, 2006, J. Comput. Mediat. Commun.

[58] Pertti Järvinen et al. Research Questions Guiding Selection of an Appropriate Research Method, 2000, ECIS.