Contributive Social Capital Extraction From Different Types of Online Data Sources

It is a recurring problem of online communication that the properties of unknown people are hard to assess. This may lead to various issues, such as the spread of 'fake news' from untrustworthy sources. In sociology, the sum of (social) resources available to a person through their social network is often described as social capital. In this article, we look at social capital from a different angle. Instead of evaluating the advantage that people have because of their membership in a certain group, we investigate various ways to infer the social capital a person adds or may add to the network, their contributive social capital (CSC). As there is no consensus in the literature on what exactly the social capital of a person consists of, we look at several related properties: expertise, reputation, trustworthiness, and influence. We examine how these properties can be analyzed in five different types of online data sources: microblogging (e.g., Twitter), social networking platforms (e.g., Facebook), direct communication (e.g., email), scientometrics, and threaded discussion boards (e.g., Reddit). For each type of source, we discuss recent publications, focusing on the data sources used, the algorithms implemented, and the performance evaluation. The findings are compared and set in the context of contributive social capital extraction. The analysis algorithms are based on individual features (e.g., the number of followers on Twitter), ratios thereof, or a person's centrality measures (e.g., PageRank). Machine learning approaches, such as straightforward classifiers (e.g., support vector machines), rely on ground truths that are connected to social capital. The discussion of these methods is intended to facilitate research on the topic by identifying relevant data sources and the best-suited algorithms, and by providing tested methods for the evaluation of findings.
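The three kinds of indicators named above (individual features, ratios of features, and centrality measures such as PageRank) can be illustrated on a small follow graph. The following is a minimal Python sketch under stated assumptions, not a method taken from any of the surveyed studies: the graph and user names are hypothetical, the +1 smoothing in the follower ratio is an illustrative choice, and PageRank uses a damping factor of 0.85 with a fixed 100 power iterations.

```python
# Hypothetical toy indicators for contributive social capital (CSC).
# The graph, user names, +1 smoothing, and PageRank settings are
# illustrative assumptions, not values from any surveyed study.

def follower_counts(edges):
    """Individual feature: followers per user; edge (a, b) means a follows b."""
    counts = {}
    for a, b in edges:
        counts.setdefault(a, 0)          # users who only follow still appear, with 0
        counts[b] = counts.get(b, 0) + 1
    return counts


def follower_ratio(edges):
    """Ratio feature: followers / (followees + 1); the +1 avoids division by zero."""
    followers, followees = {}, {}
    for a, b in edges:
        followees[a] = followees.get(a, 0) + 1
        followers[b] = followers.get(b, 0) + 1
    users = set(followers) | set(followees)
    return {u: followers.get(u, 0) / (followees.get(u, 0) + 1) for u in users}


def pagerank(edges, damping=0.85, iterations=100):
    """Centrality measure: PageRank computed by power iteration.

    Rank flows from follower to followee, so being followed by highly ranked
    users raises a score. Users who follow nobody (dangling nodes) spread
    their rank uniformly, which keeps the scores summing to 1.
    """
    nodes = sorted({user for edge in edges for user in edge})
    n = len(nodes)
    out_links = {u: [] for u in nodes}
    for a, b in edges:
        out_links[a].append(b)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        dangling = sum(rank[u] for u in nodes if not out_links[u])
        new_rank = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            targets = out_links[u]
            for v in targets:
                new_rank[v] += damping * rank[u] / len(targets)
        rank = new_rank
    return rank


# Usage on a hypothetical follow graph: carol is followed by three users.
EDGES = [("alice", "carol"), ("bob", "carol"), ("carol", "dave"),
         ("dave", "carol"), ("erin", "alice")]

counts = follower_counts(EDGES)
ratios = follower_ratio(EDGES)
scores = pagerank(EDGES)
for user in sorted(scores, key=scores.get, reverse=True):
    print(f"{user:6s} followers={counts[user]} "
          f"ratio={ratios[user]:.2f} pagerank={scores[user]:.3f}")
```

In this toy graph, alice and dave each have exactly one follower, yet dave receives a much higher PageRank because his single follower is the highly ranked carol. This shows how a centrality measure rewards being followed by well-connected users, whereas the raw count treats every follower equally.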
