Predicting citations from mainstream news, weblogs and discussion forums

The growth of alternative digital publishing is widening the breadth of scholarly impact beyond the conventional bibliometric community. Research is becoming more reachable both inside and outside academic institutions, and is shared, downloaded and discussed on social media. In this study, we linked scientific articles mentioned in mainstream news, weblogs and Stack Overflow to Scopus, a citation database of peer-reviewed literature. We then explored how standard graph-based influence metrics can be used to measure the social impact of scientific articles. We also proposed a variant of the Katz centrality metric, called the EgoMet score, to measure the local importance of a scientific article in its ego network. Finally, we evaluated these graph-based influence metrics by using them to predict absolute citation counts. Our prediction models explain 34% of the variance in citations when predicting from blogs and mainstream news, and 44% when predicting from Stack Overflow.
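As an illustration of the kind of graph-based influence metric mentioned above, the following is a minimal sketch of standard Katz centrality on a small directed mention graph. It is not the EgoMet score, whose definition is not given in this abstract. The node names, the edge list, and the values of `alpha` and `beta` are all hypothetical and chosen only for the example.

```python
def katz_centrality(edges, alpha=0.1, beta=1.0, tol=1e-10, max_iter=1000):
    """Standard Katz centrality by fixed-point iteration.

    Each node's score is beta plus alpha times the sum of the scores of
    the nodes that point to it. The iteration converges when alpha is
    smaller than 1 / (largest eigenvalue of the adjacency matrix).
    """
    nodes = sorted({n for edge in edges for n in edge})
    incoming = {n: [] for n in nodes}
    for src, dst in edges:
        incoming[dst].append(src)

    score = {n: beta for n in nodes}
    for _ in range(max_iter):
        new = {
            n: alpha * sum(score[s] for s in incoming[n]) + beta
            for n in nodes
        }
        if max(abs(new[n] - score[n]) for n in nodes) < tol:
            return new
        score = new
    raise RuntimeError("Katz iteration did not converge; try a smaller alpha")


# Hypothetical graph: an edge (a, b) means "a mentions or cites b".
edges = [
    ("blog_1", "paper_A"),
    ("news_1", "paper_A"),
    ("blog_2", "paper_B"),
    ("paper_A", "paper_C"),
]
scores = katz_centrality(edges)
for node, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {value:.3f}")
```

In this toy graph, `paper_A` ranks highest because two sources point to it, and `paper_C` outranks `paper_B` because its single mention comes from a node that is itself well connected. Scores like these could then serve as features in a regression on citation counts, which is the general evaluation strategy the abstract describes.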
