Deconstructing the collaborative impact: Article and author characteristics that influence citation count

It is well known that collaborative papers tend to receive more citations than solo-authored papers. Here we try to identify the subtler factors behind this collaborative effect by analyzing metadata and citation counts for co-authored papers in the biomedical domain, after accounting for attributes known to be strong predictors of citation count. Article-level metadata were gathered from 98,000 PubMed article records categorized with the term breast neoplasm, a topic that offers longevity and relevance across biomedical subdisciplines and yields a relatively large sample. Open access citation data were obtained from PubMed Central (PMC). Author-level attributes were encoded from disambiguated author name data in PubMed and appended as article-level attributes of each collaboration. A logistic regression model was built to assess the relative weights of these factors as influences on citation counts. As expected, the journal and language of the paper were the strongest predictors. The significance of the number of authors diminished after accounting for other attributes. Among the subtler predictors, the group's highest h-index was positively correlated with citation count, while the diversity of author h-indices, the minimum professional age, and the authors' total number of unique collaborators were negatively correlated. These observations indicate that smaller collaborations composed of early superstars (young, rapidly successful researchers with relatively high and similar h-indices) may be at least as influential in biomedical research as larger collaborations with different demographics. While minimum h-index was important, the first author's h-index was insignificant, underscoring the importance of the middle authors' publishing history. The gender diversity outcomes suggest that mixed groups may be ideal, and further research in this area is warranted.
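As an illustration of how author-level attributes can be rolled up into article-level attributes of a collaboration, the following is a minimal sketch, not the authors' actual pipeline. The h-index follows its standard definition. Everything else is an assumption made for illustration: the function and field names (`team_features`, `citations`, `first_pub_year`), measuring h-index diversity as a population standard deviation, and measuring professional age as years since an author's first publication.

```python
from statistics import pstdev


def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def team_features(authors, pub_year):
    """Aggregate per-author records into article-level features.

    `authors` is a list of dicts in byline order (first author first).
    Each dict holds hypothetical fields:
      - "citations": citation counts of that author's earlier papers
      - "first_pub_year": year of that author's first publication
    """
    h_values = [h_index(a["citations"]) for a in authors]
    # Assumption: professional age = years since first publication.
    ages = [pub_year - a["first_pub_year"] for a in authors]
    return {
        "n_authors": len(authors),
        "max_h": max(h_values),
        "min_h": min(h_values),
        "first_author_h": h_values[0],
        # Assumption: diversity measured as population standard deviation.
        "h_diversity": pstdev(h_values),
        "min_prof_age": min(ages),
    }


if __name__ == "__main__":
    team = [
        {"citations": [10, 8, 5, 4, 3], "first_pub_year": 2000},
        {"citations": [3, 3, 3], "first_pub_year": 2005},
        {"citations": [1], "first_pub_year": 2010},
    ]
    print(team_features(team, pub_year=2012))
```

Feature vectors of this kind, together with journal and language, could then serve as inputs to a logistic regression over a binned citation outcome.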
