Impact of lexical and sentiment factors on the popularity of scientific papers

We investigate how textual properties of scientific papers relate to the number of citations they receive. Our main finding is that these correlations are nonlinear and affect the most cited papers differently from typical ones. For instance, we find that in most journals short titles correlate positively with citations only for the most cited papers, whereas for typical papers the correlation is usually negative. Our analysis of six different factors, calculated at both the title and the abstract level for 4.3 million papers in over 1500 journals, reveals that the number of authors and the length and complexity of the abstract have the strongest (positive) influence on the number of citations.
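A factor whose association with citations has one sign for typical papers and the opposite sign for the most cited ones is the kind of pattern quantile regression is built to expose. Instead of fitting the mean, it fits a separate line for each conditional quantile τ of the citation distribution. The abstract does not name its estimator, so the following is only a sketch under the assumption that a quantile-regression approach is used. It is not the authors' pipeline. The function names `pinball_loss` and `fit_quantile_line` and the toy data are hypothetical, invented only to show how such a sign reversal would look. The fitter relies on a standard property of the problem: some optimal line passes through at least two observations with distinct x values. Trying every such pair therefore gives an exact, if slow, solution.

```python
from itertools import combinations


def pinball_loss(residual: float, tau: float) -> float:
    """Check (pinball) loss rho_tau(u) = u * (tau - 1[u < 0])."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def fit_quantile_line(xs, ys, tau):
    """Exact quantile regression y ~ a + b*x for small samples.

    Some optimal fit interpolates two observations with distinct x
    (a vertex of the underlying linear program), so we try every such
    line and keep the one with the lowest total check loss. O(n^3).
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical "line" is not a valid fit
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        loss = sum(pinball_loss(y - (intercept + slope * x), tau)
                   for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


# Hypothetical toy data (NOT from the paper): title length in words
# versus citation count, five papers per title length.
title_len = [5] * 5 + [10] * 5 + [15] * 5
citations = [1, 2, 4, 6, 100,
             2, 4, 6, 9, 80,
             3, 5, 8, 12, 60]

for tau in (0.5, 0.9):
    a, b = fit_quantile_line(title_len, citations, tau)
    print(f"tau={tau}: intercept={a:.2f}, slope={b:.2f}")
# tau=0.5: intercept=2.00, slope=0.40
# tau=0.9: intercept=120.00, slope=-4.00
```

On these made-up numbers, the median line (τ = 0.5) rises with title length, while the τ = 0.9 line falls. In other words, shorter titles go with more citations only in the upper tail, which mirrors the qualitative contrast described above. The brute-force search is only practical for tiny samples. At the scale of millions of papers, one would instead use an LP-based solver such as `rq` in R's `quantreg` package or `QuantReg` in statsmodels.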
