Analyzing Linguistic Complexity and Scientific Impact

The number of publications and the number of citations received have become the most common indicators of scholarly success. In this context, scientific writing plays an increasingly important role in scholars' careers. To understand the relationship between scientific writing and scientific impact, this paper selected 12 variables of linguistic complexity as a proxy for scientific writing. We then measured these features in 36,400 full-text Biology articles and 1,797 full-text Psychology articles and compared them across articles grouped into high-, medium-, and low-impact categories. The results showed no practically significant relationship between linguistic complexity and citation strata in either discipline, suggesting that textual complexity plays little role in scientific impact in our data sets.
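As a rough illustration of the kind of analysis described above, the sketch below computes three surface measures of linguistic complexity for a text and an effect size (eta squared) for how much a measure varies across citation strata. It is a minimal sketch under stated assumptions, not the paper's pipeline: the abstract does not list the 12 variables, so the three measures, the naive regex tokenization, the choice of eta squared as the gauge of practical significance, and the function names `complexity_features` and `eta_squared` are all illustrative.

```python
import re
from statistics import mean


def complexity_features(text):
    """Return three illustrative surface measures for one article text.

    Assumption: these measures stand in for the paper's 12 variables,
    which the abstract does not enumerate.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Naive word tokenization: runs of letters and apostrophes, lowercased.
    words = re.findall(r"[A-Za-z']+", text.lower())
    return {
        "mean_sentence_length": len(words) / len(sentences),
        "type_token_ratio": len(set(words)) / len(words),
        "mean_word_length": mean(len(w) for w in words),
    }


def eta_squared(groups):
    """Share of total variance in a feature explained by group membership.

    `groups` maps a stratum label (e.g. "high", "medium", "low") to the
    list of feature values for the articles in that stratum.
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # With no variance at all, report no effect instead of dividing by zero.
    return ss_between / ss_total if ss_total else 0.0
```

In use, one would call `complexity_features` on each article, collect one feature per stratum into a dictionary, and pass it to `eta_squared`. A value near zero would be consistent with the abstract's finding that complexity differs little across citation strata, even where a large sample makes a difference statistically detectable.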
