Examining scientific writing styles from the perspective of linguistic complexity

Publishing articles in high‐impact English‐language journals is difficult for scholars around the world, especially for non‐native English‐speaking scholars (NNESs), most of whom struggle with proficiency in English. To uncover the differences in English scientific writing between native English‐speaking scholars (NESs) and NNESs, we collected a large‐scale data set containing more than 150,000 full‐text articles published in PLoS between 2006 and 2015. We divided these articles into three groups according to the ethnic backgrounds of the first and corresponding authors, as inferred by Ethnea, and examined English scientific writing styles from two perspectives on linguistic complexity: (a) syntactic complexity, including measurements of sentence length and sentence complexity; and (b) lexical complexity, including measurements of lexical diversity, lexical density, and lexical sophistication. The observations suggest only marginal differences between groups in syntactic and lexical complexity.
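
To make the measures named above concrete, the following is a minimal sketch of three of them: mean sentence length, lexical diversity, and lexical density. The abstract does not give the formulas the study used, so everything here is an assumption. Type-token ratio stands in for lexical diversity, and a small hand-written function-word list stands in for the part-of-speech tagging that a real lexical-density measure would rely on. All names (`FUNCTION_WORDS`, `split_sentences`, `tokenize`, `complexity_profile`) are hypothetical.

```python
import re

# Hypothetical, deliberately tiny function-word list (an assumption for
# illustration); a real study would identify content words with a POS tagger.
FUNCTION_WORDS = frozenset({
    "a", "an", "the", "of", "in", "on", "to", "and", "or", "is", "are",
    "was", "were", "be", "that", "this", "it", "for", "with", "as", "by",
    "at", "from", "we", "not",
})


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(text):
    """Lowercase word tokens; internal apostrophes and hyphens are kept."""
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())


def complexity_profile(text):
    """Return three simple complexity indicators for a piece of text."""
    sentences = split_sentences(text)
    tokens = tokenize(text)
    if not sentences or not tokens:
        raise ValueError("text must contain at least one word")
    content_words = [t for t in tokens if t not in FUNCTION_WORDS]
    return {
        # Syntactic proxy: average number of word tokens per sentence.
        "mean_sentence_length": len(tokens) / len(sentences),
        # Lexical diversity proxy: distinct word forms over total tokens.
        "type_token_ratio": len(set(tokens)) / len(tokens),
        # Lexical density proxy: share of tokens not in the function-word list.
        "lexical_density": len(content_words) / len(tokens),
    }


if __name__ == "__main__":
    sample = "The cat sat on the mat. The dog barked."
    for name, value in complexity_profile(sample).items():
        print(f"{name}: {value:.3f}")
```

Type-token ratio falls as texts get longer, so comparing full-text articles of different lengths would need a length-normalized diversity measure in place of the raw ratio shown here.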
