Language use reflects scientific methodology: A corpus-based study of peer-reviewed journal articles

Recently, philosophers of science have argued that the epistemological requirements of different scientific fields necessarily lead to differences in scientific method. In this paper, we examine how language use varies across peer-reviewed journal articles from different fields, to see whether features of that variation help to elucidate and support claims of methodological variation among the sciences. We hypothesize that significant methodological differences will be reflected in related differences in scientists' language style.

This paper reports a corpus-based study of peer-reviewed articles from twelve journals in six fields of experimental and historical sciences. Machine learning methods were applied to compare the discourse styles of articles in different fields, based on easily extracted linguistic features of the text. Features included function word frequencies, which are often used in computational stylistics, as well as lexical features based on systemic functional linguistics, which affords rich resources for comparative textual analysis. We found that the style of writing in the historical sciences is indeed readily distinguishable from that of the experimental sciences. Furthermore, the most significant linguistic features of these distinctive styles are directly related to the methodological differences between historical and experimental sciences posited by philosophers of science, lending empirical weight to their contentions.
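To make the notion of function word frequencies concrete, the following is a minimal sketch of how such features could be extracted from a text. It is not the study's pipeline: the word list `FUNCTION_WORDS`, the tokenizer, and the function name `function_word_frequencies` are assumptions chosen for illustration, and the sample sentence is invented.

```python
import re
from collections import Counter

# Hypothetical, very small function-word list for illustration only;
# the feature set actually used in the study is not specified here.
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that", "is", "was", "may", "could"]


def function_word_frequencies(text, vocabulary=FUNCTION_WORDS):
    """Return the relative frequency of each vocabulary word among all tokens."""
    # Simple assumed tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        # An empty text yields an all-zero feature vector.
        return [0.0] * len(vocabulary)
    return [counts[word] / total for word in vocabulary]


# Invented sample sentence, used only to exercise the function.
sample = "The fossil record suggests that the extinction may have been gradual."
features = function_word_frequencies(sample)
for word, value in zip(FUNCTION_WORDS, features):
    print(f"{word}: {value:.3f}")
```

Each article would be mapped to one such vector, and a classifier could then be trained on the vectors to separate fields. Relative rather than raw counts are used here so that articles of different lengths remain comparable.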
