Exploiting a Corpus to Compile a Lexical Resource for Academic Writing: Spanish Lexical Combinations

This paper reports on ongoing research that exploits Spanish academic corpora to build a lexical tool aimed at novice writers of academic texts. The tool targets what we call academic lexical combinations (ALCs). By ALCs we mean recurrent word sequences that may or may not be semantically compositional and that fulfill rhetorical functions such as giving examples, concluding, or expressing emphasis. These functions are particularly prominent in academic discourse. ALCs range from collocations to idioms and also include formulas, as these notions are understood in Meaning-Text Theory (Mel’čuk, 2012). We describe the procedure adopted for extracting ALCs from the corpus, along with how we combine statistical information with native speakers’ intuition. Although corpora play a leading role in the construction of our lexical tool, corpus output must be filtered using phraseological criteria, which makes human intervention necessary. Finally, we specify the architecture of the lexical tool and present several prototype lexicographical entries.

[1] Douglas Biber, et al. A corpus-driven approach to formulaic language in English: multi-word patterns in speech and writing, 2009.

[2] Philip Durrant. Discipline and Level Specificity in University Students' Written Vocabulary, 2014.

[3] Igor Mel’čuk, et al. Phraseology in the language, in the dictionary, and in the computer, 2012.

[4] A. Kilgarriff. Comparing Corpora, 2001.

[5] Igor Mel’čuk, et al. Clichés, an Understudied Subclass of Phrasemes, 2015.

[6] Christine B. Feak, et al. Academic Writing for Graduate Students, 1994.

[7] Magali Paquot, et al. Academic Vocabulary in Learner Writing: From Extraction to Analysis, 2010.

[8] Igor Mel’čuk, et al. Lexical functions: a tool for the description of lexical relations in a lexicon, 1996.

[9] K. Hyland, et al. As can be seen: Lexical bundles and disciplinary variation, 2008.

[10] Serge Verlinde, et al. Data access revisited: The Interactive Language Toolbox, 2012.

[11] S. Gries. Dispersions and adjusted frequencies in corpora, 2008.

[12] Diana Lea, et al. Oxford Learner's Dictionary of Academic English, 2018.

[13] Jette Hedegaard, et al. Dice in the Web: an Online Spanish Collocation Dictionary, 2012.

[14] John M. Swales, et al. Tracing convergence and divergence in pairs of Spanish and English research article abstracts: The case of Ibérica, 2011.

[16] Carmen Pérez-Llantada, et al. Formulaic language in L1 and L2 expert academic writing: Convergent and divergent usage, 2014.

[17] Agnès Tutin. Showing phraseology in context: onomasiological access to lexico-grammatical patterns in corpora of French scientific writings, 2010.

[18] Michael McCarthy, et al. Academic vocabulary in use : 50 units of academic vocabulary reference and practice self-study and classroom use, 2008.

[19] Giovanni Parodi, et al. Academic and professional discourse genres in Spanish, 2010.

[20] Viviana Cortes. Lexical bundles in published and student disciplinary writing: Examples from history and biology, 2004.

[21] Sofie Johansson Kokkinakis, et al. Developing Academic Word Lists for Swedish, Norwegian and Danish – a joint research project, 2012.

[23] D. Biber, et al. If you look at …: Lexical Bundles in University Teaching and Textbooks, 2004.

[24] Agnès Tutin, et al. Comparing Recurring Lexico-Syntactic Trees (RLTs) and Ngram Techniques for Extended Phraseology Extraction, 2017, MWE@EACL.

[25] Hilary Nesi, et al. A Classification of Genre Families in University Student Writing, 2013.

[26] Viviana Cortes, et al. The purpose of this study is to: Connecting lexical bundles and moves in research article introductions, 2013.

[27] Yu-Hua Chen, et al. Developing the Academic Collocation List (ACL) – A corpus-driven and expert-judged approach, 2013.

[28] Viviana Cortes, et al. A comparative analysis of lexical bundles in academic history writing in English and Spanish, 2008.

[29] Nicole Tracy-Ventura, et al. Lexical bundles in Spanish speech and writing, 2007.

[30] Yu Kyoung Shin, et al. Lexical Bundles in Native and Non-Native Scientific Writing: Applying a Corpus-based Study to Language Teaching, 2016.

[31] P. Trofimovich, et al. Transitional probability predicts native and non‐native use of formulaic sequences, 2017.

[32] Magali Paquot. The LEAD dictionary-cum-writing aid: an integrated dictionary and corpus tool, 2012.

[33] N. Ellis, et al. An Academic Formulas List: New Methods in Phraseology Research, 2010.

[34] Averil Coxhead. A New Academic Word List, 2000.

[35] Estrella Montolío. Manual de escritura : académica y profesional, 2014.

[36] A. Tutin. La phraséologie transdisciplinaire des écrits scientifiques : des collocations aux routines sémantico-rhétoriques, 2014.

[37] Natalia Judith Laso, et al. Biomedical English : a corpus-based approach, 2013.

[38] Tutin, Agnès, and Francis Grossmann (eds.). 2013. L’écrit scientifique : du lexique au discours, 2016.

[39] Panagiotis Papapetrou, et al. Significance testing of word frequencies in corpora, 2016, Digit. Scholarsh. Humanit.

[40] Elena Cotos, et al. Enhancing writing pedagogy with learner corpus data, 2014, ReCALL.