Paper information - Adopting a relational model of co-occurrences to trace phraseological development

Adopting a relational model of co-occurrences to trace phraseological development

Learner corpus research has witnessed a boom in the number of studies that investigate learners’ use of multi-word combinations (see Paquot & Granger, 2012 for a recent overview). Several recent studies have adopted an approach first put forward by Schmitt and colleagues (e.g. Durrant & Schmitt, 2009) to assess whether and to what extent the word combinations used by learners are ‘native-like’: each pair of words in a learner text is assigned an association score computed on the basis of a large reference corpus. Bestgen & Granger (2014), for example, used this procedure to analyse the Michigan State University Corpus of second language writing (MSU) and showed that the mean Mutual Information (MI) scores of the bigrams used by L2 writers are positively correlated with human judgments of proficiency. Most studies so far have investigated positional co-occurrences, where words are said to co-occur when they appear within a certain distance from each other (Evert, 2004), and have focused more particularly on adjacent word combinations, often in the form of adjective + noun combinations (e.g. Li & Schmitt, 2010; Siyanova & Schmitt, 2008). Corpus linguists such as Evert & Krenn (2003), however, have argued strongly for a relational model of co-occurrences, where the co-occurring words appear in a specific structural relation (see also Bartsch, 2004). Paquot (2014) adopted a relational model of co-occurrences to evaluate whether such co-occurrences are good discriminators of language proficiency. She used the Stanford CoreNLP suite of tools to parse the French L1 component of the Varieties of English for Specific Purposes dAtabase (VESPA) and extract dependency relations in the form of triples, each consisting of a relation and a pair of words, such as dobj(win,lottery), i.e. “the direct object of win is lottery” (de Marneffe & Manning, 2013).
She then used association measures computed on the basis of a large reference corpus to analyse pairs of words in specific grammatical relations in three VESPA sub-corpora made up of texts rated at different CEFR levels (i.e. B2, C1 and C2). Findings showed that adjective + noun relations discriminated well between B2 and C2 levels; adverbial modifiers separated B2 texts from C1 and C2 texts; and verb + direct object relations set C2 texts apart from B2 and C1 texts. These results suggest that, used together, phraseological indices computed on the basis of relational dependencies are able to gauge language proficiency. The main objective of this study is to investigate whether relational co-occurrences also constitute valid indices of phraseological development. To do so, we replicate the method used in Paquot (2014) on data from the Longitudinal Database of Learner English (LONGDALE; Meunier, 2013, forthcoming). In the LONGDALE project, the same students are followed over a period of at least three years, and data collections are typically organized once per year. The 78 argumentative essays selected for this study were written by 39 French learners of English in Year 1 and Year 3 of their studies at the University of Louvain. Unlike in Year 2, students were asked to write on the same topic in Year 1 and Year 3, which allows us to control for topic, a variable that has been shown to considerably influence learners’ use of word combinations (e.g. Cortes, 2004; Paquot, 2013). As in Paquot (2014), relational co-occurrences are operationalized as word combinations used in four grammatical relations, i.e. adjective + noun, adverb + adjective, adverb + verb and verb + direct object, and are extracted from the learner and reference corpora with the Stanford CoreNLP suite of tools.
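The association scoring that this kind of analysis relies on can be illustrated with a minimal Python sketch. It assumes that dependency triples have already been extracted by a parser and uses the standard pointwise MI formula, log2(observed / expected). The toy counts, the function names, the choice to compute marginal frequencies within each relation, and the decision to skip pairs unattested in the reference data are all assumptions made for illustration, not details reported in the abstract.

```python
import math
from collections import Counter, defaultdict

def relation_mi(reference_triples):
    """Return {(relation, head, dependent): MI} from reference-corpus triples.

    Assumption: marginal frequencies are counted within each relation.
    """
    pair_freq = Counter(reference_triples)
    rel_total, head_freq, dep_freq = Counter(), Counter(), Counter()
    for (rel, head, dep), n in pair_freq.items():
        rel_total[rel] += n
        head_freq[(rel, head)] += n
        dep_freq[(rel, dep)] += n
    scores = {}
    for (rel, head, dep), observed in pair_freq.items():
        # Expected frequency if head and dependent were independent
        expected = head_freq[(rel, head)] * dep_freq[(rel, dep)] / rel_total[rel]
        scores[(rel, head, dep)] = math.log2(observed / expected)
    return scores

def mean_mi_per_relation(text_triples, scores):
    """Average the MI of a text's triples, separately for each relation.

    Assumption: pairs absent from the reference data are skipped.
    """
    by_rel = defaultdict(list)
    for triple in text_triples:
        if triple in scores:
            by_rel[triple[0]].append(scores[triple])
    return {rel: sum(vals) / len(vals) for rel, vals in by_rel.items()}

# Toy reference data (hypothetical counts, not taken from any real corpus)
reference = (
    [("dobj", "win", "lottery")] * 3
    + [("dobj", "win", "game")]
    + [("dobj", "play", "game")] * 4
    + [("amod", "tea", "strong")] * 2
    + [("amod", "water", "hot")] * 2
)
scores = relation_mi(reference)

# Toy learner text: the last triple is unattested in the reference data
learner_text = [
    ("dobj", "win", "lottery"),
    ("dobj", "play", "game"),
    ("amod", "tea", "strong"),
    ("dobj", "eat", "lottery"),
]
means = mean_mi_per_relation(learner_text, scores)
print(means)
```

Each learner text thus receives one mean score per grammatical relation, which is the kind of per-text index that can then be compared across proficiency levels or collection points.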
We then assign to each pair of words in the LONGDALE corpus its MI score computed on the basis of the British National Corpus, and compute mean MI scores for each dependency relation in each learner text (cf. Bestgen & Granger, 2014). Distributions in the two learner data sets (i.e. Year 1 and Year 3) are tested for normality and compared accordingly: with ANOVAs followed by Tukey contrasts where the data are normally distributed, or with Kruskal-Wallis rank sum tests followed by pairwise comparisons using Wilcoxon rank sum tests where they are not. To explore the links between individual and group phraseological development trajectories, a detailed variability analysis using the method of individual profiling and visualization techniques will also be presented (cf. Verspoor & Smiskova, 2012).

References

Bartsch, S. (2004). Structural and Functional Properties of Collocations in English. A Corpus Study of Lexical and Pragmatic Constraints on Lexical Cooccurrence. Tübingen: Narr.
Bestgen, Y. & Granger, S. (2014). Quantifying the development of phraseological competence in L2 English writing: An automated approach. Journal of Second Language Writing 26, 28–41.
Cortes, V. (2004). Lexical bundles in published and student disciplinary writing: Examples from history and biology. English for Specific Purposes 23(4), 397–423.
De Marneffe, M.-C. & Manning, C. (2013). Stanford typed dependencies manual. http://nlp.stanford.edu/software/dependencies_manual.pdf
Durrant, P. & Schmitt, N. (2009). To what extent do native and non-native writers make use of collocations? IRAL - International Review of Applied Linguistics in Language Teaching 47(2), 157–177. doi:10.1515/iral.2009.007
Evert, S. (2004). The Statistics of Word Cooccurrences: Word Pairs and Collocations. PhD dissertation, IMS, University of Stuttgart.
Evert, S. & Krenn, B. (2003). Computational approaches to collocations. Introductory course at the European Summer School on Logic, Language, and Information (ESSLLI 2003), Vienna. Available from www.collocations.de [retrieved 5 February 2015]
Granger, S. & Bestgen, Y. (2014). The use of collocations by intermediate vs. advanced nonnative writers: A bigram-based study. International Review of Applied Linguistics in Language Teaching (IRAL) 52(3), 229–252.
Li, J. & Schmitt, N. (2010). The development of collocation use in academic texts by advanced L2 learners: A multiple case-study approach. In Wood, D. (ed.), Perspectives on Formulaic Language: Acquisition and Communication. London: Continuum Press.
Meunier, F. & Littré, D. (2013). Tracking Learners’ Progress. Adopting a Dual ‘Corpus Cum Experimental Data’ Approach. The Modern Language Journal 97(1), 61–76.
Meunier, F. (forthcoming). Introduction to the LONGDALE project. In Castello, E., Ackerley, K. & Coccetta, F. (eds.), Studies in Learner Corpus Linguistics: Research and Applications for Foreign Language Teaching and Assessment. Bern: Peter Lang.
Paquot, M. (2013). Lexical bundles and L1 transfer effects. International Journal of Corpus Linguistics 18(3), 391–417.
Paquot, M. (2014). Is there a role for the lexis-grammar interface in interlanguage complexity research? Paper presented at the Colloquium on cross-linguistic aspects of complexity in second language research, Vrije Universiteit Brussel, 19 December 2014, Brussels, Belgium. Available from http://www.vub.ac.be/TALK/?q=en/node/423 [retrieved 5 February 2015]
Paquot, M. & Granger, S. (2012). Formulaic language in learner corpora. Annual Review of Applied Linguistics 32, 130–149.
Siyanova, A. & Schmitt, N. (2008). L2 learner production and processing of collocation: A multi-study perspective. Canadian Modern Language Review 64(3), 429–458.
Verspoor, M. & Smiskova, H. (2012). Foreign language writing development from a dynamic usage-based perspective. In Manchón, R. (ed.), L2 Writing Development: Multiple Perspectives. Berlin: De Gruyter, 47–68.

Magali Paquot | Hubert Naets