Using relational co-occurrences to trace phraseological development in a longitudinal corpus

L2 research has witnessed a boom in the number of studies that investigate learners’ use of multi-word combinations with the help of measures of association strength such as the mutual information (MI) score (e.g. Durrant & Schmitt, 2009; Li & Schmitt, 2010; Granger & Bestgen, 2014). Most studies so far, however, have investigated positional co-occurrences, where words are said to co-occur when they appear within a certain distance of each other (Evert, 2004), and have focused more particularly on adjacent word combinations such as adjective + noun combinations. Paquot (2014) is, to the best of our knowledge, the first study to adopt a relational model of co-occurrences, where the co-occurring words appear in a specific structural relation, to compare three learner sub-corpora made up of texts rated at different CEFR levels (i.e. B2, C1 and C2). She used the Stanford CoreNLP suite of tools to parse learner data and extract dependency relations such as dobj(win,lottery), i.e. “the direct object of win is lottery”, and then used MI scores computed on the basis of a large reference corpus to analyse pairs of words in specific grammatical relations. Findings showed that adjective + noun relations discriminated well between B2 and C2 levels; adverbial modifiers separated B2 texts from C1 and C2 texts; and verb + direct object relations set C2 texts apart from B2 and C1 texts. These results suggest that, used together, phraseological indices computed on the basis of relational dependencies are able to gauge language proficiency. The main objective of this study is to investigate whether relational co-occurrences also constitute valid indices of phraseological development. To do so, we replicate the method used in Paquot (2014) on data from the Longitudinal Database of Learner English (LONGDALE; Meunier, 2013).
In the LONGDALE project, the same students are followed over a period of at least three years and data collections are typically organized once per year. The 78 argumentative essays selected for this study were written by 39 French learners of English in Year 1 and Year 3 of their studies at the University of Louvain. In Year 1 and Year 3, unlike in Year 2, students were asked to write on the same topic, which allows us to control for topic, a variable that has been shown to considerably influence learners’ use of word combinations (e.g. Cortes, 2004). Relational co-occurrences are operationalized as word combinations used in four grammatical relations, i.e. adjective + noun, adverb + adjective, adverb + verb and verb + direct object. We assign to each pair of words in the LONGDALE corpus its MI score computed on the basis of the British National Corpus, and compute the mean MI score for each dependency relation in each learner text. To explore the links between individual and group phraseological development trajectories, a detailed variability analysis using the method of individual profiling and visualization techniques will also be presented (cf. Verspoor & Smiskova, 2012).
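
The scoring step described above can be illustrated with a minimal Python sketch. It assumes the common definition of MI as the base-2 logarithm of observed over expected pair frequency, with frequencies counted within a single dependency relation in the reference corpus. The function names, the layout of the reference table, the toy frequencies, and the decision to skip pairs unattested in the reference corpus are all assumptions made for illustration; they are not taken from the study, which does not specify these details here.

```python
import math
from collections import defaultdict


def mi_score(pair_freq, head_freq, dep_freq, total_pairs):
    """MI = log2(observed / expected), with counts taken within one relation."""
    expected = head_freq * dep_freq / total_pairs
    return math.log2(pair_freq / expected)


def mean_mi_by_relation(text_pairs, reference):
    """Average the reference-corpus MI of the pairs found in one learner text.

    text_pairs: iterable of (relation, head, dependent) triples from a parser.
    reference:  hypothetical per-relation frequency tables from a reference corpus.
    """
    scores = defaultdict(list)
    for relation, head, dep in text_pairs:
        table = reference.get(relation)
        if table is None:
            continue  # relation not among those under study
        pair_freq = table["pairs"].get((head, dep), 0)
        if pair_freq == 0:
            continue  # assumption: pairs unattested in the reference are skipped
        scores[relation].append(
            mi_score(pair_freq, table["heads"][head], table["deps"][dep], table["total"])
        )
    return {rel: sum(vals) / len(vals) for rel, vals in scores.items()}


# Toy reference frequencies (invented for illustration, not BNC counts).
REFERENCE = {
    "dobj": {
        "total": 1024,
        "pairs": {("win", "lottery"): 8, ("make", "decision"): 16},
        "heads": {"win": 32, "make": 128},
        "deps": {"lottery": 16, "decision": 32},
    },
    "amod": {
        "total": 1024,
        "pairs": {("decision", "important"): 4},
        "heads": {"decision": 16},
        "deps": {"important": 64},
    },
}

# Dependency triples as they might be extracted from one learner essay.
essay = [
    ("dobj", "win", "lottery"),
    ("amod", "decision", "important"),
    ("dobj", "make", "decision"),
]

print(mean_mi_by_relation(essay, REFERENCE))  # {'dobj': 3.0, 'amod': 2.0}
```

Running the same function on a learner's Year 1 and Year 3 essays would yield one mean MI value per relation per text, which is the kind of index that can then be compared across time points.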