Introducing LINDSEI, ICLE's talkative sister

The LINDSEI (Louvain International Database of Spoken English Interlanguage) project was launched in 1995, five years or so after the start of ICLE, the International Corpus of Learner English. LINDSEI was conceived as ICLE’s talkative sister: a collection of spoken data produced by advanced learners of English as a foreign language. Collaboration with several universities around the world has made it possible to include data from learners with a wide variety of mother tongue backgrounds. To date, eleven mother tongues are represented: Bulgarian, Chinese, Dutch, French, German, Greek, Italian, Japanese, Polish, Spanish and Swedish.
The data consist of transcribed informal interviews conducted in three stages. First, the learner talked for a few minutes about a topic which s/he had chosen from a set of three and had had some time to think about. S/he then answered the interviewer’s questions about general topics such as hobbies or life at university. Finally, s/he was asked to describe a series of four pictures making up a story. The interviews were transcribed orthographically (with some prosodic and phonetic information such as pauses or syllable lengthening), following guidelines which were specifically designed for the project and standardised across the subcorpora to ensure perfect comparability of the data. The transcripts were marked up for overlapping speech, vocalisations and foreign words, as well as for the identity of the speaker (learner or interviewer) and the division between the three tasks (prepared topic, open discussion, picture description). Each interview is accompanied by a learner profile recording a number of learner and task variables, such as the learner’s age, mother tongue and knowledge of other foreign languages, and the duration and length (in words) of the interview.
The corpus will soon be released. In this demo, we would like to show a prototype of the search interface, which will allow users to compile their own tailor-made corpora on the basis of a set of predefined variables and to extract useful statistics. A corpus compiled in this way can then be imported into a concordancer such as WordSmith Tools for further analysis. Using small case studies as illustrations, we will present the different functionalities of the tool and demonstrate how they can be used to investigate spoken interlanguage, a field which is still largely unexplored in corpus linguistics, essentially because of a lack of available data. We will also show how the kinship between LINDSEI and ICLE makes it possible to compare the two corpora and hence to investigate the relation between spoken and written interlanguage.