Exploring constructions on the web: a case study

This paper presents a case study on grammatical variation that exemplifies both the possibilities and the limits of using data drawn from the web in linguistic research. In particular, it tests the impact of the animacy of the modifier on the choice between s-genitives such as driver’s licence and noun+noun constructions (driver licence). It will be shown that this type of variation is extremely difficult to study on the basis of ‘traditional’ electronic corpora, since these do not contain a sufficient number of crucial tokens. In this case, therefore, the web provides a unique data resource for investigating a phenomenon that could otherwise barely be studied in a corpus. At the same time, however, this paper also discusses various obstacles that may arise when using web data. Most crucially, it will be shown that – at least in the present case – the WebCorp software provides a more reliable means of retrieving data from the web than Google. The findings and conclusions of the present case study are embedded within a general discussion of using web data in linguistic research.