Identifying biases arising from combining census and administrative data – the fertility of migrants in the Office for National Statistics Longitudinal Study

Demographic research is increasingly making use of longitudinal and life history data, given their strong analytical potential. Such data are frequently produced by linking and matching records from multiple sources. Where this is the case, a person’s appearance in one source may be conditional on an event recorded in another. This can bias estimates of occurrence/exposure rates for the event in question unless the correct exposure can be identified, which in turn requires understanding the rules governing entry to the data. The Office for National Statistics (ONS) Longitudinal Study (LS) for England and Wales is a 1% sample of the population, constructed by combining data from the census, vital registrations (births and deaths) and the National Health Service Central Register (NHSCR). This paper examines the difficulty of obtaining the correct exposure for rates in complex data sets by studying the fertility of migrants using the ONS LS. Three tests relating to the fertility of female migrants to England and Wales illustrate the possible association between exposure to risk and subsequent events. The first assesses the ability of the data set to record new migrants, the second examines the relationship between mode of entry to the data set and subsequent fertility, and the third illustrates how the recorded fertility of migrants depends upon the way migration is measured.
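
The mechanism described above can be illustrated with a small numerical sketch. It is not drawn from the ONS LS: the four records, the ten-year follow-up and the field names `entry` and `births` are all invented for illustration, and the calculation is a plain births-per-person-year rate rather than the procedure used in the paper. It shows how a rate can be inflated when some women appear in the data only because they had a birth.

```python
# Toy illustration only: every record and number below is invented.
END = 10.0  # end of follow-up, in years since arrival

# Each hypothetical migrant arrives at time 0. "entry" is the time she first
# appears in the linked data, or None if she is never captured.
population = [
    {"entry": 0.0, "births": [2.0]},   # captured on arrival, one birth
    {"entry": 0.0, "births": []},      # captured on arrival, no birth
    {"entry": 4.0, "births": [4.0]},   # captured only when her birth is registered
    {"entry": None, "births": []},     # never captured: no birth, no other record
]


def rate(women, start_key=None):
    """Births per person-year; exposure runs from arrival (time 0) by default,
    or from the time stored under `start_key` when one is given."""
    births = sum(len(w["births"]) for w in women)
    exposure = sum(END - (w[start_key] if start_key else 0.0) for w in women)
    return births / exposure


# Only women with some record can be analysed.
observed = [w for w in population if w["entry"] is not None]

true_rate = rate(population)          # 2 births / 40 person-years
observed_rate = rate(observed)        # 2 births / 30 person-years
entry_rate = rate(observed, "entry")  # 2 births / 26 person-years

print(f"{true_rate:.3f} {observed_rate:.3f} {entry_rate:.3f}")
# prints "0.050 0.067 0.077"
```

In this invented example the woman who never gives birth and is never registered contributes exposure to the true rate but is invisible in the data, so both observed rates exceed the true one. Starting exposure at data entry while still counting the birth that triggered entry raises the rate further.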