Standardising and Coding Birthplace Strings and Occupational Titles in the British Censuses of 1851 to 1911

Abstract: This article presents a technique for standardising and coding the textual birthplace and occupation strings recorded in the censuses of England and Wales and of Scotland, 1851–1911. Although the approaches for the two kinds of string differ, both integrate computer technologies, mathematical methods and expert knowledge. Both processes are described formally using the Structured Analysis and Design Technique (SADT) methodology. Occupations are classified by two algorithms, based on statistical decision theory, that allocate codes to the original occupation strings. Parishes are standardised by comparing the original birthplace strings with reference data.
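
The abstract does not say how birthplace strings are compared with the reference data. The following is a minimal sketch that assumes a simple edit-distance comparison. This is a swapped-in technique, not necessarily the article's method. A raw birthplace string is normalised and matched to the nearest entry in a reference list of parishes, and the match is accepted only when it is close enough. The normalisation rules, the `max_ratio` threshold, the function names and the sample parish list are all hypothetical.

```python
import re
from typing import Optional, Sequence

# Hypothetical sample of reference parish names (not the article's reference data).
HYPOTHETICAL_REFERENCE = ["Aberdeen", "Abergavenny", "Ashby de la Zouch", "Kirkcaldy"]


def normalise(s: str) -> str:
    """Assumed cleaning rules: upper-case, drop apostrophes, turn anything
    other than A-Z or space into a space, then collapse whitespace."""
    s = s.upper().replace("'", "")
    s = re.sub(r"[^A-Z ]+", " ", s)
    return " ".join(s.split())


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                    # deletion
                            curr[j - 1] + 1,                # insertion
                            prev[j - 1] + int(ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def standardise_birthplace(raw: str, reference: Sequence[str],
                           max_ratio: float = 0.25) -> Optional[str]:
    """Return the closest reference parish, or None if the best match is
    further than max_ratio * len(normalised raw string) edits away."""
    target = normalise(raw)
    if not target:
        return None
    best, best_dist = None, None
    for parish in reference:
        d = levenshtein(target, normalise(parish))
        if best_dist is None or d < best_dist:  # ties keep the earlier entry
            best, best_dist = parish, d
    if best is not None and best_dist <= max_ratio * len(target):
        return best
    return None


if __name__ == "__main__":
    for raw in ["Kirkaldy", "Ashby-de-la-Zouch", "London"]:
        print(raw, "->", standardise_birthplace(raw, HYPOTHETICAL_REFERENCE))
```

Scaling the threshold with string length lets longer names absorb more enumerator misspellings. A fuller treatment would also use county context and flag ambiguous ties for expert review rather than silently taking the first candidate.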
