CensusIRL: Historical census data preparation with MDD support

Census returns are a critical source of information for governments globally. They underpin a wide spectrum of public planning, including health, housing, work and education. Historically, census forms have captured names, places, dates, age, occupation, family structure, and religion. More recently, questions on sexual orientation and ethnicity, which can be intrusive to vulnerable communities, have been added, and for such reasons data security is of paramount importance. Most governments restrict access to individual census returns and present the data only in aggregate reports. The Irish government is particularly strict, enforcing a statutory closure period of 100 years. An exception was made for the Irish 1911 census, whose returns were digitised and released for free online consultation in 2009 [1]. They are an excellent source for genealogists and historians alike but exist as separate digital silos. This project uses an eXtreme Model-Driven Development (XMDD) environment to create linkages between both datasets. This paper discusses the development of the CensusIRL application and of its matching algorithm. We describe the census records and the data cleansing process used to create the initial proof-of-concept application. We detail the different approaches taken across the application's development life-cycle and describe the utilities used to sanitise data points in the records and to perform the match-making.
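The abstract does not spell out the sanitisation utilities or the matching algorithm, so the following is only a minimal sketch of what such a step could look like, not the CensusIRL implementation. It assumes that names are normalised (accents and punctuation stripped), encoded phonetically with American Soundex (the technique examined in [17]), and that two records are treated as a candidate match when the name codes agree and the reported ages are consistent with the gap between the two enumerations. The record fields (`forename`, `surname`, `age`), the function names, and the `years_apart` and `age_tolerance` parameters are all hypothetical.

```python
import re
import unicodedata

# Soundex digit for each coded consonant; vowels, H, W and Y are uncoded.
_CODES = {}
for _digit, _letters in (("1", "BFPV"), ("2", "CGJKQSXZ"), ("3", "DT"),
                         ("4", "L"), ("5", "MN"), ("6", "R")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def normalise_name(raw):
    """Strip accents, upper-case, and drop everything that is not A-Z."""
    decomposed = unicodedata.normalize("NFKD", raw)
    unaccented = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[^A-Z]", "", unaccented.upper())


def soundex(name):
    """Return the four-character American Soundex code of a name."""
    letters = normalise_name(name)
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # a vowel resets prev, so a repeated code is kept
    return (code + "000")[:4]


def is_candidate_match(earlier, later, years_apart, age_tolerance=3):
    """Hypothetical rule: phonetic name agreement plus a plausible age gap."""
    same_surname = soundex(earlier["surname"]) == soundex(later["surname"])
    same_forename = soundex(earlier["forename"]) == soundex(later["forename"])
    expected_age = earlier["age"] + years_apart
    age_ok = abs(later["age"] - expected_age) <= age_tolerance
    return same_surname and same_forename and age_ok


# Invented example records, not taken from any census return.
earlier = {"forename": "Margaret", "surname": "O'Brien", "age": 24}
later = {"forename": "Margret", "surname": "O Brian", "age": 36}
print(is_candidate_match(earlier, later, years_apart=10))
```

The age tolerance is included because reported ages in historical returns are often unreliable (see [22] on age-misreporting); the value of 3 is an arbitrary placeholder. A rule like this only proposes candidates, and a real linkage would likely need further fields such as place or household composition to separate people who share common names.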

[1] C. Breathnach, et al. Evolution of the Historian Data Entry Application: Supporting Transcribathons in the Digital Humanities through MDD, 2022, 2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC).

[2] Mark Conrad, et al. Computational Archival Science is a Two-Way Street, 2021, 2021 IEEE International Conference on Big Data (Big Data).

[3] R. Gnanasekaran, et al. Using Transfer Learning to contextually Optimize Optical Character Recognition (OCR) output and perform new Feature Extraction on a digitized cultural and historical dataset, 2021, 2021 IEEE International Conference on Big Data (Big Data).

[4] G. Alter, et al. Re-introducing the Cambridge Group Family Reconstitutions, 2020.

[5] C. Breathnach, et al. Census: on paper, by governments, is still best, 2020, Nature.

[6] T. Margaria, et al. eXtreme Model-Driven Development (XMDD) Technologies as a Hands-On Approach to Software Development Without Coding, 2019, Encyclopedia of Education and Information Technologies.

[7] Alexander Schieweck, et al. The Digital Thread in Industry 4.0, 2019, IFM.

[8] James J. Feigenbaum, et al. Automated Linking of Historical Data, 2019, Journal of Economic Literature.

[9] Martha Bailey, et al. How Well Do Automated Linking Methods Perform? Lessons from U.S. Historical Data, 2017, Journal of Economic Literature.

[10] Tiziana Margaria, et al. Constraints-Driven Automatic Geospatial Service Composition: Workflows for the Analysis of Sea-Level Rise Impacts, 2016, ICCSA.

[11] James J. Feigenbaum, et al. Automated Census Record Linking: A Machine Learning Approach, 2016.

[12] Kees Mandemakers, et al. The Intermediate Data Structure (IDS) for Longitudinal Historical Microdata, version 4, 2014, Historical Life Course Studies.

[13] Dirk Merkel, et al. Docker: lightweight Linux containers for consistent development and deployment, 2014.

[14] Tiziana Margaria, et al. Agile IT: Thinking in User-Centric Models, 2008, ISoLA.

[15] Alexander Sczyrba, et al. GeneFisher-P: variations of GeneFisher as processes in Bio-jETI, 2008, BMC Bioinformatics.

[16] Tiziana Margaria, et al. Model-based design of distributed collaborative bioinformatics processes in the jABC, 2006, 11th IEEE International Conference on Engineering of Complex Computer Systems (ICECCS'06).

[17] David O. Holmes, et al. Improving precision and recall for Soundex retrieval, 2002, Proceedings. International Conference on Information Technology: Coding and Computing.

[18] Tiziana Margaria, et al. Incremental Requirement Specification for Evolving Systems, 2001, Nord. J. Comput.

[19] Timothy G. Walch. Book Review: The Vanishing Irish: Households, Migration, and the Rural Economy in Ireland, 1850–1914, 1999.

[20] Tiziana Margaria, et al. A Constraint-Oriented Service Creation Environment, 1996, TACAS.

[21] T. Guinnane. Age at Leaving Home in Rural Ireland, 1901–1911, 1992, The Journal of Economic History.

[22] J. Budd, et al. Intentional age-misreporting, age-heaping, and the 1908 Old Age Pensions Act in Ireland, 1991, Population Studies.

[23] R. Breen. Naming Practices in Western Ireland, 1982.

[24] P. Gibbon, et al. The Stem Family in Ireland, 1978, Comparative Studies in Society and History.

[25] Hafiz Ahmad Awais Chaudhary, et al. Model-Driven Engineering in Digital Thread Platforms: A Practical Use Case and Future Challenges, 2022, ISoLA.

[26] Maurice H. ter Beek, et al. From Software Engineering to Formal Methods and Tools, and Back: Essays Dedicated to Stefania Gnesi on the Occasion of Her 65th Birthday, 2019, From Software Engineering to Formal Methods and Tools, and Back.

[27] Tiziana Margaria, et al. Language-Driven Engineering: From General-Purpose to Purpose-Specific Languages, 2019, Computing and Software Science.

[28] Johannes Neubauer, et al. Model Driven Design of Secure High Assurance Systems: An Introduction to the Open Platform from the User Perspective, 2016.

[29] Tiziana Margaria, et al. Agile Workflows for Climate Impact Risk Assessment based on the ci:grasp Platform and the jABC Modeling Framework, 2014.