Understanding People Relationship: Analysis of Digitised Historical Newspaper Articles

The study of historical persons and their relationships gives an insight into the lives of people and the way society functioned in early times. Such information concerning Australian history can be gleaned from Trove’s digitized collection of historical newspapers (1803–1954). This research aims to mine Trove’s articles using closed and maximal association rules mining along with visualization tools to discover, conceptualize and understand the type, size and complexity of the notable relationships that existed between persons in historical Australia. Before the articles could be mined, they needed vigorous cleaning. Given the data’s source, type and extraction methods, estimated word-error rates were at 50–75 %. Pre-processing efforts were aimed at reducing errors originating from optical character recognition (OCR), natural language processing and some co-referencing both within and between articles. Only after cleaning were the datasets able to return interesting associations at higher support thresholds.