IMPACT: working together to address the challenges involving mass digitization of historical printed text

Purpose – The purpose of this paper is to address the most urgent challenges that libraries face in the mass digitization of historical printed text: the unsatisfactory result of the conversion of scanned images to full featured electronic text by means of automated optical character recognition (OCR); the historical language barrier around 1850, caused by inadequacy of most existing lexica for historical language for OCR or post‐correction and a lack of institutional knowledge and expertise in libraries, museums and archives.Design/methodology/approach – In the EC‐funded project IMPACT (Improving Access to Text), seven libraries, six research institutes and two private sector companies across Europe work together to address the challenges by the development of OCR software and technologies which exceed the accurateness of current state‐of‐the‐art software significantly. The IMPACT solutions focus on the entire process of recognition after the document leaves the scanner: Image processing, OCR processing ...