A scalable, distributed and dynamic workflow system for digitization processes
Creating digital representations of ancient manuscripts, prints and maps is a challenging task due to the sources' fragile and heterogeneous nature. Digitization requires highly specialized scanning hardware to cover the sources' diversity. The central task is to obtain the maximum reproduction quality while minimizing the error rate, which is difficult to achieve because digitization produces large amounts of image data, placing heavy computational loads on image processing modules, error-detection heuristics and information retrieval. As digital copies initially contain no information about their sources' semantics, additional effort is needed to extract semantic metadata; doing this manually is error-prone and time-consuming, which calls for automated mechanisms to support the user. This paper introduces a decentralized, event-driven workflow system designed to overcome these challenges. It leverages dynamic routing between workflow components and can therefore adapt quickly to each source's unique requirements. It offers a scalable approach that alleviates high computational loads on single units through distributed computing, and it provides modules for automated image pre-/post-processing, error-detection heuristics, data mining, semantic analysis, metadata augmentation, quality assurance, and export to established publishing platforms or long-term storage facilities.
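The paper's own implementation is not reproduced here; as a rough illustration of the event-driven, dynamically routed pipeline the abstract describes, the following Python sketch shows a router that dispatches events to whichever processing modules are currently subscribed. All names, event kinds and modules are hypothetical placeholders, not the authors' API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Event:
    """A workflow event carrying a scanned page and its metadata."""
    kind: str                       # e.g. "scan.completed", "qa.passed"
    payload: dict = field(default_factory=dict)


class Router:
    """Dispatches events to subscribed modules.

    Routing is dynamic: modules can register or unregister at runtime,
    so the pipeline can be reconfigured per source (e.g. skipping steps
    that a particular manuscript or map does not need).
    """

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[Event], List[Event]]]] = {}

    def subscribe(self, kind: str, handler: Callable[[Event], List[Event]]) -> None:
        self._handlers.setdefault(kind, []).append(handler)

    def publish(self, event: Event) -> None:
        # Deliver the event to every subscribed module; each module may
        # emit follow-up events, which are routed in turn.
        for handler in self._handlers.get(event.kind, []):
            for follow_up in handler(event):
                self.publish(follow_up)


# Hypothetical modules standing in for the pipeline stages named in the abstract.
def preprocess(event: Event) -> List[Event]:
    print(f"pre-processing image {event.payload['page']}")
    return [Event("image.preprocessed", event.payload)]


def detect_errors(event: Event) -> List[Event]:
    print(f"running error-detection heuristics on {event.payload['page']}")
    return [Event("qa.passed", event.payload)]


def export(event: Event) -> List[Event]:
    print(f"exporting {event.payload['page']} to long-term storage")
    return []


if __name__ == "__main__":
    router = Router()
    router.subscribe("scan.completed", preprocess)
    router.subscribe("image.preprocessed", detect_errors)
    router.subscribe("qa.passed", export)
    router.publish(Event("scan.completed", {"page": "manuscript_042.tif"}))
```

In a distributed deployment, the in-process dispatch above would presumably be replaced by a message broker so that load can be spread across machines, but the dynamic-routing idea stays the same.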