Towards the Creation of a Robust Search Index for Digitalized Documents
Current filing and document management systems are naturally expected to handle electronic and paper-based documents side by side. To improve search and retrieval and to reduce the high cost of digitization, the Department of Distributed
Systems of SZTAKI analysed the different kinds of error that emerged during the digitization process
of Hungarian documents, and examined how these errors affect the searchability of the digitized
items. To this end, a testbed was set up for the automatic analysis of digitized
texts in a large corpus, and the conclusions and statistics obtained from the analysis were employed
in the development of new content management products. The primary beneficiaries of these are
civil service and higher-education bodies.