Combination of multiple aligned recognition outputs using WFST and LSTM

The contribution of this paper is a new strategy for integrating the recognition outputs of multiple diverse recognizers. Such an integration can yield higher performance and more accurate outputs than a single recognition system. The problem of aligning the results of various Optical Character Recognition (OCR) systems lies in the difficulty of finding correspondences at the character, word, line, and page levels. These difficulties arise from segmentation and recognition errors produced by the OCR engines. Alignment techniques are therefore required to synchronize the outputs so that they can be compared. Most existing approaches fail when the same error occurs in multiple OCR outputs: if the correct characters do not appear in at least one of the OCR outputs, these approaches are unable to improve the results. We design a Line-to-Page alignment with edit rules using Weighted Finite-State Transducers (WFST). These edit rules are based on the edit operations insertion, deletion, and substitution. We then design an approach using Recurrent Neural Networks with Long Short-Term Memory (LSTM) to predict these types of errors. A Character-Epsilon alignment is designed to normalize the lengths of the strings for the LSTM alignment. The LSTM gives the best voting results, especially when heuristic approaches are unable to decide among the various OCR engines. The LSTM predicts the correct characters even when the OCR engines did not produce those characters in their outputs. The approaches are evaluated on the outputs of state-of-the-art OCR systems on the UWIII dataset and a historical German Fraktur dataset. The experiments show that the LSTM approach performs best, with an error rate of around 0.40%, while the other approaches have error rates between 1.26% and 2.31%.
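The abstract frames alignment in terms of three edit operations (insertion, deletion, substitution) and a Character-Epsilon alignment that brings strings to equal length. The sketch below illustrates that general idea with a plain dynamic-programming Levenshtein alignment in Python. It is not the paper's WFST construction or its LSTM, and the epsilon symbol `EPS`, the functions `align` and `char_error_rate`, and the operation labels are hypothetical names chosen for illustration. The error-rate helper uses one common definition of character error rate; the abstract does not state how its reported rates are computed.

```python
EPS = "ε"  # hypothetical epsilon symbol marking an empty (gap) position


def align(ref: str, hyp: str) -> tuple[list[str], list[str], list[str]]:
    """Levenshtein alignment of two strings (illustrative sketch).

    Returns two equal-length symbol lists, with gaps filled by EPS, and the
    edit operations: 'match', 'sub' (substitution), 'del' (ref char missing
    in hyp) and 'ins' (extra char in hyp).
    """
    n, m = len(ref), len(hyp)
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # match or substitution
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
            )

    # Backtrace from the bottom-right cell to recover one optimal alignment.
    # Ties prefer the diagonal move, then deletion, then insertion.
    a_ref, a_hyp, ops = [], [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                a_ref.append(ref[i - 1])
                a_hyp.append(hyp[j - 1])
                ops.append("match" if cost == 0 else "sub")
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            a_ref.append(ref[i - 1])
            a_hyp.append(EPS)
            ops.append("del")
            i -= 1
        else:
            a_ref.append(EPS)
            a_hyp.append(hyp[j - 1])
            ops.append("ins")
            j -= 1
    return a_ref[::-1], a_hyp[::-1], ops[::-1]


def char_error_rate(ref: str, hyp: str) -> float:
    """Non-match edit operations divided by the reference length."""
    _, _, ops = align(ref, hyp)
    errors = sum(op != "match" for op in ops)
    return errors / max(len(ref), 1)
```

For example, with a hypothetical OCR output `"Fralctur"` (a `k` misread as `lc`) against the reference `"Fraktur"`, `align` pads the reference to `F r a ε k t u r` opposite `F r a l c t u r`, recording one insertion and one substitution, so `char_error_rate` returns 2/7. Padding with an explicit epsilon symbol is what gives both sequences the same length, which is the property a per-position model or voter needs.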

Marcus Liwicki | Thomas M. Breuel | Mayce Ibrahim Ali Al Azawi