Confusion network based Video OCR post-processing approach

This paper proposes a confusion network based framework for Video OCR post-processing. The framework consists of four parts: selection of the reference and hypotheses, construction of the confusion network, decoding for the final output, and a novel metric for quantitatively evaluating Video OCR post-processing approaches. By integrating both visual and textual information, we construct a character transition network to reduce the error rate of OCR outputs. Large-scale experimental results demonstrate that this approach significantly improves the accuracy of Video OCR results at the cost of only a small amount of additional processing time. Moreover, based on a comparison and detailed analysis of the decoding methods, we conclude that “Voting+2-gram” is the most suitable method for real-world applications.
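The pipeline described above (pick a reference, align hypotheses to it, build a confusion network, decode) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It makes several assumptions the abstract does not confirm: hypotheses are aligned to the reference by character-level edit distance, hypothesis characters with no counterpart in the reference are dropped, each slot is scored by vote fraction, and the 2-gram model is an add-one-smoothed character bigram trained on a toy corpus. The names `align_to_reference`, `build_network`, `decode_voting`, `train_bigram`, and `decode_bigram` are hypothetical, and only textual information is used (no visual features).

```python
import math
from collections import Counter

EPS = ""  # empty slot entry: the hypothesis has no character at this position


def align_to_reference(ref, hyp):
    """Align hyp to ref by edit distance; return one hyp char (or EPS) per ref position.
    Simplification: hyp characters inserted relative to ref are dropped."""
    n, m = len(ref), len(hyp)
    dp = [[0] * (m + 1) for _ in range(n + 1)]  # dp[i][j]: distance ref[:i] vs hyp[:j]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    aligned = [EPS] * n
    i, j = n, m
    while i > 0 and j > 0:  # backtrace from the bottom-right corner
        if dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            aligned[i - 1] = hyp[j - 1]
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + 1:
            i -= 1  # reference char has no counterpart in hyp
        else:
            j -= 1  # extra hyp char is dropped
    return aligned


def build_network(ref, hyps):
    """One Counter of candidate characters (with vote counts) per reference position."""
    slots = [Counter({c: 1}) for c in ref]
    for hyp in hyps:
        for slot, c in zip(slots, align_to_reference(ref, hyp)):
            slot[c] += 1
    return slots


def decode_voting(slots):
    """Pick the most-voted candidate per slot; ties go to the reference character."""
    return "".join(max(slot, key=slot.get) for slot in slots)


def train_bigram(corpus):
    """Add-one-smoothed character bigram log-probability (assumed toy language model)."""
    pair_counts = Counter(zip(corpus, corpus[1:]))
    prev_counts = Counter(corpus[:-1])
    vocab_size = len(set(corpus)) + 1  # +1 reserves mass for unseen characters

    def logp(prev, cur):
        return math.log((pair_counts[(prev, cur)] + 1) / (prev_counts[prev] + vocab_size))

    return logp


def decode_bigram(slots, bigram_logp, lm_weight=1.0):
    """Viterbi search: log vote fraction plus weighted bigram score.
    The state is the last emitted character, so EPS candidates leave it unchanged."""
    beams = {"^": (0.0, "")}  # "^" is an assumed start-of-text symbol
    for slot in slots:
        total = sum(slot.values())
        new_beams = {}
        for last, (score, text) in beams.items():
            for cand, votes in slot.items():
                s = score + math.log(votes / total)
                if cand == EPS:
                    key, new_text = last, text
                else:
                    s += lm_weight * bigram_logp(last, cand)
                    key, new_text = cand, text + cand
                if key not in new_beams or s > new_beams[key][0]:
                    new_beams[key] = (s, new_text)
        beams = new_beams
    return max(beams.values(), key=lambda b: b[0])[1]
```

In this sketch, voting alone cannot resolve a slot where two candidates have equal votes, so it falls back to the reference character. The bigram term can break such ties in favour of the character sequence that the language model finds more plausible, which is one plausible reading of why combining voting with a 2-gram model helps.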
