Confidence Measures for Error Correction in Interactive Transcription Handwritten Text

An effective approach to transcribe old text documents is to follow an interactive-predictive paradigm in which both, the system is guided by the human supervisor, and the supervisor is assisted by the system to complete the transcription task as efficiently as possible. In this paper, we focus on a particular system prototype called GIDOC, which can be seen as a first attempt to provide user-friendly, integrated support for interactive-predictive page layout analysis, text line detection and handwritten text transcription. More specifically, we focus on the handwriting recognition part of GIDOC, for which we propose the use of confidence measures to guide the human supervisor in locating possible system errors and deciding how to proceed. Empirical results are reported on two datasets showing that a word error rate not larger than a 10% can be achieved by only checking the 32% of words that are recognised with less confidence.

[1]  Laurence Likforman-Sulem,et al.  Text line segmentation of historical documents: a survey , 2007, International Journal of Document Analysis and Recognition (IJDAR).

[2]  Hermann Ney,et al.  Integrated Handwriting Recognition And Interpretation Using Finite-State Models , 2004, Int. J. Pattern Recognit. Artif. Intell..

[3]  Horst Bunke,et al.  Rejection strategies for offline handwritten text line recognition , 2006, Pattern Recognit. Lett..

[4]  Horst Bunke,et al.  The IAM-database: an English sentence database for offline handwriting recognition , 2002, International Journal on Document Analysis and Recognition.

[5]  Hermann Ney,et al.  Confidence measures for large vocabulary continuous speech recognition , 2001, IEEE Trans. Speech Audio Process..

[6]  José Alberto Sanchis Navarro Estimación y aplicación de medidas de confianza en reconocimiento automático del habla , 2004 .

[7]  Horst Bunke,et al.  Hidden Markov model-based ensemble methods for offline handwritten text line recognition , 2008, Pattern Recognit..

[8]  Frank Lebourgeois,et al.  DEBORA: Digital AccEss to BOoks of the RenAissance , 2006, International Journal of Document Analysis and Recognition (IJDAR).

[9]  Alfons Juan-Císcar,et al.  The GERMANA Database , 2009, 2009 10th International Conference on Document Analysis and Recognition.