Advances on the Transcription of Historical Manuscripts based on Multimodality, Interactivity and Crowdsourcing

Transcribing digitised documents makes their contents easier to access digitally. Natural language technologies, such as Automatic Speech Recognition (ASR) for speech audio signals and Handwritten Text Recognition (HTR) for text images, have become common tools for assisting transcribers: they provide a draft transcription of the digital document that the transcriber may then amend. The draft is useful only when its error rate is low enough that amending it takes less effort than transcribing from scratch. The work described in this thesis focuses on improving the transcription offered by an HTR system in three scenarios: multimodality, interactivity and crowdsourcing. A transcription of an image can also be obtained by dictating its textual contents to an ASR system. When both sources of information (image and speech) are available, they can be combined multimodally, and this combination can provide assistive systems with additional sources of information. Moreover, speech dictation can be used in a multimodal crowdsourcing platform, where collaborators may provide their speech by using mobile devices. Different solutions for each scenario were tested on two Spanish historical manuscripts, obtaining statistically significant improvements.
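
The "error rate" of a draft transcription can be made concrete with a small example. The sketch below assumes the rate is measured as word error rate (WER), a common choice for both HTR and ASR output; the abstract itself does not name the metric, and the function name `word_error_rate` and the sample strings are hypothetical, chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative only: counts substitutions, insertions and deletions
    with equal cost, using whitespace tokenisation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcription must not be empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical reference line and recogniser draft (one word missing).
    reference = "en el nombre de dios"
    draft = "en el nombre dios"
    print(word_error_rate(reference, draft))
```

A lower value means the transcriber has fewer words to correct, which is the sense in which a draft becomes preferable to transcribing from scratch.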

[11]  Emilio Granell Romero.  Advances on the Transcription of Historical Manuscripts based on Multimodality, Interactivity and Crowdsourcing , 2018.
