Multimodal Interactive Pattern Recognition and Applications

This book presents a different approach to pattern recognition (PR) systems, in which users of a system are involved during the recognition process. This can help to avoid later errors and reduce the costs associated with post-processing. The book also examines a range of advanced multimodal interactions between the machine and the users, including handwriting, speech and gestures.

Features: presents an introduction to the fundamental concepts and general PR approaches for multimodal interaction modeling and search (or inference); provides numerous examples and a helpful Glossary; discusses approaches for computer-assisted transcription of handwritten and spoken documents; examines systems for computer-assisted language translation, interactive text generation and parsing, relevance-based image retrieval, and interactive document layout analysis; reviews several full working prototypes of multimodal interactive PR applications, including live demonstrations that can be publicly accessed on the Internet.
