Fusion strategies for speech and handwriting modalities in HCI

In this paper we present a strategy for handling multimodal signals from pen-based mobile devices in Human-Computer Interaction (HCI), focusing on the modalities of spoken and handwritten input. Each modality on its own is quite well understood, as the extensive literature demonstrates, although a number of challenges remain, such as improving recognition results. The potential benefits of multimodal HCI include better recognition and robustness, as well as seamless human-machine communication, achieved by fusing different modalities and exploiting the redundancies among them. However, fusing the two modalities effectively still poses problems. Open questions today include design approaches for fusion strategies, and with the growing number of mobile and pen-based computers, techniques for fusing handwriting and speech in particular appear to have great potential. Yet few publications to date address this potential. In this work we introduce a conceptual approach based on a model that describes a bimodal HCI process. We analyze four exemplary applications with respect to the structure of this model and highlight the open problems within them. Further, we outline possible solutions to these challenges. Such a fusion model for HCI may simplify the development of seamless and intuitive user interfaces on pen-based mobile devices. For one of our application scenarios, a bimodal system for recording and recognizing form data in medical or financial environments, we present first experimental results.
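
The abstract does not specify how the fusion model combines the two modalities. As a minimal illustration of how redundancy between speech and handwriting can improve recognition, the sketch below assumes a late (decision-level) fusion in which each recognizer returns an n-best list of candidates with non-negative confidences, and the two lists are combined by a weighted sum. This is one common technique, not necessarily the authors' method; the names `normalize`, `fuse_nbest`, and `w_speech`, as well as the example values for a form field, are hypothetical.

```python
from collections import defaultdict


def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one recognizer's confidences so they sum to 1.

    Assumption: raw scores are non-negative but not directly comparable
    across the speech and handwriting recognizers.
    """
    total = sum(scores.values())
    if total <= 0:
        return {word: 0.0 for word in scores}
    return {word: value / total for word, value in scores.items()}


def fuse_nbest(speech: dict[str, float],
               handwriting: dict[str, float],
               w_speech: float = 0.5) -> list[tuple[str, float]]:
    """Hypothetical weighted-sum late fusion of two n-best lists.

    A candidate proposed by only one modality receives zero support from
    the other, so hypotheses that both recognizers agree on (redundancy)
    tend to rise to the top of the fused ranking.
    """
    if not 0.0 <= w_speech <= 1.0:
        raise ValueError("w_speech must lie in [0, 1]")
    fused: dict[str, float] = defaultdict(float)
    for word, conf in normalize(speech).items():
        fused[word] += w_speech * conf
    for word, conf in normalize(handwriting).items():
        fused[word] += (1.0 - w_speech) * conf
    # Highest fused score first; sorted() is stable, so ties keep insertion order.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical n-best lists for a single "surname" field of a form.
    speech_nbest = {"Meier": 0.5, "Maier": 0.4, "Mayer": 0.1}
    handwriting_nbest = {"Maier": 0.6, "Mater": 0.3, "Meier": 0.1}
    for word, score in fuse_nbest(speech_nbest, handwriting_nbest):
        print(f"{word}: {score:.2f}")
```

In this example speech alone ranks "Meier" first, but because "Maier" receives strong support from both recognizers, the fused ranking places "Maier" on top. The weight `w_speech` is a placeholder for whatever modality reliability a real system would estimate, for instance from ambient noise or writing conditions.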