A Multimodal Home Entertainment Interface via a Mobile Device

We describe a multimodal dialogue system for interacting with a home entertainment center via a mobile device. In our working prototype, users can use both a graphical and a speech interface to search TV listings, record and play television programs, and listen to music. The underlying framework is generic and can potentially support a wide variety of applications, as we demonstrate by integrating a weather forecast application. In the prototype, the mobile device serves as the locus of interaction, providing a small touchscreen display together with speech input and output, while the TV screen features a larger, richer GUI. The system architecture is agnostic to the location of the natural language processing components: a consistent user experience is maintained regardless of whether they run on a remote server or on the device itself.
