PixelTone: a multimodal interface for image editing

Photo editing can be challenging, and it becomes even more difficult on the small, portable screens of the mobile devices now frequently used to capture and edit images. To address this problem, we present PixelTone, a multimodal photo-editing interface that combines speech and direct manipulation. In this video, we demonstrate how our system uses natural language to express a user's desired changes to an image, and how we combine natural language with touch gestures to create named references and to sketch regions that localize image operations.
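To make the idea concrete, here is a minimal sketch (not the PixelTone implementation) of how a spoken command and a touch-drawn mask might be combined to localize an edit. The names `parse_command` and `apply_brightness`, and the keyword vocabulary, are hypothetical; PixelTone uses a full natural-language pipeline rather than keyword spotting.

```python
import numpy as np

# Hypothetical vocabulary mapping spoken phrases to image operations.
OPERATIONS = {
    "brighter": ("brightness", +0.2),
    "darker": ("brightness", -0.2),
}

def parse_command(utterance):
    """Naive keyword spotting; stands in for real NLP parsing."""
    for keyword, op in OPERATIONS.items():
        if keyword in utterance.lower():
            return op
    return None

def apply_brightness(image, amount, mask=None):
    """Adjust brightness, optionally restricted to a 0..1 mask
    (e.g. rasterized from a touch gesture)."""
    out = image.astype(np.float32) / 255.0
    delta = np.full_like(out, amount)
    if mask is not None:
        delta *= mask[..., None]  # localize edit to the sketched region
    return np.clip((out + delta) * 255.0, 0, 255).astype(np.uint8)

# Usage: "make this area brighter" with a circular touch-sketched mask.
image = np.full((64, 64, 3), 100, dtype=np.uint8)
yy, xx = np.mgrid[:64, :64]
mask = ((yy - 32) ** 2 + (xx - 32) ** 2 < 20 ** 2).astype(np.float32)

op = parse_command("make this area brighter")
if op and op[0] == "brightness":
    image = apply_brightness(image, op[1], mask)
```

The key design point this sketch illustrates is the division of labor: speech carries the operation and its direction, while touch supplies the spatial scope that words alone express poorly.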
