HOTPAPER: multimedia interaction with paper using mobile phones

The popularity of camera phones enables many exciting multimedia applications. In this paper, we present a novel technology and several applications that allow users to interact with paper documents, books, and magazines. This interaction takes the form of reading and writing electronic information, such as images, web URLs, video, and audio, on the paper medium by pointing a camera phone at a patch of text in a document. Our application does not require any special markings, barcodes, or watermarks on the paper document. Instead, we propose a document recognition algorithm that, given a small camera image of a text patch, automatically determines the location of that patch within a large collection of document images. This is challenging because most phone cameras lack autofocus and macro capabilities and produce low-quality images and video. We developed a novel algorithm, Brick Wall Coding (BWC), that performs image-based document recognition using mobile phone video frames. Given a document patch image, BWC uses the layout, i.e., the relative locations, of word boxes to determine the original file, the page, and the location on the page. BWC runs in real time (4 frames per second) on a Treo 700w smartphone with a 312 MHz processor and 64 MB of RAM. Our method can recognize blurry document patch frames that contain as few as 4-5 lines of text at video resolutions as low as 176x144. We performed experiments by indexing 4397 document pages and querying this database with 533 document patches. In addition to the basic algorithm, this paper describes several applications enabled by mobile phone-paper interaction, such as adding electronic annotations to paper, using paper as a tangible interface for collecting and communicating multimedia data, and collaborative homework.
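The abstract describes BWC only at a high level, so the following is a minimal, hypothetical Python sketch of how a word-box layout index might be built and queried. Word boxes are assumed to be already extracted from each frame; the specific key used here (quantized widths of neighbouring word boxes on adjacent lines) and the voting step are illustrative assumptions, not the paper's actual encoding.

```python
# Hypothetical sketch of a BWC-style word-box layout lookup.
# Assumption: word bounding boxes per text line are already available
# (e.g., from binarization and connected-component analysis of a frame).
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]   # (x, y, width, height) of one word box
Page = Tuple[str, int]            # (file name, page number)
Location = Tuple[str, int, int]   # (file name, page number, line index)


def layout_keys(lines: List[List[Box]], bucket: int = 8) -> List[Tuple[int, Tuple[int, ...]]]:
    """Turn the word-box layout into (line index, hashable key) pairs.

    `lines` holds the word boxes of each text line in reading order.  A key
    combines the quantized widths of two consecutive words with the width of
    the word directly below the first one, mimicking the overlapping
    "brick wall" pattern of word boundaries across lines (an assumed feature).
    """
    keys = []
    for li, (line, next_line) in enumerate(zip(lines, lines[1:])):
        for i in range(len(line) - 1):
            w1 = line[i][2] // bucket
            w2 = line[i + 1][2] // bucket
            # Word on the next line that horizontally overlaps word i, if any.
            below = next((b for b in next_line
                          if b[0] <= line[i][0] < b[0] + b[2]), None)
            w3 = below[2] // bucket if below is not None else -1
            keys.append((li, (w1, w2, w3)))
    return keys


def build_index(pages: Dict[Page, List[List[Box]]]) -> Dict[Tuple[int, ...], List[Location]]:
    """Index every page of every document by its layout keys."""
    index: Dict[Tuple[int, ...], List[Location]] = defaultdict(list)
    for (doc, page), lines in pages.items():
        for li, key in layout_keys(lines):
            index[key].append((doc, page, li))
    return index


def lookup(index: Dict[Tuple[int, ...], List[Location]],
           patch_lines: List[List[Box]]) -> Optional[Location]:
    """Vote over the index entries hit by the patch's layout keys."""
    votes: Dict[Location, int] = defaultdict(int)
    for _, key in layout_keys(patch_lines):
        for loc in index.get(key, []):
            votes[loc] += 1
    return max(votes, key=votes.get) if votes else None
```

Because the keys depend only on relative word-box geometry rather than recognized characters, a lookup of this kind can tolerate the blur and low resolution of phone video frames; the actual BWC feature encoding and matching strategy are given in the full paper.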
