TagSense: Leveraging Smartphones for Automatic Image Tagging

Mobile phones are becoming the convergent platform for personal sensing, computing, and communication. This paper exploits that convergence for automatic image tagging. We envision TagSense, a mobile phone-based collaborative system that senses the people, activity, and context in a picture, and carefully merges this information to create tags on the fly. The main challenge lies in discriminating the phone users who are in the picture from those who are not. We deploy a prototype of TagSense on eight Android phones and demonstrate its effectiveness on 200 pictures taken in various social settings. While advances in face recognition continue to improve image tagging, TagSense embraces additional dimensions of sensing toward the same goal. A performance comparison with Apple iPhoto and Google Picasa shows that such an out-of-band approach is valuable, especially as device density and the sophistication of sensing and learning algorithms increase.
