Collaborative multimodal photo annotation over digital paper

Metadata annotations on media content such as photos are known to improve retrieval and organization, particularly for large data sets. The greatest challenge in obtaining annotations remains getting users to perform the large amount of tedious manual work required. In this paper we introduce an approach to semi-automated labeling based on extracting metadata from naturally occurring conversations of groups of people discussing pictures among themselves. As the burden of structuring and extracting metadata shifts from users to the system, new recognition challenges arise. We explore how multimodal language can help in 1) detecting a concise set of meaningful labels to be associated with each photo, 2) achieving robust recognition of these key semantic terms, and 3) facilitating label propagation via multimodal shortcuts. Analysis of data from a preliminary pilot collection suggests that handwritten labels may be highly indicative of the semantics of each photo, as shown by the correlation of handwritten terms with high-frequency spoken ones. We point to initial directions exploring a multimodal fusion technique to recover robust spelling and pronunciation of these high-value terms from redundant speech and handwriting.
