A comprehensive human computation framework: with application to image labeling

Image and video labeling is important both for enabling computers to understand images and videos and for image and video search. Manual labeling is tedious and costly, and automatic image and video labeling is still a dream. In this paper, we adopt a Web 2.0 approach to labeling images and videos efficiently: Internet users around the world are mobilized to apply their "common sense" to problems that are hard for today's computers, such as labeling images and videos. We first propose a general human computation framework that binds problem providers, Web sites, and Internet users together to solve large-scale common-sense problems efficiently and economically. The framework addresses technical challenges such as preventing a malicious party from attacking others, removing answers submitted by bots, and distilling human answers into high-quality solutions. We then apply the framework to image labeling in three incremental refinement stages. The first stage collects candidate labels for objects in an image. The second stage refines the candidate labels through multiple-choice questions and also correlates synonymous labels. To prevent bots and lazy humans from simply selecting every choice, trap labels are generated automatically and intermixed with the candidate labels; semantic distance is used to ensure that the selected trap labels are different enough from the candidate labels that no human user would mistakenly select them. The last stage asks users to locate, in a segmented image, the object corresponding to a given label. Experimental results indicate that the proposed schemes can successfully remove spurious answers from bots and distill human answers into high-quality image labels.
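The abstract does not say how semantic distance is computed or how multiple-choice answers are aggregated, so the following is only a minimal Python sketch of the second-stage idea under stated assumptions. A hypothetical toy hypernym tree (`PARENT`) stands in for a lexical database such as WordNet; semantic distance is assumed to be the number of edges through the lowest common ancestor; a submission that ticks any trap label is assumed to come from a bot or lazy user and is discarded; and a label is kept if at least `min_votes` trusted submissions chose it. All names (`PARENT`, `semantic_distance`, `pick_traps`, `distill`, `min_distance`, `min_votes`) are illustrative and are not taken from the paper.

```python
from collections import Counter

# Hypothetical toy taxonomy standing in for a lexical database such as WordNet:
# each word maps to its parent (hypernym); "entity" is the root.
PARENT = {
    "entity": None,
    "animal": "entity", "dog": "animal", "puppy": "dog", "cat": "animal",
    "artifact": "entity", "vehicle": "artifact", "car": "vehicle",
    "furniture": "artifact", "chair": "furniture",
    "plant": "entity", "tree": "plant",
}


def path_to_root(word):
    """Return the chain of ancestors from word up to the root, inclusive."""
    path = []
    while word is not None:
        path.append(word)
        word = PARENT[word]
    return path


def semantic_distance(a, b):
    """Assumed metric: edge count between a and b via their lowest common ancestor."""
    pa, pb = path_to_root(a), path_to_root(b)
    common = set(pa) & set(pb)
    lca = next(w for w in pa if w in common)  # first shared node on a's path
    return pa.index(lca) + pb.index(lca)


def pick_traps(candidates, vocabulary, min_distance, k):
    """Choose up to k trap labels at least min_distance away from every candidate."""
    eligible = [
        w for w in sorted(vocabulary)
        if w not in candidates
        and all(semantic_distance(w, c) >= min_distance for c in candidates)
    ]
    return eligible[:k]


def distill(submissions, traps, min_votes):
    """Drop submissions that select any trap, then keep labels with enough votes."""
    trap_set = set(traps)
    trusted = [s for s in submissions if not (set(s) & trap_set)]
    votes = Counter(label for s in trusted for label in set(s))
    return {label for label, n in votes.items() if n >= min_votes}


if __name__ == "__main__":
    candidates = ["dog", "car"]
    vocabulary = ["cat", "chair", "tree", "car", "dog", "puppy"]
    traps = pick_traps(candidates, vocabulary, min_distance=4, k=2)
    print(traps)  # ['chair', 'tree']

    submissions = [
        ["dog", "car"],
        ["dog"],
        ["dog", "car", "chair", "tree"],  # ticks everything, including traps
        ["car", "dog"],
    ]
    print(sorted(distill(submissions, traps, min_votes=2)))  # ['car', 'dog']
```

With candidates `dog` and `car` and `min_distance=4`, `cat` is rejected as a trap because it is only two edges from `dog`, while `chair` and `tree` qualify. The submission that selects every choice hits a trap and is discarded before votes are counted, which is the intended effect of intermixing traps with candidates.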
