Cascade: crowdsourcing taxonomy creation

Taxonomies are a useful and ubiquitous way of organizing information. However, creating organizational hierarchies is difficult because the process requires a global understanding of the objects to be categorized. Usually a taxonomy is created by an individual or a small group of people working together for hours or even days. Unfortunately, this centralized approach does not work well for the large, quickly changing datasets found on the web. Cascade is an automated workflow that lets crowd workers spend as little as 20 seconds each while collectively making a taxonomy. We evaluate Cascade on three datasets and show that its quality is 80-90% of that of experts. Cascade's cost is competitive with that of expert information architects, despite requiring six times more human labor. Fortunately, this labor can be parallelized, so Cascade can run in as little as four minutes instead of hours or days.
