Building Hierarchies of Concepts via Crowdsourcing

Hierarchies of concepts are useful in many applications from navigation to organization of objects. Usually, a hierarchy is created in a centralized manner by employing a group of domain experts, a time-consuming and expensive process. The experts often design one single hierarchy to best explain the semantic relationships among the concepts, and ignore the natural uncertainty that may exist in the process. In this paper, we propose a crowdsourcing system to build a hierarchy and furthermore capture the underlying uncertainty. Our system maintains a distribution over possible hierarchies and actively selects questions to ask using an information gain criterion. We evaluate our methodology on simulated data and on a set of real world application domains. Experimental results show that our system is robust to noise, efficient in picking questions, cost-effective, and builds high quality hierarchies.

[1]  R. A. Leibler,et al.  On Information and Sufficiency , 1951 .

[2]  W. T. Tutte Graph Theory , 1984 .

[3]  Kevin Knight,et al.  Building a Large Ontology for Machine Translation , 1993, HLT.

[4]  Ellen M. Voorhees,et al.  Using WordNet to disambiguate word senses for text retrieval , 1993, SIGIR.

[5]  David Bruce Wilson,et al.  Generating random spanning trees more quickly than the cover time , 1996, STOC '96.

[6]  Alexander Budanitsky,et al.  Lexical Semantic Relatedness and Its Application in Natural Language Processing , 1999 .

[7]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[8]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[9]  Steffen Staab,et al.  An Ontology-based Framework for Text Mining , 2005, LDV Forum.

[10]  Paul Buitelaar,et al.  Ontology Learning from Text: An Overview , 2005 .

[11]  Khurshid Ahmad,et al.  The head-modifier principle and multilingual term extraction , 2005, Natural Language Engineering.

[12]  Melvil Dewey,et al.  A Classification and Subject Index for Cataloguing and Arranging the Books and Pamphlets of a Library , 2006 .

[13]  Pietro Perona,et al.  Unsupervised learning of visual taxonomies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Alexei A. Efros,et al.  Unsupervised discovery of visual object class hierarchies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Estevam R. Hruschka,et al.  Toward an Architecture for Never-Ending Language Learning , 2010, AAAI.

[17]  Dieter Fox,et al.  A large-scale hierarchical multi-view RGB-D object dataset , 2011, 2011 IEEE International Conference on Robotics and Automation.

[18]  Dieter Fox,et al.  A Scalable Tree-Based Approach for Joint Object and Pose Recognition , 2011, AAAI.

[19]  Bernt Schiele,et al.  Evaluating knowledge transfer and zero-shot learning in a large-scale setting , 2011, CVPR 2011.

[20]  Mohammed Bennamoun,et al.  Ontology learning from text: A look back and into the future , 2012, CSUR.

[21]  Mausam,et al.  Crowdsourcing Multi-Label Classification for Taxonomy Creation , 2013, HCOMP.

[22]  Mark A. Musen,et al.  Mechanical turk as an ontology engineer?: using microtasks as a component of an ontology-engineering workflow , 2013, WebSci.

[23]  Jonathan Krause,et al.  Fine-Grained Crowdsourcing for Fine-Grained Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Mark A. Musen,et al.  Developing Crowdsourced Ontology Engineering Tasks: An iterative process , 2013, CrowdSem.

[25]  Lydia B. Chilton,et al.  Cascade: crowdsourcing taxonomy creation , 2013, CHI.

[26]  Michael S. Bernstein,et al.  Scalable multi-label annotation , 2014, CHI.