Crowd synthesis: extracting categories and clusters from complex data

Analysts synthesize complex, qualitative data to uncover themes and concepts, but the process is time-consuming and cognitively taxing, and automated techniques show mixed success. Crowdsourcing could help by harnessing flexible and powerful human cognition on demand, but it introduces other challenges, including workers' limited attention and expertise. Further, text data can be complex, high-dimensional, and ill-structured. We address two major challenges left unsolved in prior crowd clustering work: scaffolding expertise for novice crowd workers, and producing consistent and accurate categories when each worker sees only a small portion of the data. To address these challenges, we present an empirical study of a two-stage approach that enables crowds to create an accurate and useful overview of a dataset: A) we draw on cognitive theory to assess how re-representing data can shorten items and focus them on salient dimensions; and B) we introduce an iterative clustering approach that gives workers a global overview of the data. We demonstrate that a classification-plus-context approach elicits the most accurate categories at the most useful level of abstraction.
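
The abstract describes the iterative clustering stage only at a high level; the sketch below is a minimal illustration of one plausible reading, in which every crowd task shows the worker the full set of categories created so far (the global overview) before asking them to label a small batch of items. The names `iterative_crowd_clustering`, `post_task`, `batch_size`, and `fake_workers` are hypothetical stand-ins, not the authors' implementation.

```python
from collections import defaultdict


def iterative_crowd_clustering(items, post_task, batch_size=8):
    """Cluster `items` by routing small batches of them to crowd workers.

    Each task shows a worker every category created so far (the global
    overview) plus one batch of unlabeled items; the worker assigns items
    to existing categories or proposes new ones. `post_task` stands in for
    whatever crowd platform is used and must return {item: category_label}.
    """
    categories = defaultdict(list)   # category label -> list of items
    unlabeled = list(items)

    while unlabeled:
        batch, unlabeled = unlabeled[:batch_size], unlabeled[batch_size:]
        assignments = post_task(batch, existing=sorted(categories))
        for item, label in assignments.items():
            categories[label].append(item)

    return dict(categories)


if __name__ == "__main__":
    # Stub "crowd" that buckets items by their first letter, just to
    # exercise the loop without a real crowdsourcing platform.
    def fake_workers(batch, existing):
        return {item: item[0].upper() for item in batch}

    notes = ["apple pie recipe", "air fryer tips",
             "banana bread", "baking soda uses"]
    print(iterative_crowd_clustering(notes, post_task=fake_workers, batch_size=2))
```

Passing the running category list into each task is what gives workers who see only a few items a view of the dataset's emerging structure, which the abstract argues is needed to keep categories consistent and accurate across workers.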
