CRICTO: Supporting Sensemaking through Crowdsourced Information Schematization

We present CRICTO, a new crowdsourcing visual analytics environment for making sense of and analyzing text data. In CRICTO, multiple crowdworkers work in parallel on simple information schematization tasks: relating and connecting entities across documents. The system automatically combines the diverse links produced by these tasks and visualizes them according to the semantic types of the linkages. CRICTO also includes several tools that allow analysts to interactively explore and refine crowdworkers’ results to better support their own sensemaking processes. We evaluated CRICTO’s techniques and analysis workflow through deployments on Amazon Mechanical Turk and a user study that assesses the effect of crowdsourced schematization on sensemaking tasks. The results show that CRICTO’s crowdsourcing approaches and workflow help analysts explore diverse aspects of datasets and uncover the hidden stories embedded in text datasets more accurately.
