The IQ of the Crowd: Understanding and Improving Information Quality in Structured User-Generated Content

User-generated content (UGC) is becoming a valuable organizational resource, as it is often seen as a way to make more information available for analysis. To make effective use of UGC, it is necessary to understand information quality (IQ) in this setting. Traditional IQ research focuses on corporate data and views users as data consumers. However, when users with varying levels of expertise contribute information in an open setting, current conceptualizations of IQ break down. In particular, the practice of modeling information requirements in terms of fixed classes, such as an Entity-Relationship diagram or relational database tables, unnecessarily restricts the IQ of user-generated data sets. This paper defines crowd information quality (crowd IQ), empirically examines the implications of class-based modeling approaches for crowd IQ, and offers a path for improving crowd IQ using instance-and-attribute-based modeling. To evaluate the impact of modeling decisions on IQ, we conducted three experiments. The results demonstrate that information accuracy depends on the classes used to model domains, with participants providing more accurate information when classifying phenomena at a more general level. In addition, we found greater overall accuracy when participants could provide free-form data than when they selected from constrained choices. We further demonstrate that, relative to attribute-based data collection, information loss occurs when class-based models are used. Our findings have significant implications for information quality, information modeling, and UGC research and practice.
