InterPoll: Crowd-Sourced Internet Polls (Done Right)

Crowd-sourcing is increasingly being used to provide answers to online polls and surveys. However, existing systems, while taking care of the mechanics of attracting crowd workers, poll building, and payment, provide little that would help the survey-maker or pollster obtain statistically significant results devoid of even the obvious selection biases. This paper proposes InterPoll, a platform for programming crowd-sourced polls. Polls are expressed as embedded LINQ queries, whose results are provided to the developer. InterPoll supports reasoning about uncertainty, enabling t-tests and similar statistical tests on random variables obtained from the crowd. InterPoll performs query optimization, as well as bias correction and power analysis, among other features. Making InterPoll queries part of the surrounding program allows for optimizations that take advantage of the surrounding code context. The goal of InterPoll is to provide a system that can be reliably used for research into marketing, social, and political science questions. This paper highlights some of the existing challenges and how InterPoll is designed to address most of them. We outline some of the optimizations and give numerous motivating examples designed to illustrate our system design. Note that this paper is an outline of our vision: we deliberately focus on examples and motivation and leave a detailed technical treatment for future work.
