Assignment Techniques for Crowdsourcing Sensitive Tasks

Protecting the privacy of crowd workers has been an important topic in crowdsourcing; however, task privacy has largely been ignored, even though many tasks (e.g., form digitization, live audio transcription, or image tagging) often contain sensitive information. Although assigning an entire job to a single worker may leak private information, jobs can often be split into small components that individually do not. We study the problem of distributing such components to workers so as to maximize task privacy. We introduce information loss functions to formally measure the amount of private information leaked as a function of the task assignment. We then design assignment mechanisms for three assignment settings: PUSH, PULL, and a new setting, Tug Of War (TOW), an intermediate approach that balances flexibility for both workers and requesters. Our assignment algorithms incur zero privacy loss for PUSH and have tight theoretical guarantees for PULL. For TOW, our assignment algorithm provably outperforms PULL; importantly, its privacy loss is independent of the number of tasks, even when workers collude. We further analyze the performance and privacy tradeoffs empirically on simulated and real-world collusion networks and find that our algorithms perform better than their theoretical guarantees suggest.
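To make the idea of an assignment-dependent information loss more concrete, here is a minimal Python sketch. It rests on simplifying assumptions that are ours and not the paper's. A job is split into numbered components, and a hypothetical loss counts every (worker, job) pair in which one worker holds at least `threshold` components of the same job. The names `information_loss` and `push_assign` are illustrative, and collusion between workers is not modeled. In this toy PUSH setting, where the requester decides who receives what, spreading each job's components across distinct workers gives zero loss.

```python
from collections import defaultdict


def information_loss(assignment, threshold):
    """Hypothetical toy loss (not the paper's definition): count the
    (worker, job) pairs where one worker holds at least `threshold`
    components of the same job."""
    held = defaultdict(int)  # (worker, job) -> number of components held
    for (job, _component), worker in assignment.items():
        held[(worker, job)] += 1
    return sum(1 for count in held.values() if count >= threshold)


def push_assign(jobs, workers):
    """Toy PUSH-style assignment: the requester hands out each job's
    components to consecutive workers in round-robin order. Within one
    job, components map to distinct workers whenever the job has no more
    components than there are workers."""
    assignment = {}  # (job, component index) -> worker
    offset = 0  # rotates the starting worker so load spreads across jobs
    for job, n_components in jobs.items():
        if n_components > len(workers):
            raise ValueError(f"job {job!r} has more components than workers")
        for c in range(n_components):
            assignment[(job, c)] = workers[(offset + c) % len(workers)]
        offset += n_components
    return assignment


if __name__ == "__main__":
    jobs = {"form-1": 3, "form-2": 3}  # hypothetical jobs: name -> component count
    workers = ["w1", "w2", "w3", "w4"]
    pushed = push_assign(jobs, workers)
    print(information_loss(pushed, threshold=2))

    # Giving one worker every component of a job counts as a leak.
    whole_job = {("form-1", c): "w1" for c in range(3)}
    print(information_loss(whole_job, threshold=2))
```

In this example, pushing two three-component forms to four workers gives zero loss at `threshold=2`, while handing all of one form's components to a single worker counts as one leak. Handling colluding workers would require treating each coalition as a single observer, which this sketch does not attempt.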