PPLib: Toward the Automated Generation of Crowd Computing Programs Using Process Recombination and Auto-Experimentation

Crowdsourcing is increasingly being adopted to solve simple tasks such as image labeling and object tagging, as well as more complex tasks in which crowd workers collaborate in processes with interdependent steps. Across this whole range of complexity, research has yielded numerous patterns for coordinating crowd workers in order to optimize crowd accuracy, efficiency, and cost. Process designers, however, often do not know which pattern to apply to the problem at hand when designing new crowdsourcing applications. In this article, we propose to solve this problem by systematically exploring the design space of complex crowdsourced tasks via automated recombination and auto-experimentation. Specifically, we propose an approach to finding the optimal process for a given problem by defining the deep structure of the problem in terms of its abstract operators, generating all possible alternatives via the (re)combination of the abstract deep structure with concrete implementations from a Process Repository, and then establishing the best alternative via auto-experimentation. To evaluate our approach, we implemented PPLib (pronounced “People Lib”), a program library that allows for the automated recombination of known processes stored in an easily extensible Process Repository. We evaluated our work by generating and running many process candidates in two scenarios on Amazon's Mechanical Turk, followed by a meta-evaluation in which we examined the differences between the two evaluations. Our first scenario addressed the problem of text translation, where our automatic recombination produced multiple processes whose performance almost matched the benchmark established by an expert translation. In our second scenario, we focused on text shortening; we automatically generated 41 crowd process candidates, among them variations of the well-established Find-Fix-Verify process. While Find-Fix-Verify performed well in this setting, our recombination engine produced five processes that repeatedly yielded better results. We close the article by comparing the two settings in which the Recombinator was used and show empirically that the individual processes performed differently in the two settings. This leads us to contend that there is no unifying formula, which emphasizes the necessity for recombination.
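
The three steps named in the abstract (describe the deep structure, recombine it with concrete implementations from a repository, pick the best candidate by experiment) can be illustrated with a minimal Python sketch. This is not PPLib's actual API: the operator names, the repository contents, the functions `recombine` and `auto_experiment`, and the toy scoring table are all hypothetical, and the scoring function merely stands in for running a candidate on a crowd platform and measuring its result.

```python
from itertools import product

# Hypothetical deep structure: an ordered list of abstract operators.
DEEP_STRUCTURE = ["find", "fix", "verify"]

# Hypothetical Process Repository: concrete implementations per abstract operator.
REPOSITORY = {
    "find": ["collect_votes", "single_worker"],
    "fix": ["iterative_refinement", "parallel_rewrite"],
    "verify": ["majority_vote", "expert_check", "no_check"],
}

# Made-up quality points, used only so the sketch runs without a crowd.
TOY_QUALITY = {
    "collect_votes": 3, "single_worker": 1,
    "iterative_refinement": 4, "parallel_rewrite": 2,
    "majority_vote": 2, "expert_check": 3, "no_check": 0,
}


def recombine(deep_structure, repository):
    """Generate every candidate process: one concrete implementation per operator."""
    options = [repository[operator] for operator in deep_structure]
    return [dict(zip(deep_structure, combo)) for combo in product(*options)]


def toy_score(candidate):
    """Stand-in for a real experiment: sum the made-up quality points."""
    return sum(TOY_QUALITY[implementation] for implementation in candidate.values())


def auto_experiment(candidates, run_and_score):
    """Score every candidate and return (best_score, best_candidate)."""
    scored = [(run_and_score(candidate), candidate) for candidate in candidates]
    return max(scored, key=lambda pair: pair[0])


candidates = recombine(DEEP_STRUCTURE, REPOSITORY)
best_score, best_candidate = auto_experiment(candidates, toy_score)
print(len(candidates), best_candidate, best_score)
```

The number of candidates is the product of the number of implementations per operator, so the candidate set grows quickly as the repository is extended. In a real setting each call to the scoring function would cost crowd time and money, which is why the experimentation step, rather than the enumeration, is likely to dominate.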

[1]  Noam Chomsky,et al.  वाक्यविन्यास का सैद्धान्तिक पक्ष = Aspects of the theory of syntax , 1965 .

[2]  A. Koller,et al.  Speech Acts: An Essay in the Philosophy of Language , 1969 .

[3]  Allen Newell,et al.  Human Problem Solving. , 1973 .

[4]  R. P. Fishburne,et al.  Derivation of New Readability Formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy Enlisted Personnel , 1975 .

[5]  Michael Hammer,et al.  A very high level programming language for data processing applications , 1977, Commun. ACM.

[6]  H. Simon,et al.  The New Science of Management Decision, Revised Edition. , 1977 .

[7]  Niklaus Wirth,et al.  Program development by stepwise refinement , 1971, CACM.

[8]  Terry Winograd,et al.  Understanding computers and cognition - a new foundation for design , 1987 .

[9]  R. Beckwith,et al.  Aspects of a theory of mind: An interview with Noam Chomsky , 1986 .

[10]  Kevin Crowston,et al.  Tools for inventing organizations: toward a handbook of organizational processes , 1993, [1993] Proceedings Second Workshop on Enabling Technologies@m_Infrastructure for Collaborative Enterprises.

[11]  D. L. Flarey Reengineering the Corporation , 1994 .

[12]  Kevin Crowston,et al.  The interdisciplinary study of coordination , 1994, CSUR.

[13]  Karl T. Ulrich,et al.  Product Design and Development , 1995 .

[14]  Mark Klein,et al.  The process recombinator: a tool for generating new business process ideas , 1999, ICIS.

[15]  Abraham Bernstein,et al.  How can cooperative work tools support dynamic group process? bridging the specificity frontier , 2000, CSCW '00.

[16]  Jintae Lee,et al.  Process Specialization: Defining Specialization for State Diagrams , 2002, Comput. Math. Organ. Theory.

[17]  Kevin Crowston,et al.  Organizing Business Knowledge: The MIT Process Handbook , 2003 .

[18]  Georg Lausen,et al.  Ontologies in F-logic , 2004, Handbook on Ontologies.

[19]  Paolo Traverso,et al.  Automated Planning: Theory & Practice , 2004 .

[20]  Abraham Bernstein,et al.  Toward intelligent assistance for a data mining process: an ontology-based approach for cost-sensitive classification , 2005, IEEE Transactions on Knowledge and Data Engineering.

[21]  Stefan Edelkamp,et al.  Automated Planning: Theory and Practice , 2007, Künstliche Intell..

[22]  Panagiotis G. Ipeirotis,et al.  Get another label? improving data quality and data mining using multiple, noisy labelers , 2008, KDD.

[23]  Brian T. Pentland,et al.  Process Grammar as a Tool for Business Process Design , 2008, MIS Q..

[24]  Ken E. Whelan,et al.  The Automation of Science , 2009, Science.

[25]  Michael S. Bernstein,et al.  Soylent: a word processor with a crowd inside , 2010, UIST.

[26]  Floarea Serban,et al.  Auto-experimentation of KDD Workflows Based on Ontological Planning , 2010, SEMWEB.

[27]  Lydia B. Chilton,et al.  TurKit: human computation algorithms on mechanical turk , 2010, UIST.

[28]  Chrysanthos Dellarocas,et al.  The collective intelligence genome , 2010, IEEE Engineering Management Review.

[29]  Aniket Kittur,et al.  CrowdForge: crowdsourcing complex work , 2011, UIST.

[30]  Alexis Battle,et al.  The jabberwocky programming environment for structured social computing , 2011, UIST.

[31]  Tim Kraska,et al.  CrowdDB: answering queries with crowdsourcing , 2011, SIGMOD '11.

[32]  Haoqi Zhang,et al.  An Iterative Dual Pathway Structure for Speech-to-Text Transcription , 2011, Human Computation.

[33]  Ramesh Govindan,et al.  Medusa: a programming framework for crowd-sensing applications , 2012, MobiSys '12.

[34]  Mark Klein,et al.  Programming the global brain , 2012, Commun. ACM.

[35]  Björn Hartmann,et al.  Collaboratively crowdsourcing workflows with turkomatic , 2012, CSCW.

[36]  Abraham Bernstein,et al.  CrowdLang: A Programming Language for the Systematic Exploration of Human Computation Systems , 2012, SocInfo.

[37]  Abraham Bernstein,et al.  A survey of intelligent assistants for data analysis , 2013, CSUR.

[38]  Michael S. Bernstein,et al.  Context Trees: Crowdsourcing Global Understanding from Local Views , 2014, HCOMP.

[39]  Lora Aroyo,et al.  CrowdTruth: Machine-Human Computation Framework for Harnessing Disagreement in Gathering Annotated Data , 2014, SEMWEB.

[40]  Hai Yang,et al.  ACM Transactions on Intelligent Systems and Technology - Special Section on Urban Computing , 2014 .

[41]  Benjamin Livshits,et al.  Saving Money While Polling with InterPoll Using Power Analysis , 2014, HCOMP.

[42]  Sergiu Goschin,et al.  Stochastic dilemmas: foundations and applications , 2014 .

[43]  Michael D. Zisman Office Automation: Revolution or Evolution? , 2015 .

[44]  Fabio Casati,et al.  Modeling, Enacting, and Integrating Custom Crowdsourcing Processes , 2015, TWEB.

[45]  Andrew McGregor,et al.  AutoMan: a platform for integrating human-based and digital computation , 2012, OOPSLA '12.