Crowdsourcing in the Software Development Industry

The term crowdsourcing was coined by journalist Jeff Howe, who defines it as “outsourcing a task to a large group of people in the form of an open call.” The software industry has increasingly used crowdsourcing both to lower costs and to improve the quality of its output by drawing on capital from outside the company, in the form of the experience, labor, and creativity of programmers worldwide. Some crowdsourcing platforms, such as Amazon's Mechanical Turk, provide a means of acquiring large amounts of human knowledge inexpensively. Others, such as TopCoder, use crowdsourcing to drive software design and development, running contests in which programmers compete for a monetary prize by producing algorithms that meet a company's specifications. A third type of platform, exemplified by MathWorks's programming competitions, uses a distinctive form of “competitive collaboration” to produce highly efficient software at almost no financial cost to the project coordinators. This paper investigates competitive and collaborative frameworks for online crowdsourced software development. It also examines the participants in the collaboration process, the extrinsic and intrinsic incentives that sustain crowd participation, and parallel versus iterative design and development in crowdsourced applications.

Introduction – Open Source Software

The growth of the free and open source software (FOSS) movement in the 1980s built a foundation for the distributed development of software and for the incorporation of design contributions from a diverse, geographically dispersed community of programmers (von Krogh and von Hippel 2003). The open source community used the growing capabilities of the Internet to share software and code, coordinating the development of sophisticated projects such as the Apache Web Server and the Linux operating system through “user innovation networks” that gave anyone with Web access the power to “download, use, modify, and further develop” the community's software (von Hippel 2008). Open source's economic model was a hybrid of private investment and collective action: programmers “used their own resources to privately invest in creating novel software code... then freely revealed it as a public good” (von Krogh and von Hippel 2003). By releasing the source code of their programs, these development communities gave up a competitive edge against vendors of proprietary code such as Microsoft, but gained widespread adoption of their code and appreciation of its robust, easily modifiable qualities. FOSS demonstrated that a distributed group can develop successful software even when most contributors receive no financial compensation for their labor. In these ways, the open source movement laid the groundwork for the software development crowdsourcing platforms discussed in this paper.

Micro-Crowdsourcing and Amazon's Mechanical Turk

Micro-crowdsourcing refers to the distribution of small tasks that require little skill and time to complete. Monetary compensation, if any is given, is typically small, in keeping with the undemanding nature of the work. Whereas larger-scale crowdsourcing projects may involve design or a creative process, micro-crowdsourcing treats workers as human processors who complete massive numbers of simple jobs in parallel.

This analysis focuses on Amazon's Mechanical Turk, one of the largest publicly available micro-crowdsourcing platforms, with more than 100,000 workers. Amazon CEO Jeff Bezos describes Mechanical Turk as “artificial artificial intelligence” (Pontin), referring to its use as a tool in the implementation of artificial intelligence applications such as audio transcription, image tagging, and object tracking in computer vision systems (Corney). Computers have difficulty distinguishing the features in sample input needed to make these applications function, and Mechanical Turk provides an inexpensive and fast way to have humans perform these identification tasks.

Mechanical Turk is centered on the completion of Human Intelligence Tasks, or HITs. A HIT is a “single, self-contained task that a Worker can work on, submit an answer, and collect a reward for completing” (mTurk). Tasks are set up by Requesters and completed by Workers. Requesters use an API provided by Amazon to create simple HITs, which Workers complete in any web browser while logged into their worker accounts. Submissions are reviewed by the Requester upon completion, and the Worker is paid if the work is accepted, with Amazon taking a small commission of 10% (Figure 1). HITs are characterized by the small amount of time required to complete them and the small amount paid to the worker, frequently as little as $0.01.

The Mechanical Turk model places few obligations on either party. Workers choose HITs based on the HIT's description, the reward per HIT, the number of HITs available, and the time allotted for completion. A Worker may take on any HIT provided he or she meets qualifications set by the Requester; however, Workers are not obligated to complete tasks and can stop at any time to work on others. Requesters can set minimum standards for Workers, including experience on the site, number of prior submissions, and whether those submissions were accepted or rejected by other Requesters. A Requester can also require potential Workers to pass simple tests of competence in the area of the task. Once a HIT is completed, the Requester reviews it and either accepts or rejects the submission, which affects the Worker's public record on the site. Requesters may also grant individual Workers a bonus beyond the stated reward for completion.

Figure 1: Mechanical Turk framework.

Mechanical Turk has been used as a knowledge-acquisition backend to generate large datasets for research and software development projects. The accuracy of artificial intelligence systems such as natural language processing and visual recognition systems improves when they are trained on datasets that correctly match an input with the desired output. Constructing these datasets involves tasks such as matching an image with a set of descriptive words, or matching an audio file with a transcription of that file – tasks that are best performed by humans. Services such as Mechanical Turk make it possible to divide the construction of these datasets into nearly identical subtasks and distribute them to a large group of workers who complete them independently and in parallel. In certain cases, such as the construction of MIT's LabelMe image annotation dataset, this has allowed a dataset to be built more quickly and less expensively than traditional methods would have allowed.

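To make the Requester workflow above concrete, the sketch below shows how an image-annotation HIT of the kind used to build such datasets might be posted and reviewed. It is illustrative only: it uses the current boto3 SDK rather than the API generation available in the period discussed here, and the sandbox endpoint, annotation page URL, reward, and assignment counts are placeholder assumptions rather than values from any of the projects cited in this paper.

```python
"""Minimal sketch of the Requester side of Mechanical Turk: publish a HIT,
then review and approve Worker submissions. Endpoint, URLs, and amounts
are placeholders, not values from the projects discussed in the text."""
import boto3

# The sandbox endpoint lets a Requester test HITs without paying Workers.
mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

# An ExternalQuestion points Workers at a Requester-hosted annotation page
# (the URL below is a hypothetical example).
question_xml = """
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://example.com/annotate?image=img_0001.jpg</ExternalURL>
  <FrameHeight>400</FrameHeight>
</ExternalQuestion>
"""

# Publish one image-labeling HIT: small reward, short duration, and several
# redundant assignments so the answers can later be cross-checked.
hit = mturk.create_hit(
    Title="Label the objects in an image",
    Description="List the objects you can see in the linked photograph.",
    Keywords="image, labeling, annotation",
    Reward="0.02",
    MaxAssignments=3,
    AssignmentDurationInSeconds=300,
    LifetimeInSeconds=86400,
    Question=question_xml,
)
hit_id = hit["HIT"]["HITId"]

# Later, the Requester reviews submissions and accepts or rejects each one,
# which updates the Worker's public approval record on the site.
for assignment in mturk.list_assignments_for_hit(
    HITId=hit_id, AssignmentStatuses=["Submitted"]
)["Assignments"]:
    mturk.approve_assignment(
        AssignmentId=assignment["AssignmentId"],
        RequesterFeedback="Thanks, the labels look reasonable.",
    )
```
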
LabelMe has accumulated over 400,000 images and associated descriptive words since 2005 through the use of Mechanical Turk and a similar but non-paying interface (Torralba 2010).

The ease of participation in services such as Mechanical Turk enables widespread use, but it also raises concerns about the quality of task submissions. Because the workers in the crowd are anonymous, it is in principle easy to submit erroneous or poor-quality work with no lasting effect on the worker's ability to take on future tasks. Mechanical Turk has mechanisms intended to maintain submission quality, such as tracking each user's success in completing previous tasks, but a worker can easily disassociate himself from a poor record by creating a new account, or artificially boost his rating by creating, completing, and approving his own HITs (Ipeirotis).

Nevertheless, experimental generation of natural language processing datasets using Mechanical Turk has produced results that compare favorably to those produced by experts, which are typically “extremely expensive in both annotator-hours and financial cost” (Snow 2008). Raw translation and annotation data collected from Mechanical Turk is more abundant but of poorer quality than data produced by linguistic experts for the same tasks. However, through statistical pooling and the exclusion of outliers, the non-expert data can be normalized to achieve accuracy very close to that of expert-produced data. A study that used Mechanical Turk to generate a range of linguistic datasets found that, on average, only four non-expert annotations per example were required to match the accuracy of an expert evaluation. Because the tasks were completed in parallel, the work proceeded quickly, at a rate of 1,724 annotations per hour, and at a low cost of 875 annotations per dollar (Snow 2008). These results represent a vast improvement over the cost of expert-produced data, which can run to thousands of dollars when collected in an academic setting using linguists or graduate students.

An additional model that has been used to generate high-quality results with Mechanical Turk involves iterative, rather than parallel, labor. In this model, instead of having different workers solve different problems in parallel, successive workers complete tasks that build on one another, with one worker's output used as the input for the next worker. The TurKit toolkit developed by Greg Little of MIT's CSAIL builds on the Mechanical Turk interface functions and automatically generates new HITs based on the results of previous HITs (Little 2009). TurKit's framework allows for iterative cycles based on two types of tasks: improvement tasks and voting tasks. In Little's experiment, this cycle was applied to an image description task. A worker was presented with an image and a brief paragraph describing it, and asked to improve the description. After the worker submitted his improvements, another task was generated that presented both the original and the improved descriptions to other workers, who voted on which was better; the winning description then served as the starting point for the next improvement task.

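TurKit itself is a JavaScript toolkit layered on the Mechanical Turk API, so the following is only an illustrative Python sketch of the improve-then-vote cycle just described; post_improvement_hit and post_voting_hit are hypothetical placeholders for calls that publish a HIT and wait for its result, not functions from TurKit or the Mechanical Turk SDK.

```python
"""Illustrative sketch of an iterative improvement/voting cycle of the kind
described above. The two helper functions are placeholders that would need
to be wired to the Mechanical Turk API."""
from collections import Counter


def post_improvement_hit(image_url: str, current_text: str) -> str:
    """Placeholder: post a HIT asking one Worker to improve the description."""
    raise NotImplementedError("connect this to the Mechanical Turk API")


def post_voting_hit(image_url: str, old_text: str, new_text: str,
                    num_voters: int = 3) -> list[str]:
    """Placeholder: post a HIT asking several Workers to vote 'old' or 'new'."""
    raise NotImplementedError("connect this to the Mechanical Turk API")


def iterative_description(image_url: str, seed_text: str,
                          rounds: int = 8) -> str:
    """Alternate improvement and voting tasks, keeping the winning text."""
    best = seed_text
    for _ in range(rounds):
        # Improvement task: one Worker edits the current best description.
        candidate = post_improvement_hit(image_url, best)

        # Voting task: other Workers judge whether the edit is an improvement.
        votes = Counter(post_voting_hit(image_url, best, candidate))
        if votes["new"] > votes["old"]:
            best = candidate  # the improved text seeds the next iteration
    return best
```

A full implementation would also need the retry and state-saving machinery that TurKit provides through its crash-and-rerun programming model, which this sketch omits.
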

References

[1] Panagiotis G. Ipeirotis, et al. Get another label? Improving data quality and data mining using multiple, noisy labelers. KDD, 2008.

[2] M. Knudsen, et al. Some immediate – but negative – effects of openness on product development performance. 2011.

[3] Robert J. Allio, et al. CEO interview: the InnoCentive model of open innovation. 2004.

[4] Lydia B. Chilton, et al. TurKit: Tools for iterative tasks on Mechanical Turk. IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), 2009.

[5] Karim R. Lakhani, et al. Community, Joining, and Specialization in Open Source Software Innovation: A Case Study. 2003.

[6] Karim R. Lakhani, et al. Parallel Search, Incentives and Problem Type: Revisiting the Competition and Innovation Link. 2008.

[7] Schahram Dustdar, et al. Modeling and mining of dynamic trust in complex service-oriented systems. Information Systems, 2010.

[8] William C. Regli, et al. Putting the crowd to work in a knowledge-based factory. Advanced Engineering Informatics, 2010.

[9] Brendan T. O'Connor, et al. Cheap and Fast – But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks. EMNLP, 2008.

[10] Chris Callison-Burch, et al. Fast, Cheap, and Creative: Evaluating Translation Quality Using Amazon's Mechanical Turk. EMNLP, 2009.

[11] Siobhan O'Mahony. Guarding the commons: how community managed software projects protect their work. 2003.

[12] Joel West, et al. How open is open enough? Melding proprietary and open source platform strategies. 2003.

[13] Antonio Torralba, et al. LabelMe: Online Image Annotation and Applications. Proceedings of the IEEE, 2010.

[14] Karim R. Lakhani, et al. The Effects of Increasing Competition and Uncertainty on Incentives and Extreme-Value Outcomes in Innovation Contests. 2010.

[15] Karim R. Lakhani, et al. Getting Clear About Communities in Open Innovation. 2008.

[16] J. Tirole, et al. Some Simple Economics of Open Source. 2002.

[17] Karim R. Lakhani, et al. The Determinants of Individual Performance and Collective Value in Private-Collective Software Innovation. 2010.

[18] E. Deci, et al. Self-determination theory and the facilitation of intrinsic motivation, social development, and well-being. The American Psychologist, 2000.

[19] Karim R. Lakhani, et al. Marginality and Problem-Solving Effectiveness in Broadcast Search. Organization Science, 2010.

[20] Georg von Krogh, et al. Open Source Software and the "Private-Collective" Innovation Model: Issues for Organization Science. Organization Science, 2003.

[21] L. Jeppesen, et al. Getting Unusual Suspects to Solve R&D Puzzles. 2007.