Towards crowdsourcing translation tasks in library cataloguing, a pilot study

Although automated translation systems are increasingly impressive, they are still far from perfect, and even casual use demonstrates that they cannot produce robust, error-free transcriptions of arbitrary text. However, it is becoming increasingly apparent that crowdsourcing of translation tasks is not only viable but, in many cases, provides results equal to those of more expensive and slower alternatives. This paper provides a brief survey of academic and commercial systems that harness collective intelligence in the translation of text, and presents the results of a pilot conducted using Amazon Mechanical Turk (MTurk) for the translation of non-Roman scripts of unknown provenance. It concludes by proposing a workflow for integrating crowdsourcing into the cataloguing of foreign texts.
