Crowdsourcing WordNet

This paper describes an experiment in using Amazon Mechanical Turk to collaboratively create a sense inventory. In a bootstrapping process with massive collaborative input, substitutions for target words in context are elicited and clustered by sense; then more contexts are collected. Contexts that cannot be assigned to any sense in the target word's current inventory re-enter the loop, where substitutions are elicited for them. This process yields a sense inventory whose granularity is determined by common substitutions rather than by psychologically motivated concepts. Evaluation shows that the process is robust against noise from the crowd, yields a less fine-grained inventory than WordNet, and provides a rich body of high-precision substitution data at low cost.
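The core idea of the loop, grouping contexts by the substitutions they share, can be illustrated with a minimal sketch. This is not the procedure used in the experiment: the greedy assignment, the overlap threshold `min_shared`, the function names, and the example data are all assumptions made for illustration. A context joins the existing sense with which it shares the most substitutions; a context that shares too few with every sense opens a new sense, standing in for the step where unassignable contexts re-enter the loop.

```python
from collections import Counter


def assign_to_sense(substitutions, senses, min_shared=2):
    """Return the index of the sense sharing the most substitutions,
    or None if no sense shares at least `min_shared` of them.
    (`min_shared` is a hypothetical threshold, not taken from the paper.)"""
    best_idx, best_overlap = None, 0
    for idx, sense in enumerate(senses):
        overlap = len(set(substitutions) & set(sense))
        if overlap > best_overlap:
            best_idx, best_overlap = idx, overlap
    return best_idx if best_overlap >= min_shared else None


def build_inventory(contexts, min_shared=2):
    """Greedily group contexts into senses.

    contexts: dict mapping a context id to the list of substitutions
              that workers supplied for the target word in that context.
    Returns (senses, members): one Counter of substitutions per sense,
    and the list of context ids assigned to each sense.
    """
    senses, members = [], []
    for ctx_id, subs in contexts.items():
        idx = assign_to_sense(subs, senses, min_shared)
        if idx is None:
            # No existing sense fits: open a new one for this context.
            senses.append(Counter(subs))
            members.append([ctx_id])
        else:
            # Merge the substitutions into the matching sense.
            senses[idx].update(subs)
            members[idx].append(ctx_id)
    return senses, members


if __name__ == "__main__":
    # Invented substitutions for the target word "bank" in four contexts.
    contexts = {
        "c1": ["shore", "riverside", "edge"],
        "c2": ["lender", "institution", "financial institution"],
        "c3": ["shore", "edge", "side"],
        "c4": ["institution", "lender", "building"],
    }
    senses, members = build_inventory(contexts)
    for sense, ctx_ids in zip(senses, members):
        print(ctx_ids, sense.most_common(3))
```

In this sketch the granularity of the resulting inventory depends entirely on how many substitutions two contexts must share, which mirrors the abstract's point that sense distinctions follow from common substitutions rather than from predefined concepts.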