Language Understanding in the Wild: Combining Crowdsourcing and Machine Learning

Social media has led to the democratisation of opinion sharing. A wealth of information about public opinion, current events, and authors' insights into specific topics can be gained by understanding the text that users write. However, the language used by different authors in different contexts on the web varies widely, and this diversity makes interpretation an extremely challenging task. Crowdsourcing presents an opportunity to interpret the sentiment, or topic, of free text, but the subjectivity and bias of human interpreters raise challenges in inferring the semantics expressed by the text. To overcome this problem, we present a novel Bayesian approach to language understanding that relies on aggregated crowdsourced judgements. Our model encodes the relationships between labels and text features in documents, such as tweets, web articles, and blog posts, while accounting for the varying reliability of human labellers, and it allows the inference of annotations to scale to arbitrarily large pools of documents. Our evaluation on two challenging crowdsourcing datasets shows that, by efficiently exploiting language models learnt from aggregated crowdsourced labels, we improve classification accuracy by up to 25% when only a small portion (less than 4%) of the documents has been labelled. Compared to six state-of-the-art methods, we reduce the number of crowd responses required to achieve comparable accuracy by up to 67%. Our method was a joint winner of the CrowdFlower - CrowdScale 2013 Shared Task challenge at the Conference on Human Computation and Crowdsourcing (HCOMP 2013).
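
To make the aggregation idea concrete, the sketch below (in Python, assuming NumPy is available) implements the classic Dawid-Skene EM scheme for combining crowd labels via per-worker confusion matrices. It is not the paper's exact Bayesian model, which additionally ties labels to text features and places priors over the confusion matrices, but it illustrates the core mechanism of weighting labellers by their inferred reliability. All names (dawid_skene, votes, T, conf) are illustrative.

    import numpy as np

    def dawid_skene(votes, n_classes, n_iter=50):
        # votes: integer array of (document, worker, label) triples
        n_docs = votes[:, 0].max() + 1
        n_workers = votes[:, 1].max() + 1

        # Initialise the posterior over true labels from raw vote counts
        T = np.zeros((n_docs, n_classes))
        for d, w, l in votes:
            T[d, l] += 1.0
        T /= T.sum(axis=1, keepdims=True)

        for _ in range(n_iter):
            # M-step: class proportions and per-worker confusion matrices
            prior = T.sum(axis=0) + 1e-6
            prior /= prior.sum()
            conf = np.full((n_workers, n_classes, n_classes), 1e-6)
            for d, w, l in votes:
                conf[w, :, l] += T[d]
            conf /= conf.sum(axis=2, keepdims=True)

            # E-step: posterior over each document's true label,
            # computed in log space for numerical stability
            logT = np.tile(np.log(prior), (n_docs, 1))
            for d, w, l in votes:
                logT[d] += np.log(conf[w, :, l])
            logT -= logT.max(axis=1, keepdims=True)
            T = np.exp(logT)
            T /= T.sum(axis=1, keepdims=True)
        return T

    # Toy usage: 3 documents, 3 workers, binary labels; worker 1 dissents once
    votes = np.array([
        [0, 0, 1], [0, 1, 1], [0, 2, 1],
        [1, 0, 0], [1, 1, 0], [1, 2, 0],
        [2, 0, 1], [2, 1, 0], [2, 2, 1],
    ])
    print(dawid_skene(votes, n_classes=2).round(3))

Because the E-step downweights workers whose confusion matrices show poor agreement with the inferred truth, the combined estimate can outperform majority voting; the paper's contribution is to couple such reliability modelling with a learnt language model so that the vast pool of unlabelled documents also benefits.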
