How Many Truth Levels? Six? One Hundred? Even More? Validating Truthfulness of Statements via Crowdsourcing

We report on collecting truthfulness values (i) by means of crowdsourcing and (ii) using fine-grained scales. In our experiment, we collect truthfulness values using a bounded, discrete scale with 100 levels, as well as a magnitude estimation scale, which is unbounded, continuous, and has an infinite number of levels. We compare the two scales and discuss their agreement with a ground truth provided by experts on a six-level scale.
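As an illustration only, the following minimal sketch shows one way crowd judgments on the two scales could be aggregated per statement and compared with a six-level expert ground truth. The aggregation choices (median for the 100-level scale, geometric mean for magnitude estimation), the use of Spearman rank correlation, and all data values are assumptions made for this example; the abstract does not state which aggregation or agreement measures the experiment uses.

```python
from math import exp, log
from statistics import median


def geometric_mean(values):
    """Geometric mean of strictly positive magnitude estimation scores."""
    return exp(sum(log(v) for v in values) / len(values))


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical data: one entry per statement.
expert_levels = [0, 1, 2, 3, 4, 5]  # six-level expert ground truth
s100_judgments = [[5, 10, 0], [20, 35, 25], [40, 50, 45],
                  [60, 55, 70], [80, 75, 85], [100, 95, 90]]
me_judgments = [[1, 2, 1], [5, 10, 4], [20, 30, 10],
                [50, 40, 100], [200, 150, 300], [1000, 500, 800]]

# Aggregate the crowd judgments for each statement (assumed aggregators).
s100_scores = [median(js) for js in s100_judgments]
me_scores = [geometric_mean(js) for js in me_judgments]

print("S100 vs. experts:", spearman(s100_scores, expert_levels))
print("ME vs. experts:  ", spearman(me_scores, expert_levels))
```

A rank-based measure is used here because magnitude estimation scores have no fixed upper bound, so only the ordering of the aggregated scores is directly comparable with the six expert levels.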
