Using community structure detection to rank annotators when ground truth is subjective

Learning from labels provided by multiple annotators has attracted a lot of interest in the machine learning community. With the advent of crowdsourcing, cheap, noisy labels are easy to obtain, which raises the question of how to assess annotator quality. Prior work uses Bayesian inference to estimate consensus labels and to score annotators by expertise; the key assumptions are that the ground truth is known and that the label categories are predefined. In applications that admit multiple plausible ground truths, assessing annotator quality is challenging because the ranking of annotators depends on the choice of ground truth. This paper describes a case study on annotating historic newspaper articles from the New York Public Library. The goal is to assign fine-grained categories to articles labeled "editorial" by the Optical Character Recognition (OCR) software. The task is subjective since predefined categories are not available. To define the ground truth, we apply a Community Structure Detection (CSD) algorithm to a similarity graph built over the articles. The labels produced by the CSD algorithm provide the target function to be learned, and the annotators' labels are viewed as related tasks that help learn this target function. This technique provides insight into how to rank annotator performance using well-known information retrieval metrics.

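As a concrete illustration of the pipeline described above, the sketch below builds a similarity graph over articles, detects communities, and ranks annotators by agreement with the community-derived labels. It is a minimal sketch, not the authors' implementation: the choice of TF-IDF cosine similarity, the 0.3 edge threshold, greedy modularity maximization (via networkx), and the adjusted Rand index (standing in for the paper's unspecified information retrieval metrics, since an annotator's category names need not align with community ids) are all assumptions made here for illustration.

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.metrics import adjusted_rand_score

def community_labels(articles, threshold=0.3):
    # Pairwise TF-IDF cosine similarities between article texts
    # (one possible similarity measure; the paper does not fix one).
    sim = cosine_similarity(TfidfVectorizer().fit_transform(articles))
    g = nx.Graph()
    g.add_nodes_from(range(len(articles)))
    for i in range(len(articles)):
        for j in range(i + 1, len(articles)):
            if sim[i, j] >= threshold:  # keep sufficiently similar pairs
                g.add_edge(i, j, weight=float(sim[i, j]))
    # Communities in the similarity graph serve as surrogate ground truth.
    labels = [0] * len(articles)
    for c, members in enumerate(greedy_modularity_communities(g, weight="weight")):
        for node in members:
            labels[node] = c
    return labels

def rank_annotators(annotator_labels, truth):
    # Score each annotator's labeling against the community-derived
    # partition; the adjusted Rand index compares two partitions
    # directly, so no mapping between label vocabularies is needed.
    scores = {name: adjusted_rand_score(truth, labs)
              for name, labs in annotator_labels.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

Given the article texts and a dict mapping each annotator's name to their per-article labels, rank_annotators returns the annotators sorted from highest to lowest agreement with the community-derived ground truth; any other partition-agreement or retrieval metric could be substituted at that step.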