Human-in-the-Loop Web Resource Classification

Engaging humans in classification tasks has been shown to be effective, especially for digital resources such as images or multimedia web resources, whose complex features are hard to abstract for an automated procedure. In this paper, we propose the \(\mathsf {HC^2}\) crowdclustering approach for unsupervised classification of web resources, which lets the classification categories emerge dynamically from the crowd. In \(\mathsf {HC^2}\), crowd workers actively participate in clustering activities (i) by resolving tasks in which they are asked to visually recognize groups of similar resources and (ii) by labeling the recognized clusters with prominent keywords. To increase flexibility, \(\mathsf {HC^2}\) can be interactively configured to set the balance between human engagement and automated procedures in cluster formation, according to the kind of resources to be classified. For experimentation and evaluation, the \(\mathsf {HC^2}\) approach has been deployed on the Argo platform, which provides crowdsourcing techniques for consensus-based task execution.
