An Introduction to Hybrid Human-Machine Information Systems

Hybrid Human-Machine Information Systems leverage novel architectures that make systematic use of Human Computation by means of crowdsourcing. These architectures are capable of scaling over large amounts of data while maintaining high-quality data processing by introducing humans into the loop. Such hybrid systems have been developed to tackle a variety of problems and come with interdisciplinary challenges. They need to deal with the full spectrum of challenges from the social science standpoint, such as understanding crowd workers' behavior and motivations when performing tasks. These systems also need to overcome highly technical challenges, such as constraint optimization and resource allocation under limited budgets and deadlines. In this paper, we introduce the area of Human Computation and present an overview of the different applications for which Hybrid Human-Machine Information Systems have already been used in the realms of data management, information retrieval, natural language processing, the Semantic Web, machine learning, and multimedia to better solve existing problems. Finally, we discuss current research directions, opportunities for the future development of such systems, and their application in practice.
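To make the "humans in the loop" idea concrete, hybrid systems commonly ask several crowd workers to label the same item and then aggregate the redundant answers into one higher-quality result. A minimal majority-vote aggregation sketch is shown below; the data and names (`crowd_labels`, `majority_vote`) are hypothetical illustrations, not an API from any specific system:

```python
from collections import Counter

def majority_vote(labels_per_item):
    """Aggregate redundant crowd labels: for each item,
    keep the label given by the most workers."""
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in labels_per_item.items()}

# Hypothetical crowd answers: three workers labeled each image.
crowd_labels = {
    "img1": ["cat", "cat", "dog"],
    "img2": ["dog", "dog", "dog"],
}
print(majority_vote(crowd_labels))  # {'img1': 'cat', 'img2': 'dog'}
```

More sophisticated systems replace simple voting with probabilistic models that estimate per-worker reliability, but the pattern of trading label redundancy (and thus budget) for output quality is the same.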
