Understanding the Relationship between Interactions and Outcomes in Human-in-the-Loop Machine Learning

Human-in-the-loop Machine Learning (HIL-ML) is a widely adopted paradigm for instilling human knowledge in autonomous agents. Many design choices influence the efficiency and effectiveness of such interactive learning processes, particularly the interaction type through which the human teacher may provide feedback. While different interaction types (demonstrations, preferences, etc.) have been proposed and evaluated in the HIL-ML literature, there has been little discussion of how these compare or how they should be selected to best address a particular learning problem. In this survey, we propose an organizing principle for HIL-ML that provides a way to analyze the effects of interaction types on human performance and training data. We also identify open problems in understanding the effects of interaction types.
