Evaluating machine learning algorithms for applications with humans in the loop

Applications such as smart lighting, which rely on data classification and involve human factors such as perception, give rise to non-deterministic input-output relationships in which more than one output may be acceptable for a given input. In these so-called non-deterministic multiple output classification (nDMOC) problems, the relationship between input and output may also change over time. This makes it difficult for machine learning (ML) algorithms trained in a batch setting to make predictions for a given context. In this paper, we describe the nature of nDMOC problems and discuss the Relevance Score (RS), a performance metric suited to this context. RS measures the extent to which a predicted output is relevant to the user's context and behaviors, taking into account the inconsistencies that arise from human (perception) factors. We tailor the RS metric so that it can be used to evaluate ML algorithms in an online setting at run-time. We assess the performance of a number of ML algorithms using a smart lighting dataset with non-deterministic one-to-many input-output relationships. The results indicate that RS, rather than classification accuracy (CA), is suitable for analyzing the performance of conventional ML algorithms applied to nDMOC problems. Instance-based online ML gives the best RS performance. Interestingly, RS keeps increasing as the number of samples grows, even after CA has converged.
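The abstract does not state the RS formula, so the sketch below uses a hypothetical stand-in to show why CA and RS diverge on one-to-many data. It assumes that the relevance of a prediction is the number of times the user has chosen that output for the same context, divided by the count of their most frequent choice for that context. The names `relevance`, `InstanceBasedLearner` and `prequential_eval`, the k-nearest-neighbour learner, and the toy "dim"/"warm" stream are all illustrative assumptions, not the paper's method or dataset. Evaluation is test-then-train (prequential), so every sample is scored before the learner sees its label, mirroring run-time online evaluation.

```python
import math
from collections import Counter, defaultdict


def relevance(choices, y_pred):
    """Hypothetical RS for one prediction (an assumption, not the paper's formula):
    how often the user chose y_pred in this context, relative to their most
    frequent choice. 1.0 = the favourite output, 0.0 = never chosen."""
    if not choices:
        return 0.0
    return choices[y_pred] / max(choices.values())


class InstanceBasedLearner:
    """Minimal online k-nearest-neighbour learner that stores every labelled instance."""

    def __init__(self, k=3):
        self.k = k
        self.memory = []  # list of (context tuple, output label)

    def predict(self, x):
        if not self.memory:
            return None  # nothing learned yet
        # Stable sort: ties in distance keep arrival order.
        nearest = sorted(self.memory, key=lambda item: math.dist(item[0], x))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]

    def learn(self, x, y):
        self.memory.append((x, y))


def prequential_eval(stream, learner):
    """Test-then-train loop returning the running (CA, RS) after each sample."""
    history = defaultdict(Counter)  # context -> counts of the user's choices
    hits, rs_total, curve = 0, 0.0, []
    for t, (x, y) in enumerate(stream, start=1):
        y_pred = learner.predict(x)        # predict before seeing the label
        history[x][y] += 1                 # record the user's actual choice
        hits += int(y_pred == y)           # CA: only the exact label counts
        rs_total += relevance(history[x], y_pred)  # RS: any chosen output earns credit
        learner.learn(x, y)
        curve.append((hits / t, rs_total / t))
    return curve


# Toy stream: in the same context the user alternates between two acceptable settings.
stream = [((0.0,), "dim"), ((0.0,), "warm")] * 3
ca, rs = prequential_eval(stream, InstanceBasedLearner(k=3))[-1]
print(f"CA={ca:.2f}  RS={rs:.2f}")  # CA=0.33  RS=0.83
```

In this toy stream the learner keeps predicting "dim". CA counts every "warm" sample as an error, while the assumed RS still credits "dim" because it is an output the user has actually accepted in that context.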
