Threshy: supporting safe usage of intelligent web services

The increasing popularity of ‘intelligent’ web services provides end-users with machine-learnt functionality at little effort to developers. However, these services require a decision threshold to be set, and the right value depends on problem-specific data. Developers lack a systematic approach for evaluating intelligent services, and existing evaluation tools are predominantly targeted at data scientists for pre-development evaluation. This paper presents a workflow and supporting tool, Threshy, to help software developers select a decision threshold suited to their problem domain. Unlike existing tools, Threshy is designed to operate in multiple workflows, including pre-development, pre-release, and support. Threshy tunes thresholds over the confidence scores returned by intelligent web services; it does not deal with the hyper-parameter optimisation used when training ML models. Additionally, it considers the financial impact of false positives. Threshold configuration files exported by Threshy can be integrated into client applications and monitoring infrastructure. Demo: https://bit.ly/2YKeYhE.
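Threshy's internal algorithm is not described in the abstract, so the following is only a minimal sketch of the underlying idea: given labelled confidence scores from a web service, pick the decision threshold that minimises total misclassification cost, weighting false positives by their financial impact. The function name and cost parameters are illustrative, not Threshy's API.

```python
def pick_threshold(scores, labels, fp_cost=1.0, fn_cost=1.0):
    """Return (threshold, cost): the candidate threshold minimising total
    misclassification cost on a labelled validation set.

    scores:  confidence values in [0, 1] returned by the service
    labels:  true binary labels (1 = positive)
    fp_cost: per-error cost of a false positive (e.g. dollar impact)
    fn_cost: per-error cost of a false negative
    """
    best_t, best_cost = 0.0, float("inf")
    # Candidates: every observed score, plus a value above the maximum
    # so that "reject everything" is also considered.
    for t in sorted(set(scores)) + [1.01]:
        cost = 0.0
        for s, y in zip(scores, labels):
            pred = 1 if s >= t else 0
            if pred == 1 and y == 0:
                cost += fp_cost   # false positive
            elif pred == 0 and y == 1:
                cost += fn_cost   # false negative
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost
```

For example, with costly false positives (`fp_cost=5.0`), the search favours a higher, more conservative threshold than plain accuracy would; an exported threshold like this could then be stored in a configuration file and checked by monitoring infrastructure.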
