cSmartML-Glassbox: Increasing Transparency and Controllability in Automated Clustering

Machine learning algorithms have been widely employed in various applications and fields. Novel technologies in automated machine learning (AutoML) reduce the complexity of algorithm selection and hyperparameter optimization. AutoML frameworks have achieved notable success in hyperparameter tuning and have surpassed the performance of human experts. However, relying on such frameworks as black boxes can leave machine learning practitioners without insight into the inner workings of the AutoML process and hence affect their trust in the models produced. In addition, excluding humans from the loop creates several limitations. For example, most current AutoML frameworks ignore user preferences for defining or controlling the search space, which can in turn impact the performance of the models produced and their acceptance by end users. Research on the transparency and controllability of AutoML has recently attracted much interest in both academia and industry. However, existing tools are usually restricted to supervised learning tasks such as classification and regression, while unsupervised learning, particularly clustering, remains largely unexplored. Motivated by these shortcomings, we design and implement cSmartML-GlassBox, an interactive visualization tool that enables users to refine the search space of AutoML and analyze the results. cSmartML-GlassBox is equipped with a recommendation engine that recommends a time budget likely to be adequate for obtaining a well-performing pipeline on a new dataset. In addition, the tool supports multi-granularity visualization that enables machine learning practitioners to monitor the AutoML process, analyze the explored configurations, and refine and control the search space. Furthermore, cSmartML-GlassBox is equipped with a logging mechanism that makes repeated runs on the same dataset more effective by avoiding re-evaluation of previously considered configurations.
We demonstrate the effectiveness and usability of cSmartML-GlassBox through a user evaluation study with 23 participants and an expert-based usability study with four experts. We find that the proposed tool increases users' understanding of and trust in AutoML frameworks.
