KnAC: an approach for enhancing cluster analysis with background knowledge and explanations

Pattern discovery in multidimensional data sets has been a subject of research for decades. A wide spectrum of clustering algorithms can be used for that purpose. However, their practical applications all share a post-clustering phase, in which experts interpret and analyse the obtained results. We argue that this phase can be a bottleneck of the process, especially when domain knowledge exists prior to clustering. Such a situation requires not only a proper analysis of the automatically discovered clusters, but also conformance checking against the existing knowledge. In this work, we present Knowledge Augmented Clustering (KnAC), whose main goal is to confront expert-based labelling with automated clustering in order to update and refine the former. Our solution does not depend on any existing clustering algorithm, nor does it introduce one. Instead, KnAC can serve as an augmentation of an arbitrary clustering algorithm, making the approach robust and model-agnostic. We demonstrate the feasibility of our method on artificial, reproducible examples and on a real-life use case scenario.
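
The idea of confronting expert-based labelling with automated clustering can be illustrated with a small sketch. It is not the KnAC algorithm itself, which the abstract does not specify; it only assumes that each data point carries one expert label and one cluster identifier from an arbitrary clustering algorithm. The function names `contingency` and `split_candidates`, the `min_share` threshold, and the toy data are hypothetical and introduced here for illustration only.

```python
from collections import Counter, defaultdict


def contingency(expert_labels, cluster_labels):
    """Count how many points fall into each (expert label, cluster) pair."""
    if len(expert_labels) != len(cluster_labels):
        raise ValueError("label sequences must have equal length")
    table = defaultdict(Counter)
    for expert, cluster in zip(expert_labels, cluster_labels):
        table[expert][cluster] += 1
    return table


def split_candidates(table, min_share=0.3):
    """Hypothetical heuristic (an assumption, not taken from the paper):
    an expert label is a candidate for splitting when at least two clusters
    each contain at least `min_share` of the points carrying that label."""
    result = {}
    for label, counts in table.items():
        total = sum(counts.values())
        large = sorted(c for c, n in counts.items() if n / total >= min_share)
        if len(large) >= 2:
            result[label] = large
    return result


# Toy data: expert label "A" is spread over two clusters, "B" matches one.
expert = ["A", "A", "A", "A", "B", "B", "B", "B"]
clusters = [0, 0, 1, 1, 2, 2, 2, 2]

table = contingency(expert, clusters)
candidates = split_candidates(table)
print(candidates)
```

In this toy case the expert label "A" is flagged because its points are divided evenly between clusters 0 and 1, while "B" agrees with a single cluster. Because the sketch only consumes cluster identifiers, it works with the output of any clustering algorithm, in the spirit of the model-agnostic design described above.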
