OoDAnalyzer: Interactive Analysis of Out-of-Distribution Samples

One major cause of performance degradation in predictive models is that the test samples are not well covered by the training data. Such poorly represented samples are called out-of-distribution (OoD) samples. In this paper, we propose OoDAnalyzer, a visual analysis approach for interactively identifying OoD samples and explaining them in context. Our approach integrates an ensemble OoD detection method with a grid-based visualization. The detection method improves on deep ensembles by combining more features with algorithms from the same family. To better analyze and understand the OoD samples in context, we have developed a novel kNN-based grid layout algorithm motivated by Hall's theorem. For N samples and k nearest neighbors, the algorithm approximates the optimal layout in O(kN²) time. This is faster than the grid layout algorithm with the best overall performance, whose time complexity is O(N³). We performed a quantitative evaluation and case studies on several datasets to demonstrate the effectiveness and usefulness of OoDAnalyzer.
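The following minimal Python sketch illustrates how Hall's theorem can drive a kNN-restricted grid layout. It is not OoDAnalyzer's layout algorithm, and it rests on several assumptions. The samples are assumed to be already projected to 2-D points in the unit square (for example by t-SNE), and the grid is assumed to have at least as many cells as there are samples. Each sample may only occupy one of its k nearest cells. Kuhn's augmenting-path matching then tries to give every sample a distinct cell. By Hall's theorem, such a perfect matching exists exactly when every subset S of samples has at least |S| distinct candidate cells. When matching fails, the sketch therefore doubles k and retries. The function names `grid_cells`, `knn_candidates`, `try_augment` and `knn_grid_layout` are hypothetical.

```python
# Hypothetical sketch, NOT OoDAnalyzer's algorithm: restrict each 2-D point to its
# k nearest grid cells and test for a perfect matching (Hall's condition), growing k.
import heapq
import math
from typing import List, Optional, Sequence, Set, Tuple

Point = Tuple[float, float]


def grid_cells(rows: int, cols: int) -> List[Point]:
    """Centres of a rows x cols grid over the unit square, listed row by row."""
    return [((c + 0.5) / cols, (r + 0.5) / rows)
            for r in range(rows) for c in range(cols)]


def knn_candidates(points: Sequence[Point], cells: Sequence[Point], k: int) -> List[List[int]]:
    """For each point, indices of its k nearest grid cells, nearest first."""
    return [heapq.nsmallest(k, range(len(cells)), key=lambda j: math.dist(p, cells[j]))
            for p in points]


def try_augment(i: int, candidates: List[List[int]],
                owner: List[Optional[int]], seen: Set[int]) -> bool:
    """Kuhn's augmenting-path step: place point i, displacing earlier points if needed.

    Recursive, so this sketch is intended for small inputs only.
    """
    for j in candidates[i]:  # try nearer cells first
        if j in seen:
            continue
        seen.add(j)
        # Cell j is free, or its current owner can move to another of its candidates.
        if owner[j] is None or try_augment(owner[j], candidates, owner, seen):
            owner[j] = i
            return True
    return False


def knn_grid_layout(points: Sequence[Point], rows: int, cols: int,
                    k: int = 1) -> Tuple[List[int], int]:
    """Assign each point to a distinct cell chosen among its k nearest cells.

    Returns (cell index per point, k actually used). k is doubled until a
    perfect matching exists; with k == number of cells one always exists.
    """
    cells = grid_cells(rows, cols)
    if len(points) > len(cells):
        raise ValueError("more points than grid cells")
    k = max(1, min(k, len(cells)))
    while True:
        candidates = knn_candidates(points, cells, k)
        owner: List[Optional[int]] = [None] * len(cells)
        # all() stops at the first point that cannot be placed; at that moment
        # no perfect matching exists, i.e. Hall's condition is violated for this k.
        if all(try_augment(i, candidates, owner, set()) for i in range(len(points))):
            assignment = [0] * len(points)
            for j, i in enumerate(owner):
                if i is not None:
                    assignment[i] = j
            return assignment, k
        k = min(2 * k, len(cells))
```

For instance, four points sitting exactly on the cell centres of a 2 × 2 grid are placed with k = 1, giving the layout `[0, 1, 2, 3]`. Four points clustered near the corner (0.1, 0.1) all share the same nearest cell, so the sketch has to raise k to 4 before a valid placement exists. For a fixed k, the matching step costs at most O(N · kN) = O(kN²). The reason is that each augmenting-path search scans every candidate list at most once. However, the sketch only guarantees that each sample lands in one of its k nearest cells. It does not minimize total displacement the way an exact, cubic-time assignment solver would.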
