RCLens: Interactive Rare Category Exploration and Identification

Rare category identification is an important task in many application domains, including network security, financial fraud detection, and personalized medicine. These applications all require discovering and characterizing sets of rare but structurally similar data entities that are obscured within a larger, structurally different dataset. This paper introduces RCLens, a visual analytics system designed to support user-guided rare category exploration and identification. RCLens adopts a novel active learning-based algorithm that iteratively improves the accuracy of the identified rare categories in response to user-provided feedback. The algorithm is tightly integrated with an interactive visualization-based interface that supports a novel and effective workflow for rare category identification. This paper (1) defines RCLens’ underlying active learning algorithm; (2) describes the visualization and interaction designs, including a discussion of how they support user-guided rare category identification; and (3) presents results from an evaluation demonstrating RCLens’ ability to support the rare category identification process.
