Adaptive graph-based algorithms for conditional anomaly detection and semi-supervised learning

We develop graph-based methods for semi-supervised learning based on label propagation on a data similarity graph. When data are abundant or arrive in a stream, computation and storage become bottlenecks for any graph-based method. We propose a fast approximate online algorithm that computes the harmonic solution on an approximate graph. We show, both empirically and theoretically, that good behavior can be achieved by collapsing nearby points into a set of local representative points that minimize distortion. Moreover, we regularize the harmonic solution to achieve better stability properties. We also present graph-based methods for detecting conditional anomalies and apply them to the identification of unusual clinical actions in hospitals. Our hypothesis is that patient-management actions that are unusual with respect to past patients may be due to errors, and that it is worthwhile to raise an alert if such an action is encountered. Conditional anomaly detection extends the standard unconditional anomaly detection framework but also faces new problems, known as fringe and isolated points. We devise novel nonparametric graph-based methods to tackle these problems. Our methods rely on graph connectivity analysis and the soft harmonic solution. Finally, we conduct an extensive human evaluation study of our conditional anomaly detection methods with 15 experts in critical care.
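
As a rough illustration of the harmonic solution referred to above, the following Python sketch propagates labels on a small similarity graph. It assumes the common formulation in which the unlabeled scores solve a linear system built from the graph Laplacian L = D - W, with labeled scores clamped to their labels. The function name `harmonic_solution`, the parameter `gamma`, and the toy chain graph are illustrative assumptions, not the authors' implementation. Adding `gamma` to the diagonal is only one plausible way to regularize the solution; here it shrinks the scores of unlabeled points toward zero.

```python
import numpy as np


def harmonic_solution(W, labeled_idx, y_labeled, gamma=0.0):
    """Sketch of a (regularized) harmonic solution on a similarity graph.

    W           : symmetric (n x n) matrix of non-negative edge weights
    labeled_idx : indices of the labeled vertices
    y_labeled   : labels of those vertices, e.g. +1 / -1
    gamma       : hypothetical regularization strength (0 gives the
                  plain harmonic solution)
    """
    W = np.asarray(W, dtype=float)
    y_labeled = np.asarray(y_labeled, dtype=float)
    n = W.shape[0]
    labeled_idx = np.asarray(labeled_idx)
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)

    # Unnormalized graph Laplacian L = D - W.
    L = np.diag(W.sum(axis=1)) - W

    # Blocks restricted to unlabeled rows/columns and unlabeled-to-labeled edges.
    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    W_ul = W[np.ix_(unlabeled_idx, labeled_idx)]

    # Solve (L_uu + gamma * I) f_u = W_ul y_l for the unlabeled scores.
    A = L_uu + gamma * np.eye(len(unlabeled_idx))
    f_u = np.linalg.solve(A, W_ul @ y_labeled)

    # Assemble the full score vector; labeled vertices keep their labels.
    f = np.zeros(n)
    f[labeled_idx] = y_labeled
    f[unlabeled_idx] = f_u
    return f


if __name__ == "__main__":
    # Toy chain graph 0 - 1 - 2 - 3 with unit weights;
    # vertex 0 is labeled +1 and vertex 3 is labeled -1.
    W = np.array([
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ], dtype=float)
    print(harmonic_solution(W, [0, 3], [1.0, -1.0]))
    print(harmonic_solution(W, [0, 3], [1.0, -1.0], gamma=1.0))
```

On the toy chain, the unregularized scores interpolate linearly between the two labeled endpoints, while a positive `gamma` pulls the interior scores closer to zero. The sign of each unlabeled score can be read as the predicted label and its magnitude as a rough confidence.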
