Negative selection based anomaly detector for multimodal health data

Early detection of emerging disease outbreaks is crucial to effective containment and response, yet initial outbreak signatures can be difficult to detect with automated methods. Outbreaks may be masked by noisy data, and signs of an outbreak may be spread across multiple data feeds. Current biosurveillance methods often perform unimodal statistical analyses that cannot intelligently leverage multiple correlated data streams of different types while retaining quantitative sensitivity. In this paper, we propose and implement an anomaly detection system for health data modeled on the human immune system. The adaptive immune system operates over a high-dimensional antigen space in a distributed manner, allowing it to scale efficiently without relying on a centralized controller. Our immune-inspired negative selection algorithm provides effective and scalable distributed anomaly detection for biosurveillance. It detects anomalies in the large, complex data produced by modern health monitoring feeds with low false positive rates. Our bootstrap aggregation method improves performance on high-dimensional data sets, and we implement a parallelized version of the algorithm to demonstrate that it can be deployed on a scalable distributed architecture. On a publicly available multimodal synthetic health record data set, our negative selection algorithm detects 90% of all outbreaks with a false positive rate of 11.8%. The scalability and performance of the negative selection algorithm demonstrate that immune computation can provide effective approaches for national and global scale biosurveillance.
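
As a minimal illustration of the negative selection principle referred to above, the following Python sketch generates random detectors in a normalized feature space and keeps only those that match no "self" (normal) sample; a new observation is flagged as anomalous when any surviving detector matches it. This is not the authors' implementation. The function names (generate_detectors, is_anomalous), the Euclidean matching rule, the fixed detector radius, and the unit-hypercube feature scaling are all assumptions made for illustration, and the bootstrap aggregation and parallelization steps are omitted.

```python
import math
import random


def generate_detectors(self_samples, n_detectors, radius, dim, rng,
                       max_tries=100_000):
    """Negative selection (illustrative): draw random candidate points in
    [0, 1]^dim and keep only those farther than `radius` from every self
    sample, so that no detector matches normal data."""
    detectors = []
    tries = 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        candidate = [rng.random() for _ in range(dim)]
        if all(math.dist(candidate, s) > radius for s in self_samples):
            detectors.append(candidate)
    return detectors


def is_anomalous(point, detectors, radius):
    """A point is flagged when it lies within `radius` of any detector."""
    return any(math.dist(point, d) <= radius for d in detectors)


# Hypothetical example data: "normal" observations clustered in one corner
# of a two-dimensional, unit-scaled feature space.
rng = random.Random(0)
self_samples = [[0.1 + 0.2 * rng.random(), 0.1 + 0.2 * rng.random()]
                for _ in range(50)]
detectors = generate_detectors(self_samples, n_detectors=500,
                               radius=0.1, dim=2, rng=rng)

# Self samples are never flagged, by construction of the detector set.
flagged_self = [s for s in self_samples if is_anomalous(s, detectors, 0.1)]
print(len(flagged_self))
```

By construction, every retained detector lies farther than the radius from every self sample, so the training data itself is never flagged; how well unseen normal data and true anomalies are separated depends on the radius, the number of detectors, and the dimensionality of the space.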
