Global tests for novelty

Outlier detection covers a wide range of methods that aim to identify observations considered unusual. Novelty detection, on the other hand, seeks observations among newly generated test data that are exceptional compared with previously observed training data. In many applications, whether novelty exists at all is of more interest than which individual observations are novel. For instance, in high-throughput cancer treatment screening experiments, it is meaningful to test whether any new treatment effects are seen compared with existing compounds. Here, we present hypothesis tests for such global-level novelty. The problem is approached under a set of very general assumptions, which distinguishes it from the current literature. We introduce test statistics capable of detecting novelty. They operate on local neighborhoods, and their null distribution is obtained by the permutation principle. We show that the tests are valid and able to detect different types of novelty, e.g., location and scale alternatives. The performance of the methods is assessed with simulations and with applications to real data sets.
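
To make the idea of a neighborhood-based statistic with a permutation null concrete, the following is a minimal sketch and not the statistics proposed in the paper. It assumes a simple stand-in statistic: the average share of test points among the k nearest neighbors of each test point in the pooled sample. The function names (`neighbor_indices`, `global_novelty_pvalue`) and the parameters `k`, `n_perm`, and `seed` are hypothetical choices made for illustration only.

```python
import numpy as np


def neighbor_indices(pooled, k):
    """Indices of the k nearest neighbours of every pooled observation."""
    diffs = pooled[:, None, :] - pooled[None, :, :]
    sq_dist = np.einsum("ijk,ijk->ij", diffs, diffs)
    np.fill_diagonal(sq_dist, np.inf)  # a point is not its own neighbour
    return np.argsort(sq_dist, axis=1)[:, :k]


def global_novelty_pvalue(train, test, k=5, n_perm=999, seed=0):
    """Permutation p-value for an illustrative (assumed) neighbourhood statistic.

    Statistic: mean fraction of test-labelled points among the k nearest
    neighbours of each test point. Large values suggest that the test
    data occupy regions not covered by the training data.
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack([train, test])
    labels = np.zeros(len(pooled), dtype=bool)
    labels[len(train):] = True  # True marks a test observation

    # Distances do not depend on the labels, so neighbours are computed once.
    nn = neighbor_indices(pooled, k)

    def statistic(lab):
        return lab[nn][lab].mean()

    observed = statistic(labels)
    # Under the null of no novelty the labels are exchangeable,
    # so the null distribution is obtained by permuting them.
    perm_stats = np.array(
        [statistic(rng.permutation(labels)) for _ in range(n_perm)]
    )
    return (1 + np.sum(perm_stats >= observed)) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    train = rng.normal(size=(100, 2))
    test = rng.normal(loc=5.0, size=(40, 2))  # simulated location shift
    print(global_novelty_pvalue(train, test))
```

Because the neighbor structure is computed once and only the labels are permuted, each permutation is cheap. Other local statistics could be substituted for `statistic` without changing the permutation step.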
