Evaluation Methodology for Multiclass Novelty Detection Algorithms

Novelty detection is a useful ability for learning systems, especially in data stream scenarios, where new concepts can appear, known concepts can disappear, and concepts can evolve over time. Several studies in the literature investigate the use of machine learning classification techniques for novelty detection in data streams. However, there is no consensus on how to evaluate the performance of these techniques, particularly for multiclass problems. In this study, we propose a new evaluation approach for multiclass novelty detection problems in data streams. This approach is able to deal with: i) multiclass problems, ii) a confusion matrix with a column representing the unknown examples, iii) a confusion matrix that grows over time, iv) unsupervised learning, which generates novelties that have no association with the problem classes, and v) the representation of the evaluation measures over time. We evaluate the proposed approach using known novelty detection algorithms on artificial and real data sets.
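To make items ii), iii) and v) concrete, the following is a minimal sketch of a confusion matrix that keeps a dedicated column for unknown examples, grows as new true classes and new predicted labels appear, and records a measure after every example. It is an illustration under our own assumptions, not the implementation evaluated in this study: the class name `GrowingConfusionMatrix`, the `UNKNOWN` label, the novelty label `N1`, the `unknown_rate` measure, and the toy stream are all hypothetical.

```python
from collections import defaultdict

UNKNOWN = "unknown"  # hypothetical label for examples the detector leaves unclassified


class GrowingConfusionMatrix:
    """Rows are true classes, columns are predicted labels; both grow over time."""

    def __init__(self):
        self.counts = defaultdict(int)  # (true_label, predicted_label) -> count
        self.rows = []                  # true classes, in order of first appearance
        self.cols = [UNKNOWN]           # the unknown column exists from the start
        self.total = 0

    def update(self, true_label, predicted_label):
        # Add a new row or column the first time a label is seen.
        if true_label not in self.rows:
            self.rows.append(true_label)
        if predicted_label not in self.cols:
            self.cols.append(predicted_label)
        self.counts[(true_label, predicted_label)] += 1
        self.total += 1

    def unknown_rate(self):
        # Fraction of all examples seen so far that fall in the unknown column.
        if self.total == 0:
            return 0.0
        unknown = sum(self.counts[(r, UNKNOWN)] for r in self.rows)
        return unknown / self.total

    def as_table(self):
        # Dense view of the matrix, one list per true class.
        return [[self.counts[(r, c)] for c in self.cols] for r in self.rows]


# Toy stream of (true class, predicted label) pairs. "N1" stands for a novelty
# pattern produced without supervision, so it is not yet tied to any class.
stream = [("A", "A"), ("B", "A"), ("C", UNKNOWN), ("C", "N1"), ("A", "A")]

cm = GrowingConfusionMatrix()
history = []  # value of the measure after each example
for true_label, predicted_label in stream:
    cm.update(true_label, predicted_label)
    history.append(cm.unknown_rate())
```

Plotting `history` against the example index gives the kind of over-time view named in item v). Item iv) is not covered by this sketch: columns such as `N1` would still have to be associated with problem classes before class-wise error measures can be computed.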
