Unsupervised deep embedding for novel class detection over data stream

Data streams are continuous flows of data points, and novel class detection is an important task in data stream mining. A novel class is a newly emerged class that the classifier has not previously modeled from the input stream. This paper proposes deep embedding for novel class detection, an approach that combines feature learning via denoising autoencoders with novel class detection. A denoising autoencoder is a neural network with hidden layers that is trained to reconstruct the input vector from a corrupted version of it. The paper also proposes a nonparametric multidimensional change point detection approach for detecting concept drift (changes in data feature values over time). Experiments on several real datasets show that the approach significantly improves the performance of novel class detection.
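
To make the denoising autoencoder idea concrete, the following is a minimal NumPy sketch, not the architecture used in the paper. It assumes a single sigmoid hidden layer, masking noise (randomly zeroing input features) as the corruption, squared reconstruction error against the clean input, and plain full-batch gradient descent. All names (`train_denoising_autoencoder`, `encode`, `corruption`, the toy data) are hypothetical and chosen only for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_denoising_autoencoder(X, n_hidden=4, corruption=0.3,
                                lr=0.5, epochs=300, seed=0):
    """Train a one-hidden-layer denoising autoencoder (illustrative only).

    Each epoch corrupts X with masking noise, then updates the weights so
    that the network reconstructs the clean X from the corrupted copy.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, size=(n_hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        # Corrupt the input: each feature is zeroed with probability `corruption`.
        keep = rng.random(X.shape) > corruption
        X_noisy = X * keep
        # Forward pass: encode the corrupted input, decode to a reconstruction.
        H = sigmoid(X_noisy @ W1 + b1)
        X_hat = sigmoid(H @ W2 + b2)
        # The reconstruction error is measured against the CLEAN input.
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Backward pass (gradient of squared error, averaged over samples).
        d_out = err * X_hat * (1.0 - X_hat) / n
        d_hid = (d_out @ W2.T) * H * (1.0 - H)
        W2 -= lr * (H.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X_noisy.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
    return (W1, b1, W2, b2), losses


def encode(params, X):
    """Map inputs to the learned hidden representation (the embedding)."""
    W1, b1, _, _ = params
    return sigmoid(X @ W1 + b1)


# Toy data (an assumption for illustration): two binary prototypes, 50 copies each.
proto_a = np.array([1, 1, 1, 0, 0, 0, 0, 0], dtype=float)
proto_b = np.array([0, 0, 1, 1, 1, 0, 0, 0], dtype=float)
X = np.vstack([np.tile(proto_a, (50, 1)), np.tile(proto_b, (50, 1))])

params, losses = train_denoising_autoencoder(X)
embedding = encode(params, X)
print(f"first-epoch loss {losses[0]:.3f}, last-epoch loss {losses[-1]:.3f}")
print("embedding shape:", embedding.shape)
```

In a setup like the one the abstract describes, the hidden representation returned by `encode` would be the feature space handed to a downstream novel class detector. That detector, and the change point detection step, are outside the scope of this sketch.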
