Higher Order Neural Networks and Neural Networks for Stream Learning

The goal of this thesis is to explore two variations of neural networks. The thesis is split into two parts: the first varies the shaping function of a neural network, and the second varies its learning rule. In the first part, we investigate polynomial perceptrons, that is, perceptrons with a polynomial shaping function in place of the usual linear one. We prove the polynomial perceptron convergence theorem and illustrate the idea empirically by implementing a higher-order perceptron and showing that it can learn the XOR function. In the second part, we propose three models (SMLP, SA, and SA2) for stream learning and anomaly detection in data streams. The main technique that allows these models to perform at a level comparable to state-of-the-art stream learning algorithms is the learning rule: we employ the mini-batch and stochastic gradient descent algorithms to speed up training. In addition, multi-threaded parallel processing makes the proposed methods highly efficient on streaming data. Our analysis shows that all three models run in linear time and require constant memory. We also demonstrate empirically that the proposed methods achieve high detection rates, low false alarm rates, and fast response times. The paper on the first two models (SMLP and SA) was published at the 29th Canadian Conference on Artificial Intelligence, where it won the Best Paper Award. An invited journal paper on the third model (SA2), submitted to Computational Intelligence, is under peer review.
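To make the first claim concrete, the sketch below shows a second-order perceptron learning XOR with the classic perceptron update rule; because the degree-2 feature map makes XOR linearly separable, the polynomial perceptron convergence theorem guarantees termination. This is a minimal illustration, not the thesis's implementation: the particular feature map, zero initialization, and epoch cap are assumptions made for the example.

```python
# A minimal sketch of a second-order (quadratic) perceptron learning XOR.
# The feature map phi, zero initialization, and epoch cap are illustrative
# choices, not taken from the thesis itself.
import numpy as np

def phi(x):
    """Degree-2 polynomial feature map: bias, linear, and cross terms."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x2])

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])  # XOR truth table

w = np.zeros(4)
for epoch in range(100):
    errors = 0
    for xi, yi in zip(X, y):
        y_hat = int(w @ phi(xi) > 0)
        if y_hat != yi:                  # classic perceptron update
            w += (yi - y_hat) * phi(xi)
            errors += 1
    if errors == 0:                      # converged: all four points correct
        break

print("weights:", w)
print("predictions:", [int(w @ phi(xi) > 0) for xi in X])
```

For the second part, the sketch below shows the generic buffer-and-update pattern behind mini-batch gradient descent on a stream. It stands in for the idea only; the stand-in model (logistic regression), loss, learning rate, and batch size are assumptions, not the SMLP/SA architectures from the thesis.

```python
# A generic sketch of mini-batch SGD over a data stream: buffer incoming
# examples, update on each full batch, then discard the batch, so memory
# stays constant regardless of stream length. The model here is a
# placeholder logistic regression, not the thesis's SMLP/SA models.
import numpy as np

def minibatch_step(w, X_batch, y_batch, lr=0.1):
    """One mini-batch gradient step for logistic regression."""
    p = 1.0 / (1.0 + np.exp(-(X_batch @ w)))      # sigmoid predictions
    grad = X_batch.T @ (p - y_batch) / len(y_batch)
    return w - lr * grad

def learn_from_stream(stream, n_features, batch_size=32):
    w = np.zeros(n_features)
    buf_X, buf_y = [], []
    for x, yi in stream:                 # one pass; the stream is never stored
        buf_X.append(x)
        buf_y.append(yi)
        if len(buf_X) == batch_size:
            w = minibatch_step(w, np.array(buf_X), np.array(buf_y))
            buf_X.clear()
            buf_y.clear()
    return w                             # a trailing partial batch is ignored
```

Setting batch_size to 1 recovers plain stochastic gradient descent; the buffer-and-discard pattern is what keeps the memory requirement constant.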
