Active learning for anomaly detection in environmental data

Abstract: The growing volume of data from in-situ sensors in environmental monitoring makes it necessary to detect anomalous data points automatically. Today, this is mainly done with supervised machine learning models, which need a fully labelled data set for training. However, labelling data is typically cumbersome and therefore hinders the adoption of machine learning methods for automated anomaly detection. In this work, we propose to address this challenge by means of active learning, in which the domain expert is queried for the labels of only a selected subset of the full data set. We show that this reduces the time and cost associated with labelling while delivering the same or similar anomaly detection performance. Finally, we also show that machine learning models providing a nonlinear classification boundary are to be recommended for anomaly detection in complex environmental data sets.
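The querying idea described in the abstract can be illustrated with a minimal sketch of pool-based active learning. Everything specific here is an assumption made for illustration and is not taken from the paper: the synthetic one-feature data, the logistic regression classifier, the uncertainty-sampling rule (query the point whose predicted anomaly probability is closest to 0.5), and all function and variable names. The `oracle` function stands in for the domain expert.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, steps=500, lr=0.5):
    """Fit a one-feature logistic regression by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def active_learning(pool, oracle, seed_idx, budget):
    """Query the oracle for `budget` labels chosen by uncertainty sampling."""
    labelled = {i: oracle(i) for i in seed_idx}  # index -> label
    for _ in range(budget):
        idx = sorted(labelled)
        w, b = fit_logistic([pool[i] for i in idx], [labelled[i] for i in idx])
        candidates = [i for i in range(len(pool)) if i not in labelled]
        if not candidates:
            break
        # Most uncertain point: predicted probability closest to 0.5.
        query = min(candidates, key=lambda i: abs(sigmoid(w * pool[i] + b) - 0.5))
        labelled[query] = oracle(query)
    idx = sorted(labelled)
    model = fit_logistic([pool[i] for i in idx], [labelled[i] for i in idx])
    return labelled, model


# Hypothetical synthetic pool: 190 normal readings and 10 anomalous ones.
rng = random.Random(0)
pool = [rng.gauss(0.0, 1.0) for _ in range(190)] + [rng.gauss(4.0, 1.0) for _ in range(10)]
true_labels = [0] * 190 + [1] * 10


def oracle(i):
    """Stand-in for the domain expert: returns the true label of pool item i."""
    return true_labels[i]


# Start from one labelled example per class, then query 20 more labels.
labelled, (w, b) = active_learning(pool, oracle, seed_idx=[0, 190], budget=20)

predictions = [1 if sigmoid(w * x + b) > 0.5 else 0 for x in pool]
accuracy = sum(p == t for p, t in zip(predictions, true_labels)) / len(pool)
print(f"labels requested: {len(labelled)} of {len(pool)}, pool accuracy: {accuracy:.3f}")
```

In this sketch the expert labels 22 of the 200 points instead of all of them. A linear model is used only to keep the example short; a classifier with a nonlinear decision boundary could be substituted in `fit_logistic` without changing the query loop.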
