Machine Learning Methods for Attack Detection in the Smart Grid

Attack detection problems in the smart grid are posed as statistical learning problems for different attack scenarios in which measurements are observed in batch or online settings. In this approach, machine learning algorithms are used to classify measurements as either secure or attacked. The proposed attack detection framework exploits any available prior knowledge about the system and overcomes constraints arising from the sparse structure of the problem. Well-known batch and online learning algorithms, both supervised and semisupervised, are employed with decision-level and feature-level fusion to model the attack detection problem. To detect unobservable attacks with statistical learning methods, the relationships between the learning algorithms and the statistical and geometric properties of the attack vectors used in the attack scenarios are analyzed. The proposed algorithms are examined on various IEEE test systems. Experimental analyses show that, within the proposed framework, machine learning algorithms detect attacks with higher performance than attack detection algorithms that employ state vector estimation methods.
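The contrast between residual-based and learning-based detection can be illustrated with a minimal sketch. It assumes the standard DC measurement model z = Hx + e from the false data injection literature. The matrix H, the operating point, the noise levels, the 1-sparse attack model and the 1-nearest-neighbor classifier are all hypothetical choices made for illustration and are not the paper's experimental setup. The plain residual norm stands in for a generic detector built on state vector estimation. An attack a = Hc shifts the least-squares state estimate by c but leaves the residual unchanged, so the residual test cannot see it. A classifier trained on labeled secure and attacked samples can still separate the two classes when the attacks change the distribution of the measurements.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy system: 10 meters, 4 state variables (not an IEEE test case).
m, n = 10, 4
H = rng.standard_normal((m, n))      # hypothetical DC measurement matrix
x0 = rng.standard_normal(n)          # hypothetical operating point
sigma_x, sigma_e = 0.1, 0.05         # assumed state variability and meter noise


def residual_norm(z, H):
    """Least-squares state estimate, then the norm of the measurement residual."""
    x_hat, *_ = np.linalg.lstsq(H, z, rcond=None)
    return np.linalg.norm(z - H @ x_hat)


def sample(num, attacked):
    """Draw rows z = H(x + c) + e; c is 1-sparse with entry +/-1 when attacked (assumed model)."""
    X = x0 + sigma_x * rng.standard_normal((num, n))
    if attacked:
        C = np.zeros((num, n))
        idx = rng.integers(0, n, size=num)
        C[np.arange(num), idx] = rng.choice([-1.0, 1.0], size=num)
        X = X + C
    return X @ H.T + sigma_e * rng.standard_normal((num, m))


# 1) An unobservable attack a = Hc leaves the residual unchanged.
z = sample(1, attacked=False)[0]
c = np.zeros(n)
c[1] = 1.0
a = H @ c
r_clean = residual_norm(z, H)
r_attacked = residual_norm(z + a, H)


# 2) A learned classifier (1-nearest neighbor, chosen only for illustration)
#    trained on labeled secure (0) and attacked (1) measurement vectors.
def nearest_neighbor_predict(train_Z, train_y, test_Z):
    # Squared Euclidean distance from every test row to every training row.
    d2 = ((test_Z[:, None, :] - train_Z[None, :, :]) ** 2).sum(axis=2)
    return train_y[np.argmin(d2, axis=1)]


Z_train = np.vstack([sample(200, False), sample(200, True)])
y_train = np.array([0] * 200 + [1] * 200)
Z_test = np.vstack([sample(100, False), sample(100, True)])
y_test = np.array([0] * 100 + [1] * 100)
accuracy = np.mean(nearest_neighbor_predict(Z_train, y_train, Z_test) == y_test)

print(f"residual clean={r_clean:.4f} attacked={r_attacked:.4f} 1-NN accuracy={accuracy:.2f}")
```

Replacing the batch 1-NN rule with an incrementally updated learner, such as a perceptron, would correspond to the online setting. That substitution is likewise an assumption, not a description of the paper's algorithms.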
