Machine Learning for the New York City Power Grid

Power companies can benefit from the use of knowledge discovery methods and statistical machine learning for preventive maintenance. We introduce a general process for transforming historical electrical grid data into models that aim to predict the risk of failures for components and systems. These models can be used directly by power companies to assist with prioritization of maintenance and repair work. Specialized versions of this process are used to produce (1) feeder failure rankings, (2) cable, joint, terminator, and transformer rankings, (3) feeder Mean Time Between Failure (MTBF) estimates, and (4) manhole event vulnerability rankings. In its most general form, the process can handle diverse, noisy sources that are historical (static), semi-real-time, or real-time; it incorporates state-of-the-art machine learning algorithms for prioritization (supervised ranking or MTBF estimation); and it includes an evaluation of results via cross-validation and blind tests. Beyond the ranked lists and MTBF estimates, business management interfaces allow the prediction capability to be integrated directly into corporate planning and decision support. Such interfaces rely on several important properties of our general modeling approach: that machine learning features are meaningful to domain experts, that the processing of data is transparent, and that prediction results are accurate enough to support sound decision making. We discuss the challenges of working with historical electrical grid data that were not designed for predictive purposes. The “rawness” of these data contrasts with the accuracy of the statistical models that can be obtained from the process; these models are sufficiently accurate to assist in maintaining New York City's electrical grid.
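As a rough illustration of the "supervised ranking evaluated by blind test" idea, the sketch below orders components by a predicted risk score and then checks that ordering against failures observed in a later, held-out period. It is not the authors' pipeline: the feeder IDs, scores, and outcomes are made up for the example, and the choice of AUC (the fraction of failed/non-failed pairs that the ranking orders correctly) as the metric is an assumption, since the abstract does not name one.

```python
from typing import List, Sequence


def rank_by_risk(ids: Sequence[str], scores: Sequence[float]) -> List[str]:
    """Return component IDs ordered from highest to lowest predicted risk."""
    paired = sorted(zip(ids, scores), key=lambda pair: pair[1], reverse=True)
    return [component_id for component_id, _ in paired]


def auc(scores: Sequence[float], failed: Sequence[bool]) -> float:
    """Fraction of (failed, non-failed) pairs in which the failed component
    received the higher score; tied scores count as half a correct pair."""
    pos = [s for s, f in zip(scores, failed) if f]
    neg = [s for s, f in zip(scores, failed) if not f]
    if not pos or not neg:
        raise ValueError("need at least one failed and one non-failed component")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical data: scores come from a model fit on an earlier period,
# outcomes come from a later "blind test" period the model never saw.
feeder_ids = ["F1", "F2", "F3", "F4", "F5"]
risk_scores = [0.9, 0.2, 0.7, 0.4, 0.1]
failed_in_test_period = [True, False, False, True, False]

ranking = rank_by_risk(feeder_ids, risk_scores)
blind_test_auc = auc(risk_scores, failed_in_test_period)
print(ranking)
print(round(blind_test_auc, 3))
```

An AUC of 0.5 corresponds to a random ordering and 1.0 to a ranking that places every failed component above every non-failed one, so a score of this kind gives a single number for comparing candidate ranking models on the same blind-test period.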
