Quality assessment of clusters of electrical disturbances: A case study

Electrical disturbances can have an adverse effect on people, businesses and other systems, and a better understanding of such events has substantial potential benefits. Clustering, a form of unsupervised learning, is a computational intelligence technique that can be used to identify natural clusters or groups of disturbances. Understanding disturbances is important for planning and maintenance, and may lead to fresh insights that prove useful when upgrading infrastructure. Ideally this should be a fully automated process, so that no bias is introduced by the practitioner. The complete process involves several steps: data cleaning, transformation, feature selection, clustering, evaluation of clusters, cluster description and cluster interpretation. We designate this process the Clustering Knowledge Chain, beginning with raw data and ending with new knowledge. This case study examines each of these steps, showing how they might be applied in a situation involving real-world data, and illustrates some of the difficulties that a practitioner with domain knowledge may encounter. Results from this case study reveal new knowledge about electrical disturbances, but also show that selecting parameters using a quantitative measure of clustering quality is not enough, by itself, to guarantee clusters that can inform the practitioner.
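
To make the idea of selecting a parameter by a quantitative quality measure concrete, the following is a minimal sketch, not the study's own procedure. The abstract does not name the clustering algorithm or the quality measure used, so k-means with farthest-first seeding and the mean silhouette width are assumptions chosen for illustration. The data are synthetic two-feature points; the helper names (`kmeans`, `silhouette`, `make_blob`) are hypothetical.

```python
import random
from math import dist


def kmeans(points, k, iters=50):
    """Lloyd-style k-means; returns one cluster label per point."""
    # Farthest-first seeding keeps the sketch deterministic (an assumption,
    # not something taken from the case study).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette width over all points, in the range [-1, 1]."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def make_blob(rng, cx, cy, n=30, spread=0.3):
    """Synthetic group of points around (cx, cy); stands in for real data."""
    return [(rng.gauss(cx, spread), rng.gauss(cy, spread)) for _ in range(n)]


rng = random.Random(1)
points = make_blob(rng, 0, 0) + make_blob(rng, 5, 0) + make_blob(rng, 0, 5)

# Score each candidate number of clusters and keep the highest-scoring one.
scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

On clean synthetic data like this, the score singles out the number of groups that were generated. The abstract's caution applies to real data: a parameter setting that maximises such a score does not, by itself, guarantee clusters that a practitioner can interpret.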
