Scalable Architecture for Anomaly Detection and Visualization in Power Generating Assets

Power-generating assets (e.g., jet engines, gas turbines) are often instrumented with tens to hundreds of sensors for monitoring physical and performance degradation. Anomaly detection algorithms highlight deviations from predetermined benchmarks with the goal of detecting incipient faults. We are developing an integrated system to address three key challenges in analyzing sensor data from power-generating assets: (1) difficulty in ingesting and analyzing data from large numbers of machines; (2) prevalence of false alarms generated by anomaly detection algorithms, resulting in unnecessary downtime and maintenance; and (3) lack of an integrated visualization that helps users understand and explore the flagged anomalies and relevant sensor context in the energy domain. We present preliminary results and our key findings in addressing these challenges. Our system's scalable event ingestion framework, based on OpenTSDB, ingests nearly 400,000 sensor data samples per second using a 30-machine cluster. To reduce false alarm rates, we leverage the False Discovery Rate (FDR) algorithm, which significantly reduces the number of false alarms. Our visualization tool presents the anomalies and associated content flagged by the FDR algorithm to inform users and practitioners in their decision-making process. We believe our integrated platform will help reduce maintenance costs significantly while increasing asset lifespan. We are working to extend our system on multiple fronts, such as scaling to more data and more compute nodes (70 in total).
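To illustrate how FDR control can cut false alarms when many sensors are tested at once, the following is a minimal sketch assuming the Benjamini-Hochberg step-up procedure. The abstract does not say which FDR variant the system uses, so the procedure, the function name `benjamini_hochberg`, and the default level `alpha = 0.05` are all assumptions made for illustration. The input is one p-value per sensor test (for example, from comparing a reading against its benchmark), and the output marks which tests should be raised as anomalies.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Return one flag per input p-value; True means 'report as anomaly'.

    Hypothetical helper: sketches the Benjamini-Hochberg step-up rule,
    which is an assumption about the FDR procedure, not the authors' code.
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank

    # Every test ranked at or below the cutoff is flagged.
    flags = [False] * m
    for idx in order[:cutoff_rank]:
        flags[idx] = True
    return flags


if __name__ == "__main__":
    # Illustrative p-values only; these are not measurements from the system.
    sensor_p_values = [0.039, 0.001, 0.06, 0.008]
    print(benjamini_hochberg(sensor_p_values, alpha=0.05))
```

In this illustrative input, testing each sensor separately at 0.05 would flag three of the four tests, while the step-up rule flags only the two with the strongest evidence. That difference is the mechanism by which FDR control reduces alarms when hundreds of sensors are screened together.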
