Using feature importance metrics to detect events of interest in scientific computing applications

In current high-performance scientific computing workflows, data are typically recorded at regular intervals spaced several hundred time steps apart. Data are not saved at every time step, both to prevent excessive memory usage and because data I/O is often a bottleneck in the workflow. However, in many dynamical systems, events of interest occur locally in space and time. In these cases, a global data save across all processors at regular intervals is both inefficient and ineffective: it saves data over regions where nothing of interest is occurring, and it misses events of interest that occur at time steps between data saves. What is needed is a method of automatically detecting an event of interest as it occurs, so that a data save can be triggered on the relevant processors only. We propose a method of detecting such events of interest using feature importance metrics. This method requires very little communication between processors, which makes it well suited to implementation in a high-performance computing setting.
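The contrast between interval-based global saves and locally triggered saves can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions, not the proposed method: the abstract does not define the feature importance metric, so `local_score` below is a hypothetical stand-in (deviation from the running mean of earlier local values), and `simulate`, `save_interval`, and `threshold` are illustrative names and parameters. The point it shows is that each processor decides from its own data whether to save, with no inter-processor communication.

```python
def local_score(history, value):
    """Hypothetical stand-in for a feature importance metric (an assumption).

    Returns the absolute deviation of the current local value from the mean
    of the values previously seen on the same processor.
    """
    if not history:
        return 0.0
    mean = sum(history) / len(history)
    return abs(value - mean)


def simulate(signals, save_interval, threshold):
    """Compare interval-based global saves with locally triggered saves.

    signals[p][t] is the local value on processor p at time step t.
    Returns two lists of (step, processor) pairs.
    """
    n_procs = len(signals)
    n_steps = len(signals[0])
    interval_saves = []
    triggered_saves = []
    histories = [[] for _ in range(n_procs)]

    for t in range(n_steps):
        # Global save: every processor writes data at fixed intervals.
        if t % save_interval == 0:
            interval_saves.extend((t, p) for p in range(n_procs))

        # Triggered save: each processor uses only its own local data.
        for p in range(n_procs):
            value = signals[p][t]
            if local_score(histories[p], value) > threshold:
                triggered_saves.append((t, p))
            histories[p].append(value)

    return interval_saves, triggered_saves


# Toy data: 3 processors, 10 time steps, with a short local event on
# processor 1 at steps 4 and 5.
signals = [[0.0] * 10 for _ in range(3)]
signals[1][4] = 5.0
signals[1][5] = 5.0

interval_saves, triggered_saves = simulate(signals, save_interval=8, threshold=2.0)
print("interval saves: ", interval_saves)
print("triggered saves:", triggered_saves)
```

In this toy setup the interval-based scheme writes data on all three processors at steps 0 and 8 and never captures the event, while the triggered scheme writes only on processor 1 at steps 4 and 5. The threshold is arbitrary and chosen for the toy data; a real detector would need a principled score and threshold.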
