A Real-Time Machine Learning and Visualization Framework for Scientific Workflows

High-performance computing resources are now widely used in science and engineering. Typical post-hoc approaches save simulation output to persistent storage, so the data must be read back from storage into memory before it can be analyzed. For large-scale scientific simulations, these I/O operations incur significant overhead. In-situ and in-transit approaches bypass this I/O by accessing and processing simulation results directly in memory, which suggests that simulation and analysis applications should be more closely coupled. This paper constructs a flexible and extensible framework that connects scientific simulations with multi-step machine learning processes and in-situ visualization tools, providing plug-in analysis and visualization functionality for complex workflows in real time. A distributed simulation-time clustering method is proposed to detect anomalies in real turbulent flows.
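
The in-situ idea described above, analyzing each time step while it is still in memory instead of writing it to storage first, can be illustrated with a small sketch. This is not the paper's framework or its clustering method: it is a single-process toy that assumes a hypothetical `toy_simulation` generator in place of a real solver, a simple online k-means (`OnlineKMeans`) as the clustering step, and an arbitrary distance threshold for flagging anomalies. All names and parameters are illustrative assumptions.

```python
import math
import random


class OnlineKMeans:
    """Streaming k-means: centroids are updated one sample at a time."""

    def __init__(self, k, dim, seed=0):
        rng = random.Random(seed)
        # Assumption: features are roughly centered, so start inside [-1, 1]^dim.
        self.centroids = [
            [rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(k)
        ]
        self.counts = [0] * k

    def nearest(self, x):
        """Return (index, distance) of the centroid closest to x."""
        dists = [math.dist(x, c) for c in self.centroids]
        j = min(range(len(dists)), key=dists.__getitem__)
        return j, dists[j]

    def update(self, x):
        """Move the nearest centroid toward x using a running mean."""
        j, _ = self.nearest(x)
        self.counts[j] += 1
        lr = 1.0 / self.counts[j]
        self.centroids[j] = [
            c + lr * (xi - c) for c, xi in zip(self.centroids[j], x)
        ]
        return j


def toy_simulation(steps, seed=1):
    """Hypothetical stand-in for a solver: yields one feature vector per step."""
    rng = random.Random(seed)
    for t in range(steps):
        if t == steps - 1:
            yield t, [10.0, 10.0]  # injected outlier for the demonstration
        else:
            yield t, [rng.gauss(0.0, 0.1), rng.gauss(0.0, 0.1)]


def run_in_situ(steps=200, threshold=5.0, warmup=10):
    """Consume simulation output in memory, step by step, with no file I/O."""
    model = OnlineKMeans(k=2, dim=2)
    anomalies = []
    for t, features in toy_simulation(steps):
        _, dist = model.nearest(features)
        if t > warmup and dist > threshold:
            anomalies.append(t)      # far from every cluster: flag the step
        else:
            model.update(features)   # otherwise fold it into the clusters
    return anomalies


if __name__ == "__main__":
    print("anomalous time steps:", run_in_situ())
```

The analysis consumes the generator directly, so no time step is ever written to or read from disk. A distributed variant would additionally need to exchange or merge centroids across processes, which this sketch does not attempt.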
