Computing Just What You Need: Online Data Analysis and Reduction at Extreme Scales

A growing disparity between supercomputer computation speeds and I/O rates makes it increasingly infeasible for applications to save all results for offline analysis. Instead, applications must analyze and reduce data online, outputting only the results needed to answer the target scientific questions. This change in focus complicates application and experiment design and introduces algorithmic, implementation, and programming-model challenges that are unfamiliar to many scientists and that have major implications for the design of various elements of supercomputer systems. We review these challenges and describe methods and tools that we are developing to enable experimental exploration of algorithmic, software, and system design alternatives.
