In-Situ Bitmaps Generation and Efficient Data Analysis based on Bitmaps

Memory capacity, memory access speeds, and disk bandwidths are not increasing at the same rate as computing power in current and upcoming parallel machines. This has led to considerable recent research on in-situ data analytics. However, many open questions remain on how to perform such analytics, especially in memory-constrained systems. Building on our earlier work, which demonstrated that bitmap indices (bitmaps) can be a suitable summary structure for key offline analytics tasks, this paper develops an in-situ analysis approach that performs data reduction (such as time-step selection) using only bitmaps and subsequently stores only the selected bitmaps for post-analysis. We construct compressed bitmaps on the fly and show that many kinds of in-situ analyses can be supported by bitmaps without requiring the original data, which reduces the memory requirements of in-situ analysis. Instead of writing the original simulation output, we write only the selected bitmaps to disk, which reduces the I/O requirements. We also demonstrate that bitmaps can be used for key offline analysis steps. We extensively evaluate our method with different simulations and applications and demonstrate the effectiveness of our approach.
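To make the idea of analysis "using only bitmaps" concrete, the following is a minimal sketch and not the paper's implementation. It assumes a binned bitmap index (one bit vector per value range), uses uncompressed Python integers as bit vectors in place of the compressed bitmaps described above, and uses conditional entropy between two time steps as one plausible selection metric. The function names and the binning parameters `lo`, `hi`, and `nbins` are hypothetical.

```python
import math


def build_bitmaps(values, lo, hi, nbins):
    """Build one bit vector (a Python int) per bin.

    Bit i of bitmaps[b] is set when values[i] falls into bin b.
    Values outside [lo, hi) are clamped into the first or last bin.
    """
    width = (hi - lo) / nbins
    bitmaps = [0] * nbins
    for i, v in enumerate(values):
        b = int((v - lo) / width)
        b = max(0, min(b, nbins - 1))
        bitmaps[b] |= 1 << i
    return bitmaps


def popcount(bits):
    """Number of set bits in a bit vector."""
    return bin(bits).count("1")


def entropy(bitmaps, n):
    """Shannon entropy of the binned distribution, from bit counts only."""
    h = 0.0
    for bm in bitmaps:
        c = popcount(bm)
        if c:
            p = c / n
            h -= p * math.log2(p)
    return h


def joint_entropy(bitmaps_a, bitmaps_b, n):
    """Joint entropy of two time steps; each joint count is a bitwise AND."""
    h = 0.0
    for a in bitmaps_a:
        if a == 0:
            continue
        for b in bitmaps_b:
            c = popcount(a & b)
            if c:
                p = c / n
                h -= p * math.log2(p)
    return h


def conditional_entropy(bitmaps_new, bitmaps_ref, n):
    """H(new | ref) = H(new, ref) - H(ref).

    A high value suggests the new time step adds information beyond the
    reference step and may be worth keeping.
    """
    return joint_entropy(bitmaps_new, bitmaps_ref, n) - entropy(bitmaps_ref, n)
```

Once the bitmaps for a time step are built, the raw values are no longer needed: every quantity above is derived from bit counts and bitwise ANDs. A real system would apply the same operations to compressed bit vectors rather than Python integers.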
