Big Data New Frontiers: Mining, Search and Management of Massive Repositories of Solar Image Data and Solar Events

This work presents solar physics as one of many emerging research domains in which big data analysis has become an immediate need, driven by the massive amounts of data generated each day. While building a content-based image retrieval system for NASA’s Solar Dynamics Observatory mission, we discovered research problems that can be addressed with big data processing techniques and that, in some cases, require the development of novel techniques. With over one terabyte of solar data generated each day, and more missions on the horizon that are expected to generate petabytes of data each year, solar physics presents many exciting opportunities. This paper presents the current status of our work with solar image data and events, our shift towards big data methodologies, and future directions for big data processing in solar physics.
