On the path to sustainable, scalable, and energy-efficient data analytics: Challenges, promises, and future directions

As scientific data approaches exascale, scalable and energy-efficient data analytics is quickly becoming a top priority. Yet a sustainable solution is hampered by a number of technical challenges that are exacerbated by emerging hardware and software technology trends. In this paper, we present several recently developed “secret sauces” that promise to address some of these challenges. We discuss transformative approaches to efficient data reduction, analytics-driven query processing, scalable analytical kernels, and approximate analytics, among others. We also propose a number of future directions that could be pursued on the path to sustainable data analytics at scale.
