AEGLE's Cloud Infrastructure for Resource Monitoring and Containerized Accelerated Analytics

This paper presents the cloud infrastructure of the AEGLE project, which aims to integrate cloud technologies with heterogeneous reconfigurable computing in large-scale healthcare systems for Big Bio-Data analytics. AEGLE’s engineering concept brings together popular big-data engines with emerging acceleration technologies, laying the foundation for personalized and integrated health-care services while also promoting related research activities. We introduce the design of AEGLE’s accelerated infrastructure, along with the corresponding software and hardware acceleration stacks that support various big data analytics workloads, and show that, through effective resource containerization, AEGLE’s cloud infrastructure can support high heterogeneity in storage types, execution engines, utilized tools, and execution platforms. Special care is given to the integration of high-performance accelerators within the overall software stack of AEGLE’s infrastructure, which enables efficient execution of analytics, with speedups of up to 140× over pure software execution according to our preliminary evaluations.
