Development of Data-Intensive Services with Everest

The paper considers the development of domain-specific web services for processing large volumes of data on high-performance computing resources. Developing such services involves a number of challenges, including integration with external data repositories, efficient data transfer, management of user data stored on the resource, execution of data processing jobs, and provision of remote access to the data. An approach for building big data processing services based on the Everest platform is presented. The approach takes into account the characteristic features of such services and supports their rapid deployment on top of existing computing infrastructure. As an example, a service for short-read sequence alignment that processes next-generation sequencing data on a Hadoop cluster is described.
