A network approach for managing and processing big cancer data in clouds

Translational cancer research requires integrative analysis of multiple levels of big cancer data to identify and treat cancer. These data are decentralised, growing, and continually updated, and the content stored or archived in different information sources partially overlaps, creating redundancies as well as contradictions and inconsistencies. To address these issues, we develop a data network model and technology for constructing and managing big cancer data. To support data processing and analysis within this data network approach, we employ a semantic content network approach and adopt the CELAR cloud platform. The prototype implementation shows that the CELAR cloud can satisfy the on-demand needs of various data resources for the management and processing of big cancer data.
