Virtual technology on CPSE-Bio

The explosive growth of biomedical data across various platforms presents a major challenge for bioinformatics research: analyzing this ever-increasing volume of data requires unprecedented storage resources and computing power. To make the best use of limited resources for big data, we propose a model of a Bioinformatics Problem Solving Environment based on Cloud Computing (CPSE-Bio). The proposed technique combines Hadoop with virtualization technology: each core of a multicore computer is virtualized as a separate computing node in the Hadoop cluster, taking advantage of the multicore design common to modern processors. With virtualization, the cluster contains more effective computing nodes than it would without it, and analysis and experiments demonstrate that our model improves resource utilization and computing efficiency in CPSE-Bio.
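The core idea, exposing each core of a multicore machine to Hadoop as its own worker node, can be illustrated with a small Python model. This is a hypothetical sketch, not the CPSE-Bio implementation: the `Host` inventory, the `-vmN` naming scheme, and the `hadoop_nodes` and `makespan` helpers are invented for illustration. The model assumes every node runs one task at a time at equal speed and ignores virtualization overhead. It builds the node list Hadoop would see with and without per-core virtualization, then estimates the completion time of a batch of equal-length map tasks using greedy list scheduling.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """A physical machine in the cluster (hypothetical inventory)."""
    name: str
    cores: int


def hadoop_nodes(hosts, virtualize):
    """Return the worker-node names Hadoop would see.

    Without virtualization each physical host is one node; with
    virtualization each core becomes its own VM-backed node.
    """
    nodes = []
    for h in hosts:
        if virtualize:
            # One VM per core, named e.g. "node1-vm0" (hypothetical scheme).
            nodes.extend(f"{h.name}-vm{i}" for i in range(h.cores))
        else:
            nodes.append(h.name)
    return nodes


def makespan(task_times, n_nodes):
    """Greedy list scheduling: each task goes to the node that frees up first.

    Toy assumption: every node runs exactly one task at a time, all nodes
    are equally fast, and virtualization adds no overhead.
    """
    if n_nodes <= 0:
        raise ValueError("need at least one node")
    finish = [0.0] * n_nodes  # per-node finish times, kept as a min-heap
    heapq.heapify(finish)
    for t in task_times:
        earliest = heapq.heappop(finish)       # node that becomes free first
        heapq.heappush(finish, earliest + t)   # it runs the next task
    return max(finish)


if __name__ == "__main__":
    hosts = [Host("node1", 4), Host("node2", 4)]
    tasks = [10.0] * 16  # sixteen equal-length map tasks

    plain = hadoop_nodes(hosts, virtualize=False)  # 2 nodes
    virt = hadoop_nodes(hosts, virtualize=True)    # 8 nodes
    print(makespan(tasks, len(plain)))  # 80.0
    print(makespan(tasks, len(virt)))   # 20.0
```

Because the model assigns zero cost to virtualization and data movement, the fourfold speedup in this example is an idealized upper bound, not a measurement; real gains depend on hypervisor overhead, I/O contention, and how the Hadoop version in use schedules tasks on each node.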
