On Construction of Cluster and Grid Computing Platforms for Parallel Bioinformatics Applications

Biological databases are diverse and massive. As a result, researchers must compare each sequence against vast numbers of other sequences. Comparison, whether of structural features or of protein sequences, is vital in bioinformatics. These activities require high-performance computing power to search and analyze large amounts of data, and industrial-strength databases to support a range of data-intensive computing functions. Grid computing and cluster computing meet these requirements. Biological data are also exposed through various web services that help biologists search for and extract useful information. The data formats these services produce are heterogeneous, and powerful tools are needed to handle the complex task of integrating the data. This paper reviews the relevant technologies and presents an approach to this problem based on cluster and grid computing. The authors implement an experimental distributed computing application for bioinformatics that consists of basic high-performance computing environments (Grid and PC Cluster systems), multiple user-portal interfaces that give biologists graphical access to high-performance technology, and a translation tool that converts biological data into XML format.
