A P2P Framework for Developing Bioinformatics Applications in Dynamic Cloud Environments

Bioinformatics has moved from in-house computing infrastructure to cloud computing to cope with the vast quantity of biological data. This shift enables a large number of collaborating research groups around the world to share their work. At the same time, retrieving biological data over the Internet is becoming increasingly difficult because the data grow explosively and change frequently. Various efforts have addressed data discovery and delivery in cloud frameworks, but most of them are hindered by their reliance on a MapReduce master server that must track all available data. In this paper, we propose an alternative approach, called PRKad, which exploits a Peer-to-Peer (P2P) model to achieve efficient data discovery and delivery. PRKad is a Kademlia-based implementation that uses Round-Trip Time (RTT) as the associated key, and it locates data by means of a Distributed Hash Table (DHT) and the XOR metric. Simulation results show that PRKad retrieves data with low link latency. As an interdisciplinary application of P2P computing to bioinformatics, PRKad also scales well to serve a larger number of users in dynamic cloud environments.
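To make the XOR metric concrete, the following is a minimal Python sketch of Kademlia-style distance, k-bucket indexing, and closest-node selection. It is not PRKad's implementation. The abstract does not specify exactly how RTT serves as the associated key, so the `rtt_ms` field and the `query_order` helper are hypothetical: they simply contact the lowest-RTT peers among the XOR-closest candidates first. The defaults k=20 and alpha=3 follow the values suggested in the original Kademlia design.

```python
from dataclasses import dataclass

# Kademlia identifiers are normally 160-bit integers; small IDs are used in
# the usage example only to keep the arithmetic readable.


@dataclass(frozen=True)
class Node:
    node_id: int   # position of the peer in the ID space
    rtt_ms: float  # hypothetical: measured round-trip time to this peer


def xor_distance(a: int, b: int) -> int:
    """Kademlia distance: bitwise XOR of two IDs, read as an unsigned integer."""
    return a ^ b


def bucket_index(own_id: int, other_id: int) -> int:
    """k-bucket that other_id falls into: the position of the highest bit
    in which the two IDs differ."""
    d = xor_distance(own_id, other_id)
    if d == 0:
        raise ValueError("a node does not store itself in its routing table")
    return d.bit_length() - 1


def k_closest(nodes, key: int, k: int = 20):
    """Return up to k known nodes closest to key under the XOR metric."""
    return sorted(nodes, key=lambda n: xor_distance(n.node_id, key))[:k]


def query_order(nodes, key: int, k: int = 20, alpha: int = 3):
    """Hypothetical RTT-aware step: among the k XOR-closest candidates,
    pick the alpha peers with the lowest RTT to query first."""
    candidates = k_closest(nodes, key, k)
    return sorted(candidates, key=lambda n: n.rtt_ms)[:alpha]
```

For example, with peers `0b0001`, `0b0100`, `0b1100`, and `0b1110` and the key `0b1101`, the XOR distances are 12, 9, 1, and 3, so `k_closest(..., k=2)` returns the peers `0b1100` and `0b1110`. If `0b1110` has the smaller RTT, `query_order(..., k=2, alpha=1)` contacts it first. Ordering by XOR keeps lookups converging toward the key in a logarithmic number of steps, while the RTT ordering only changes which of those good candidates is tried first.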
