Managing and Analysing Genomic Data Using HPC and Clouds

Database management techniques using distributed processing services have evolved to address the problems posed by distributed, heterogeneous data collections held across dynamic, virtual organisations [1-3]. These techniques, originally developed for data grids in domains such as high-energy particle physics [4], have since been adapted to make use of emerging cloud infrastructures [5].

[1] Erwin Laure, et al. Performance engineering in data Grids, 2005, Concurr. Pract. Exp.

[2] David Taniar, et al. High-Performance Parallel Database Processing and Grid Databases, 2008.

[3] Christoforos E. Kozyrakis, et al. Evaluating MapReduce for Multi-core and Multiprocessor Systems, 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[4] M. Schatz, et al. Searching for SNPs with cloud computing, 2009, Genome Biology.

[5] Hao Yu, et al. State of the Art in Parallel Computing with R, 2009.

[6] M. J. van der Laan, et al. A new partitioning around medoids algorithm, 2003.

[7] Jon Hill, et al. SPRINT: A new parallel framework for R, 2008, BMC Bioinformatics.

[8] Jean YH Yang, et al. Bioconductor: open software development for computational biology and bioinformatics, 2004, Genome Biology.

[9] Michael C. Schatz, et al. CloudBurst: highly sensitive read mapping with MapReduce, 2009, Bioinform.

[10] Michael Stonebraker, et al. Requirements for Science Data Bases and SciDB, 2009, CIDR.

[11] David Taniar, et al. High Performance Parallel Database Processing and Grid Databases, 2008.

[12] Geoffrey C. Fox, et al. Twister: a runtime for iterative MapReduce, 2010, HPDC '10.

[13] Kurt Hornik, et al. The Comprehensive R Archive Network, 2012.

[14] Wei-Min Liu, et al. Robust estimators for expression analysis, 2002, Bioinform.

[15] Mario Antonioletti, et al. Integrating distributed data sources with OGSA–DAI DQP and Views, 2010, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences.

[16] Suzanne J. Matthews, et al. MrsRF: an efficient MapReduce algorithm for analyzing large collections of evolutionary trees, 2010, BMC Bioinformatics.

[17] Kunle Olukotun, et al. Map-Reduce for Machine Learning on Multicore, 2006, NIPS.

[18] Geoffrey C. Fox, et al. MapReduce for Data Intensive Scientific Analyses, 2008, 2008 IEEE Fourth International Conference on eScience.

[19] Anthony Finkelstein, et al. Relating Requirements and Architectures: A Study of Data-Grids, 2004, Journal of Grid Computing.

[20] T. Speed, et al. Summaries of Affymetrix GeneChip probe level data, 2003, Nucleic Acids Research.

[21] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.

[22] Anil K. Jain, et al. Algorithms for Clustering Data, 1988.