Computing: A Comprehensive Review

application, especially in the medical domain. Next-generation sequencing (NGS) technology is being applied to cancer research, promising a greater understanding of carcinogenesis. However, as sequencing capacity grows, algorithmic speed is becoming an important bottleneck. To understand these challenges, we present a review of the difficulties currently faced in discovering biomarkers. The review places the spotlight on sequencing technologies and the bottlenecks encountered in these pipelines. Cloud computing and high performance computing technologies such as Grid are summarized with respect to tackling the computational challenges presented by the technologies and algorithms used in biomarker research.
