Genetic Mapping of Diseases through Big Data Techniques

Advances in sequencing machines and DNA analysis techniques have driven progress in medical genetics research. However, sequencers produce such large volumes of data that new methods and programs are needed to analyze it quickly and efficiently. MapReduce is a data-intensive computing model that handles large data volumes and is easy to program, since a job is expressed through two basic functions (Map and Reduce). This work introduces GMS, a genetic mapping system that can assist doctors in the clinical diagnosis of patients by analyzing the genetic mutations contained in their DNA. The model offers a scalable way to analyze the large amounts of data generated by sequencers. Querying several medical databases at the same time makes it possible to determine susceptibilities to diseases through big data analysis mechanisms. The results show that the system scales and yields a possible diagnosis, giving health professionals a powerful tool for improving genetic diagnosis.
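As a rough illustration of the Map and Reduce roles described above, the following minimal sketch matches a patient's variants against several lookup tables in plain Python. It is not the GMS implementation: the database names, variant identifiers, disease labels, and the functions `map_variant`, `reduce_disease`, and `run_mapreduce` are all hypothetical, and a real deployment would run on a MapReduce framework, not in a single process.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical variant-to-disease tables standing in for medical databases.
DATABASES = {
    "db_a": {"rs0001": "disease_x", "rs0002": "disease_y"},
    "db_b": {"rs0001": "disease_x", "rs0003": "disease_z"},
}


def map_variant(variant_id):
    """Map: emit (disease, database) pairs for one patient variant."""
    for db_name, table in DATABASES.items():
        disease = table.get(variant_id)
        if disease is not None:
            yield disease, db_name


def reduce_disease(disease, db_names):
    """Reduce: list the distinct databases that support one disease."""
    return disease, sorted(set(db_names))


def run_mapreduce(variants):
    """Run map, group by key (the shuffle step), then reduce, all in memory."""
    grouped = defaultdict(list)
    for disease, db_name in chain.from_iterable(map(map_variant, variants)):
        grouped[disease].append(db_name)
    return dict(reduce_disease(d, names) for d, names in grouped.items())


# Variants found in a (made-up) patient sample; "rs9999" matches no table.
result = run_mapreduce(["rs0001", "rs0003", "rs9999"])
print(result)
```

Because each variant is mapped independently and each disease key is reduced independently, both phases could in principle be distributed across many workers, which is the property the abstract relies on for scalability.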