Evaluating the Cassandra NoSQL Database Approach for Genomic Data Persistency

Rapid advances in high-throughput sequencing techniques have created interesting computational challenges in bioinformatics. One of them is the management of the massive amounts of data generated by automatic sequencers. We need to deal with the persistency of genomic data, particularly the storage and analysis of these large-scale processed data. Finding an alternative to the commonly used relational database model has therefore become a compelling task. Other data models may be more effective when dealing with very large amounts of nonconventional data, especially for write and retrieval operations. In this paper, we discuss the Cassandra NoSQL database approach for storing genomic data. We analyze persistency and I/O operations with real data, using the Cassandra database system. We also compare the results with those obtained using a classical relational database system and another NoSQL database approach, MongoDB.
