Big Data Management in Digital Forensics

The past few years have witnessed exponential growth in the volume of digital forensics data, leading to big data issues. Digital forensics data is complex and heterogeneous: it can be structured, semi-structured, or unstructured. Traditional relational database management systems (RDBMS) typically expose a query interface based on SQL (Structured Query Language). However, RDBMS are mainly used to manage structured data and are hard to scale out to ever-growing data sets. This paper reviews the features of NoSQL (Not Only SQL) database technologies as an alternative to RDBMS for managing big data. It evaluates the performance of an RDBMS (MySQL) in comparison with two NoSQL database systems (MongoDB and Riak).
