Long-term preservation of big data: prospects of current storage technologies in digital libraries

The purpose of this paper is to investigate the prospects of current storage technologies for the long-term preservation of big data in digital libraries.

The study employs a systematic and critical review of the relevant literature. Online computer databases were searched to identify relevant literature published between 2000 and 2016, and specific inclusion and exclusion criteria were formulated and applied in two distinct rounds to select the most relevant papers.

The study concludes that current storage technologies are not viable for the long-term preservation of big data in digital libraries: they can neither meet all of the libraries' storage demands nor reduce their financial expenditure. The study also points out that migrating to emerging storage technologies is a viable long-term solution for digital libraries.

The study suggests that continuous innovation and research in current storage technologies are required to lessen the impact of storage shortages on digital libraries and to give emerging storage technologies time to mature and take over. At the same time, academia and industry need to pursue more aggressive research and development to advance emerging storage technologies so that digital libraries can adopt them promptly.

The study reveals that if digital libraries rely on current storage technologies for the long-term preservation of big data, they will not only incur significant financial expenditure but also risk losing information because of storage shortages. Policy makers and practitioners should therefore choose storage technologies for long-term preservation carefully.

No previous study in the field has holistically investigated the prospects of magnetic drive technology, solid-state drive technology, and data-reduction techniques for the long-term preservation of big data in digital libraries, so this study makes a novel contribution. It gives academics, practitioners, policy makers, and industry a deeper understanding of the problem, the technical detail needed to choose storage technologies carefully, greater insight for framing sustainable policies, and a set of open research problems to address.
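Data-reduction techniques are one of the three technology areas the study weighs. To illustrate what such a technique does, the following is a minimal sketch, not the paper's method, that assumes fixed-size chunk deduplication with SHA-256 fingerprints followed by per-chunk zlib compression. The function names (`deduplicate`, `restore`, `reduction_report`) and the 4 KiB chunk size are hypothetical choices for illustration. Production deduplication systems often use content-defined chunking instead, which copes better with inserted or shifted bytes.

```python
import hashlib
import zlib


def deduplicate(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and keep one copy of each distinct chunk.

    Returns (store, recipe): store maps a SHA-256 digest to the chunk bytes,
    and recipe is the ordered list of digests needed to rebuild the input.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    store = {}
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).digest()
        store.setdefault(digest, chunk)  # duplicate chunks are stored only once
        recipe.append(digest)
    return store, recipe


def restore(store: dict, recipe: list) -> bytes:
    """Rebuild the original bytes losslessly from the chunk store and recipe."""
    return b"".join(store[digest] for digest in recipe)


def reduction_report(data: bytes, chunk_size: int = 4096) -> dict:
    """Compare logical size, size after deduplication, and size after also compressing."""
    store, recipe = deduplicate(data, chunk_size)
    unique_bytes = sum(len(chunk) for chunk in store.values())
    compressed_bytes = sum(len(zlib.compress(chunk)) for chunk in store.values())
    return {
        "logical_bytes": len(data),
        "unique_bytes": unique_bytes,
        "compressed_bytes": compressed_bytes,
        "chunks": len(recipe),
        "unique_chunks": len(store),
    }


if __name__ == "__main__":
    # Three identical 4 KiB chunks: deduplication keeps only one of them.
    sample = bytes(range(256)) * 16 * 3
    print(reduction_report(sample))
```

The two steps are complementary: deduplication removes repeated chunks across the collection, while compression shrinks the redundancy left inside each unique chunk. Both are lossless, so `restore` returns the original bytes, but the savings depend entirely on how redundant the preserved data actually is.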