Perspective of Database Services for Managing Large-Scale Data on the Cloud: A Comparative Study

The influx of big data on the Internet has left many businesses asking how they can benefit from it and how cloud computing can help them do so. Data is now generated every day at a volume far beyond what a person can view and analyze unaided, so data management and analytical tools are needed to make use of it. Companies require a blend of technologies to collect, process, analyze, and visualize large volumes of data. Big data initiatives are driving urgent demand for data-processing algorithms and are sharpening the challenge of securing data with minimal impact on existing systems. In this paper, we survey existing cloud storage systems and query processing techniques for handling large-scale data on the cloud. We also explore the challenges of big data management on the cloud and the related factors that motivate research in this field.
