Big data and analytics: issues, challenges, and opportunities

Big data refers to large, heterogeneous data that is generated at a rapid rate and cannot be stored or analysed using conventional methods. Its volume requires reliable, fast, distributed storage and access, for which several mechanisms are evolving. Its variety, or heterogeneity, requires models that allow meaningful integration of data residing in disparate sources. Its velocity, the rapid rate at which the data is generated, requires real-time storage and processing models. Further, veracity, the trustworthiness of the data, poses a major challenge with respect to volume, variety and velocity. Big data analytics, which adds value to Big data, has raised many challenges that are data-centric, architectural and analytics-related. This paper discusses the issues, challenges and opportunities related to Big data and its analysis.
