Towards an Integrated Platform for Big Data Analysis

The amount of data in the world is expanding rapidly. Every day, scientific experiments, companies, and end users’ activities generate huge volumes of data. These large data sets have been labeled “Big Data”, and their storage, processing, and analysis present a plethora of new challenges to computer science researchers and IT professionals. Beyond efficient data management, further complexity arises from handling semi-structured or unstructured data and from time-critical processing requirements. Making sense of such massive amounts of data also requires advanced visualization and data exploration techniques.
