Big data emerging technologies: A case study analyzing Twitter data using Apache Hive

These are days of growth and innovation for a better future. Companies increasingly recognize the need for Big Data when making decisions about complex problems. Big Data refers to collections of large datasets whose size (in the range of petabytes to zettabytes), high rate of growth, and complexity make them difficult to process and analyze using conventional database technologies. Big Data is generated from various sources, such as the social networking sites Facebook and Twitter, and the data can be structured, semi-structured, or unstructured. To extract valuable information from this huge amount of data, organizations need new tools and techniques in order to derive business benefits and gain a competitive advantage in the market. This paper presents a comprehensive study of the major emerging Big Data technologies, highlighting their important features and how they work, together with a comparative study of them. It also presents a performance analysis of Apache Hive queries run over Twitter tweets, measuring the MapReduce CPU time spent and the total time taken to finish the job.
