Review: Big Data Techniques of Google, Amazon, Facebook and Twitter

Google, Amazon, Facebook and Twitter have gained enormous advantages from big data methodologies and techniques. Questions remain unanswered about how these companies process big data, yet little research has been undertaken in this area. This review performs a comparative analysis of the big data techniques of Google, Amazon, Facebook and Twitter, drawing on sixteen peer-reviewed scientific publications (2007-2015). Google has invented many techniques that use big data methods to strategize against competitors. The four companies are partially similar: each uses big data, although each has its own business model requirements. As an illustration, Google requires a data warehousing approach to store trillions of data items, as do Facebook, with more than one billion users, Twitter, with 300 million active users, and likewise Amazon. Because all of these organizations require a data warehouse approach, Google has adopted a variety of data warehouse storage systems (Spanner, Photon, Fusion Tables) and data transaction methods, using F1 to execute queries via SQL and communicating across the different approaches with tools such as Yedalog. Facebook and Twitter, the only social media companies among the four, have different requirements. Demand for big data is high, and the companies' requirements partially depend on one another even though their systems are isolated from each other. This study offers researchers a reference for identifying the differences among the big data approaches and technologies of Google, Facebook, Twitter and Amazon, and it outlines their variations and similarities.
