Comparison of Map-Reduce and SQL on Large-Scale Data Processing

The term 'cloud computing' has grown in popularity in recent years. Many major companies, such as Yahoo and Google, have tried to provide related services to the business community and even to public users. In addition to SQL, Map-Reduce, a programming model for implementing large-scale data processing, has become a hot topic that is widely discussed in many studies. Many real-world tasks, such as data processing for search engines, can be implemented in parallel through a simple interface with two functions called Map and Reduce. In this paper, we focus on comparing the performance of the Hadoop implementation of Map-Reduce with SQL Server through simulations. In our studies, Hadoop completes the same query faster than SQL Server. We also test several factors to see whether they affect Hadoop's performance. In addition, we find that including more machines for data processing lets Hadoop achieve better performance, especially for a large-scale data set.
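The two-function interface mentioned above can be illustrated with a minimal single-process sketch. This is not the paper's benchmark query and not the Hadoop API; the names `map_fn`, `reduce_fn`, `run_map_reduce`, and the `sales` records are hypothetical and chosen only for illustration. The computation is the one a SQL engine would express as `SELECT region, SUM(amount) FROM sales GROUP BY region`.

```python
from collections import defaultdict


def map_fn(record):
    """Map: emit one (key, value) pair per input record."""
    region, amount = record
    yield region, amount


def reduce_fn(key, values):
    """Reduce: combine all values that share the same key."""
    return key, sum(values)


def run_map_reduce(records, map_fn, reduce_fn):
    """Hypothetical in-memory driver: map, group by key, then reduce."""
    # Shuffle step: collect the intermediate values under each key.
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    # Reduce step: one call per distinct key.
    return dict(reduce_fn(key, values) for key, values in groups.items())


# Illustrative data only (assumed, not taken from the paper).
sales = [("east", 10), ("west", 5), ("east", 7), ("west", 1)]
totals = run_map_reduce(sales, map_fn, reduce_fn)
print(totals)  # {'east': 17, 'west': 6}
```

In a real cluster the map calls and the per-key reduce calls would run on different machines, which is where adding machines can help; this sketch runs them sequentially to show only the programming model.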
