SQLMR: A Scalable Database Management System for Cloud Computing

As the size of data sets in the cloud grows rapidly, processing large amounts of data efficiently has become a critical issue. MapReduce provides a framework for large-scale data processing and has been shown to be scalable and fault-tolerant on commodity machines. However, it has a steeper learning curve than SQL-like languages, and MapReduce code is hard to maintain and reuse. Traditional SQL-based data processing, on the other hand, is familiar to users but limited in scalability. In this paper, we propose a hybrid approach that bridges the gap between SQL-based and MapReduce data processing. We develop a data management system for the cloud, named SQLMR, that compiles SQL-like queries into a sequence of MapReduce jobs. Existing SQL-based applications are seamlessly compatible with SQLMR, and users can manage terabyte- to petabyte-scale data with SQL-like queries instead of writing MapReduce code. We also devise a number of optimization techniques to improve the performance of SQLMR. Experimental results demonstrate both the performance and the scalability advantages of SQLMR compared to MySQL and two NoSQL data processing systems, Hive and HadoopDB.
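The abstract states that SQLMR compiles SQL-like queries into a sequence of MapReduce jobs. The Python sketch below illustrates that general idea with a small in-memory simulation; it is not SQLMR's compiler. The query shape (`SELECT col, COUNT(*) AS cnt ... GROUP BY col ORDER BY cnt DESC`), the two-job plan, and the function names `run_job`, `compile_group_count_order_desc`, and `run_pipeline` are hypothetical choices made only for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical types: a record is a row as a dict; mappers emit (key, value) pairs.
Record = Dict[str, object]
Mapper = Callable[[Record], Iterator[Tuple[object, object]]]
Reducer = Callable[[object, List[object]], Iterator[Record]]


def run_job(records: Iterable[Record], mapper: Mapper, reducer: Reducer) -> List[Record]:
    """Simulate one MapReduce job in memory with a single reducer."""
    groups: Dict[object, List[object]] = defaultdict(list)
    for rec in records:
        for key, value in mapper(rec):  # map phase: emit (key, value) pairs
            groups[key].append(value)   # shuffle: gather all values for one key
    out: List[Record] = []
    for key in sorted(groups):          # keys reach the reducer in sorted order
        out.extend(reducer(key, groups[key]))  # reduce phase
    return out


def compile_group_count_order_desc(column: str) -> List[Tuple[Mapper, Reducer]]:
    """Hypothetical plan for:
    SELECT column, COUNT(*) AS cnt FROM t GROUP BY column ORDER BY cnt DESC
    Job 1 evaluates GROUP BY/COUNT; job 2 evaluates ORDER BY.
    """
    def count_map(rec):
        yield rec[column], 1            # group key -> partial count of 1

    def count_reduce(key, values):
        yield {column: key, "cnt": sum(values)}

    def order_map(rec):
        yield -rec["cnt"], rec          # negated count: ascending keys = descending counts

    def order_reduce(key, values):
        yield from values               # pass rows through in key order

    return [(count_map, count_reduce), (order_map, order_reduce)]


def run_pipeline(records: Iterable[Record], jobs: List[Tuple[Mapper, Reducer]]) -> List[Record]:
    """Feed each job's output into the next job, as a chained job sequence would."""
    data = list(records)
    for mapper, reducer in jobs:
        data = run_job(data, mapper, reducer)
    return data


emp = [
    {"name": "a", "dept": "eng"},
    {"name": "b", "dept": "ops"},
    {"name": "c", "dept": "eng"},
    {"name": "d", "dept": "hr"},
    {"name": "e", "dept": "eng"},
    {"name": "f", "dept": "ops"},
]
result = run_pipeline(emp, compile_group_count_order_desc("dept"))
print(result)
# -> [{'dept': 'eng', 'cnt': 3}, {'dept': 'ops', 'cnt': 2}, {'dept': 'hr', 'cnt': 1}]
```

In this sketch, GROUP BY maps directly onto the shuffle, which already groups values by key. A global ORDER BY on the aggregated count needs a second job whose map output key is the sort column; with a single reducer, sorted key delivery then yields a total order. A real system would also have to handle partitioning across many reducers, which this simulation omits.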

Pangfeng Liu | Jan-Jan Wu | Meng-Ju Hsieh | Chao-Rui Chang | Li-Yung Ho

[1] Alekh Jindal, et al. Hadoop++. 2010.

[2] Werner Vogels, et al. Dynamo: Amazon's Highly Available Key-Value Store. SOSP, 2007.

[3] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters. OSDI, 2004.

[4] Pangfeng Liu, et al. Optimal Algorithms for Cross-Rack Communication Optimization in MapReduce Framework. 2011 IEEE 4th International Conference on Cloud Computing, 2011.

[5] Sanjay Ghemawat, et al. The Google File System. 2003.

[6] Abraham Silberschatz, et al. HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads. Proc. VLDB Endow., 2009.

[7] Ravi Kumar, et al. Pig Latin: A Not-So-Foreign Language for Data Processing. SIGMOD Conference, 2008.

[8] John Coggeshall, et al. The MySQL Database. 2009.

[9] Pete Wyckoff, et al. Hive - A Warehousing Solution Over a Map-Reduce Framework. Proc. VLDB Endow., 2009.

[10] Wilson C. Hsieh, et al. Bigtable: A Distributed Storage System for Structured Data. TOCS, 2006.