Comparing High Level MapReduce Query Languages

The MapReduce parallel computational model is of increasing importance. A number of High Level Query Languages (HLQLs) have been constructed on top of the Hadoop MapReduce realization, primarily Pig, Hive, and JAQL. This paper makes a systematic performance comparison of these three HLQLs, focusing on scale-up, scale-out, and runtime metrics. We further compare the languages themselves, focusing on conciseness and computational power. The HLQL development communities are engaged in the study, which has revealed technical bottlenecks and limitations described in this document and is influencing their development.
