Multiple big data processing platforms

Integrating the Hive, Impala, and Spark SQL platforms enables rapid SQL-based data retrieval in a big data environment. This paper designs an optimized platform selection scheme that substantially improves data retrieval response time: it automatically chooses the best-performing platform to execute each SQL command. In addition, caching is implemented with the distributed in-memory storage system Memcached and the Hadoop distributed file system (HDFS), so that a repeated SQL command is answered with the fastest possible retrieval.
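The flow described in the abstract (check the caches first, otherwise pick a platform and run the command) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration, not the paper's implementation: `QueryRouter`, `cache_key`, and the `estimate` callback are hypothetical, plain dictionaries stand in for Memcached and for results stored on HDFS, and the selection rule (lowest estimated cost) is a placeholder, since the abstract does not say how the best-performing platform is determined.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Rows = List[tuple]


def cache_key(sql: str) -> str:
    """Hash a SQL command into a cache key.

    Naive normalization (assumption): only surrounding and repeated
    whitespace is collapsed, so otherwise identical commands share a key.
    """
    normalized = " ".join(sql.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class QueryRouter:
    """Hypothetical router: two cache tiers, then platform selection."""

    def __init__(
        self,
        platforms: Dict[str, Callable[[str], Rows]],
        estimate: Callable[[str, str], float],
    ) -> None:
        # platforms maps a platform name (e.g. "hive") to a function that
        # executes a SQL command on it; estimate(name, sql) returns an
        # assumed cost figure where lower means faster.
        self.platforms = platforms
        self.estimate = estimate
        self.memory_cache: Dict[str, Rows] = {}  # stand-in for Memcached
        self.file_cache: Dict[str, Rows] = {}    # stand-in for results on HDFS

    def run(self, sql: str) -> Tuple[Rows, str]:
        """Return (rows, source), where source names the tier or platform used."""
        key = cache_key(sql)

        # Tier 1: in-memory cache, the fastest path for a repeated command.
        if key in self.memory_cache:
            return self.memory_cache[key], "memory-cache"

        # Tier 2: file-based cache; promote the hit back into memory.
        if key in self.file_cache:
            rows = self.file_cache[key]
            self.memory_cache[key] = rows
            return rows, "file-cache"

        # Cache miss: pick the platform with the lowest estimated cost.
        best = min(self.platforms, key=lambda name: self.estimate(name, sql))
        rows = self.platforms[best](sql)

        # Store the result in both tiers for later repeats.
        self.memory_cache[key] = rows
        self.file_cache[key] = rows
        return rows, best
```

In this sketch the first execution of a command is routed to whichever platform the estimate favors, and any repeat is served from the in-memory tier without touching a platform. A real deployment would also need cache invalidation when the underlying tables change, which is left out here.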

[1]  Jayati The Berkeley Data Analytics Stack (BDAS) , 2014, 2014 Conference on IT in Business, Industry and Government (CSIBIG).

[2]  Bao Rong Chang,et al.  High-Performed Virtualization Services for In-Cloud Enterprise Resource Planning System , 2014, J. Inf. Hiding Multim. Signal Process..

[3]  Muhammad Aslam,et al.  Minimizing big data problems using cloud computing based on Hadoop architecture , 2014, 2014 11th Annual High Capacity Optical Networks and Emerging/Enabling Technologies (Photonics for Energy).

[4]  Lili Li,et al.  Research on using memcached in call center , 2011, Proceedings of 2011 International Conference on Computer Science and Network Technology.

[5]  Ying Bai Practical Database Programming with Java , 2011 .

[6]  Christopher D Wickens,et al.  Processing Resources in Attention, Dual Task Performance, and Workload Assessment. , 1981 .

[7]  Timothy M. D. Ebbels,et al.  Integrated pathway-level analysis of transcriptomics and metabolomics data with IMPaLA , 2011 .

[8]  A. Kala Karun,et al.  A review on hadoop — HDFS infrastructure extensions , 2013, 2013 IEEE CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGIES.

[9]  Veda C. Storey,et al.  Business Intelligence and Analytics: From Big Data to Big Impact , 2012, MIS Q..

[10]  Zheng Shao,et al.  Hive - a petabyte scale data warehouse using Hadoop , 2010, 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010).

[11]  Gang Chen,et al.  Adaptive Logging for Distributed In-memory Databases , 2015, ArXiv.

[12]  Joseph K. Bradley,et al.  Spark SQL: Relational Data Processing in Spark , 2015, SIGMOD Conference.

[13]  Surajit Chaudhuri,et al.  An overview of business intelligence technology , 2011, Commun. ACM.

[14]  Liping Di,et al.  Creating web service interfaces and scientific workflows using command line tools: A GRASS example , 2009, 2009 17th International Conference on Geoinformatics.

[15]  Ying Bai JDBC API and JDBC Drivers , 2011 .

[16]  M. Maurya,et al.  Performance analysis of MapReduce programs on Hadoop cluster , 2012, 2012 World Congress on Information and Communication Technologies.