Big Data is data whose scale, diversity, and complexity require new architectures, techniques, algorithms, and analytics to manage it and extract value and hidden knowledge from it. Hadoop is the core platform for structuring Big Data and solves the problem of making it useful for analytics. Hadoop is an open-source software project that enables the distributed processing of large data sets across clusters of commodity servers; it is designed to scale from a single server to thousands of machines with a very high degree of fault tolerance. Hadoop MapReduce is an implementation of the MapReduce algorithm developed and maintained by the Apache Hadoop project. MapReduce is a programming model for processing large data sets with a parallel, distributed algorithm on a cluster. This paper presents a survey of Big Data processing from the perspective of Hadoop and MapReduce.
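To make the MapReduce programming model concrete, the sketch below shows the canonical word-count job written against the Hadoop MapReduce Java API: the map phase emits a (word, 1) pair for every token, and the reduce phase sums the counts for each word. The class name and the input/output paths taken from the command line are illustrative assumptions, not details drawn from the surveyed systems.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every word in the input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce phase: sum the counts emitted for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory (e.g. on HDFS)
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory (must not exist)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Because word counting is both associative and commutative, the reducer can also serve as a combiner, cutting the volume of intermediate data shuffled between map and reduce tasks; this is the kind of optimization the framework exposes without requiring the programmer to manage distribution or fault tolerance explicitly.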