Paper Information - Getting Started with Hadoop

Getting Started with Hadoop

Apache Hadoop is a software framework that allows distributed processing of large datasets across clusters of computers using simple programming models. It is designed to scale up from a single server to thousands of nodes. Rather than relying on hardware for high availability, it is designed to detect failures at the application level, thereby delivering a highly available service on top of a cluster of commodity hardware nodes, each of which is prone to failure [2]. While Hadoop can run on a single machine, its true power lies in its ability to scale up to thousands of computers, each with several processor cores. It also distributes large amounts of work across the cluster efficiently [1].

K. G. Srinivasa | Anil Kumar Muppalla | K. Srinivasa

[1] Konstantin V. Shvachko, et al. HDFS Scalability: The Limits to Growth, 2010, ;login: USENIX Mag.

[2] Michael J. Cafarella, et al. Building Nutch: Open Source Search, 2004, ACM Queue.

[3] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.

[4] Tom White, et al. Hadoop: The Definitive Guide, 2009.

[5] Lars George, et al. HBase: The Definitive Guide, 2011.

[6] Ravi Kumar, et al. Pig latin: a not-so-foreign language for data processing, 2008, SIGMOD Conference.

[7] Mahadev Konar, et al. ZooKeeper: Wait-free Coordination for Internet-scale Systems, 2010, USENIX ATC.

[8] Michael Hausenblas, et al. Apache Drill: Interactive Ad-Hoc Analysis at Scale, 2013, Big Data.

[9] Pete Wyckoff, et al. Hive - A Warehousing Solution Over a Map-Reduce Framework, 2009, Proc. VLDB Endow.

[10] Michael Stonebraker, et al. MapReduce: A major step backwards, 2014.

[11] Hairong Kuang, et al. The Hadoop Distributed File System, 2010, 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST).

[12] Sanjay Ghemawat, et al. The Google file system, 2003.

[13] Christopher Chute, et al. The Diverse and Exploding Digital Universe, 2011.