Hadoop is currently the large-scale data analysis “hammer” of choice, but there exist classes of algorithms that aren’t “nails” in the sense that they are not particularly amenable to the MapReduce programming model. To address this, researchers have proposed MapReduce extensions or alternative programming models in which these algorithms can be elegantly expressed. This article espouses a very different position: that MapReduce is “good enough,” and that instead of trying to invent screwdrivers, we should simply get rid of everything that’s not a nail. To be more specific, much discussion in the literature surrounds the fact that iterative algorithms are a poor fit for MapReduce. The simple solution is to find alternative, noniterative algorithms that solve the same problem. This article captures my personal experiences as an academic researcher as well as a software engineer in a “real-world” production analytics environment. From this combined perspective, I reflect on the current state and future of “big data” research.
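To make the iterative mismatch concrete, the following is a minimal pure-Python sketch of PageRank, the canonical iterative algorithm discussed in this context, phrased as repeated map and reduce passes. The toy graph, function names, and parameters are illustrative assumptions, not taken from the article; the point is that each pass of the loop corresponds to a full MapReduce job, which in Hadoop means re-reading and re-materializing the entire graph through HDFS on every iteration.

```python
from collections import defaultdict

# Hypothetical toy graph (illustrative only): node -> outgoing links.
GRAPH = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
DAMPING = 0.85


def map_phase(ranks):
    """Map: each node emits an equal share of its rank to each neighbor."""
    emitted = []
    for node, links in GRAPH.items():
        share = ranks[node] / len(links)
        for dest in links:
            emitted.append((dest, share))
    return emitted


def reduce_phase(emitted):
    """Reduce: sum incoming shares per node and apply the damping factor."""
    sums = defaultdict(float)
    for node, share in emitted:
        sums[node] += share
    n = len(GRAPH)
    return {node: (1 - DAMPING) / n + DAMPING * sums[node] for node in GRAPH}


def pagerank(iterations=20):
    ranks = {node: 1.0 / len(GRAPH) for node in GRAPH}
    # Each pass of this loop is one full MapReduce job: on Hadoop, every
    # iteration pays job-launch overhead and writes its output back to HDFS.
    for _ in range(iterations):
        ranks = reduce_phase(map_phase(ranks))
    return ranks
```

The driver loop outside `map_phase`/`reduce_phase` is exactly the part that MapReduce itself does not express, which is why iterative workloads either pay repeated job overhead or motivate the extensions (and noniterative reformulations) the article discusses.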