Big Data Operations: Basis for Benchmarking a Data Grid

Data operations over a wide area network are complex, and their end-to-end implementations vary significantly in efficiency, failure recovery, and transaction management. As data sizes grow exponentially, benchmarking these operations becomes vital. A critical evaluation of the types of data operations performed within large-scale data management systems, together with a comparison of their efficiency across implementations, is therefore an appropriate topic for benchmarking in a big data framework. In this paper, we identify the operations that are important in large-scale data management and discuss a few of them in terms of data grid benchmarking. These operations form a set of core abstractions that can define how domain-centric scientific or business workflow applications interact with big data systems. We chose these operational abstractions based on our experience with large-scale distributed systems and data-intensive computation.