Study of Hadoop-MapReduce on Google N-Gram Datasets

In recent decades, there has been a significant paradigm shift in computer architecture and in the mechanisms used to process large-scale data, driven by the growth in computational power needed to cope with an overwhelming flow of massive amounts of data. Hadoop and MapReduce are powerful frameworks that enable the efficient development of the scalable, parallel applications required to process such vast amounts of data. In this paper, we examine the concepts behind Hadoop and MapReduce and then apply MapReduce and Apache Pig to solve computationally demanding problems on the large and complex Google N-gram datasets.
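
As a minimal illustration of the kind of processing discussed in this paper (not code from the paper itself), the sketch below shows a Hadoop MapReduce job in Java that sums the total match count for each n-gram across all years, assuming the tab-separated Google N-gram record layout of ngram, year, match_count, volume_count; the class and job names are illustrative.

```java
// Illustrative sketch: sum total occurrences per n-gram across all years.
// Assumes tab-separated records: ngram <TAB> year <TAB> match_count <TAB> volume_count.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NgramTotalCount {

    // Mapper: emit (ngram, match_count) for each input record.
    public static class NgramMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        private final Text ngram = new Text();
        private final LongWritable count = new LongWritable();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            if (fields.length < 3) {
                return; // skip malformed lines
            }
            try {
                count.set(Long.parseLong(fields[2]));
            } catch (NumberFormatException e) {
                return; // skip records with a non-numeric match_count
            }
            ngram.set(fields[0]);
            context.write(ngram, count);
        }
    }

    // Reducer (also used as combiner): sum the yearly match counts per n-gram.
    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        private final LongWritable total = new LongWritable();

        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();
            }
            total.set(sum);
            context.write(key, total);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "ngram total count");
        job.setJarByClass(NgramTotalCount.class);
        job.setMapperClass(NgramMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The same aggregation could be expressed in a few lines of Apache Pig Latin (a LOAD, GROUP, and SUM), which is one of the trade-offs between the two tools explored later in the paper.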