Study of Hadoop-MapReduce on Google N-Gram Datasets
In recent decades, computer architecture and the mechanisms for processing large-scale data have undergone a significant paradigm shift, driven by the increase in computational power required to handle an overwhelming flow of data. Hadoop and MapReduce are powerful concepts that enable the efficient development of the scalable, parallel applications needed to process vast amounts of data. In this paper, we examine Hadoop and MapReduce and then use MapReduce and Apache Pig to solve computation problems on the large and complex Google Ngrams datasets.