Paper Information - Big Data Processing: Application of Parallel Processing Technique to Big Data by Using MapReduce

This chapter describes MapReduce, a programming model for parallel, distributed computation over very large data sets. MapReduce jobs run on commodity servers, which lowers server maintenance costs and removes the need for dedicated servers to run these processes. The chapter surveys approaches to the MapReduce programming model and shows how to use it efficiently for scalable text-based analysis in domains such as machine learning, data analytics, and data science. It covers how MapReduce techniques can be applied effectively in these fields and how the MapReduce programming model can be fitted to any text mining application.
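To make the programming model concrete, here is a minimal word-count sketch of the map, shuffle, and reduce phases for a text-analysis task. It is an illustration only, not the chapter's own code: the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) are hypothetical, and a local thread pool stands in for the commodity servers that a real framework such as Hadoop would coordinate.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one line of text."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(mapped_chunks):
    """Shuffle step: group all emitted values by key (the word)."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for word, count in pairs:
            groups[word].append(count)
    return groups


def reduce_counts(item):
    """Reduce step: sum the values collected for one key."""
    word, counts = item
    return word, sum(counts)


def word_count(lines, workers=2):
    """Hypothetical driver: a thread pool stands in for cluster worker nodes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_words, lines))    # one map task per line
        grouped = shuffle(mapped)                    # group values by word
        return dict(pool.map(reduce_counts, grouped.items()))


if __name__ == "__main__":
    sample = ["big data needs big clusters", "data beats opinions"]
    print(word_count(sample))
```

Each map task reads only its own line, and each reduce task reads only the values for its own key. That independence is what allows a framework to spread the tasks across many machines without coordination inside a phase.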
