Meteorological Data Analysis Using MapReduce

In atmospheric science, the volume of meteorological data is massive and growing rapidly. K-means is a fast, widely available clustering algorithm that has been used in many fields. For large-scale meteorological data, however, the traditional K-means algorithm cannot meet practical application needs efficiently. This paper proposes MK-means, an improved K-means algorithm based on MapReduce and designed around the characteristics of large meteorological datasets. The experimental results show that MK-means offers greater computing capacity and better scalability.
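To make the idea of K-means on MapReduce concrete, the following is a minimal single-machine sketch of how one K-means iteration is commonly split into a map step (assign each point to its nearest centroid) and a reduce step (average the points assigned to each centroid). It is a generic illustration, not the MK-means algorithm of this paper, whose specific improvements are not described in the abstract. The function names, the toy two-dimensional points, and the starting centroids are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Iterable, Iterator, List, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centroids: List[Point]) -> int:
    """Return the index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_phase(points: Iterable[Point], centroids: List[Point]) -> Iterator[Tuple[int, Tuple[Point, int]]]:
    """Map step: emit (centroid index, (point, 1)) for every input point."""
    for point in points:
        yield nearest(point, centroids), (point, 1)


def reduce_phase(pairs: Iterable[Tuple[int, Tuple[Point, int]]], centroids: List[Point]) -> List[Point]:
    """Reduce step: sum the points per centroid index, then divide by the count."""
    sums = {}
    counts = defaultdict(int)
    for idx, (point, n) in pairs:
        if idx in sums:
            sums[idx] = tuple(s + p for s, p in zip(sums[idx], point))
        else:
            sums[idx] = point
        counts[idx] += n
    updated = list(centroids)  # centroids with no assigned points keep their old position
    for idx, total in sums.items():
        updated[idx] = tuple(s / counts[idx] for s in total)
    return updated


def kmeans(points: List[Point], centroids: List[Point], max_iter: int = 20) -> List[Point]:
    """Repeat map and reduce until the centroids stop changing or max_iter is reached."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


if __name__ == "__main__":
    # Hypothetical toy data: four 2-D observations and two starting centroids.
    observations = [(1.0, 1.0), (1.0, 3.0), (9.0, 9.0), (9.0, 11.0)]
    print(kmeans(observations, [(0.0, 0.0), (10.0, 10.0)]))
```

In a real MapReduce deployment the map step would run in parallel over partitions of the dataset and the reduce step would run once per centroid key; the sketch keeps both steps in one process so that the data flow is easy to follow.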