Paper Information - The k-Nearest Neighbor Algorithm Using MapReduce Paradigm

Data in any form is a valuable resource, but more often than not data collected in the real world is random and unstructured. Hence, to realize the true potential of data as a resource, we must transform it so that meaningful information can be retrieved from it. Data mining fulfills this need. Today there is a need not only for efficient data mining techniques to process large volumes of data but also for a means of meeting the computational requirements of processing such volumes. In this paper we implement an effective data mining technique, the k-Nearest Neighbor method, in a distributed computing environment running Apache Hadoop, which uses the MapReduce paradigm to process high-volume data.
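To illustrate how k-Nearest Neighbor classification can be phrased as map and reduce steps, here is a minimal single-process Python sketch. It is not the authors' Hadoop implementation. The function names (`map_partition`, `reduce_candidates`, `knn_classify`), the Euclidean distance, the majority vote, and the toy data are all assumptions made for illustration.

```python
import heapq
import math
from collections import Counter


def map_partition(partition, query, k):
    """Map step (assumed design): score one data split against the query
    and emit only its k nearest (distance, label) pairs."""
    scored = ((math.dist(features, query), label) for features, label in partition)
    return heapq.nsmallest(k, scored)


def reduce_candidates(candidate_lists, k):
    """Reduce step (assumed design): merge the per-split candidates, keep the
    global k nearest, and return the majority label among them."""
    merged = heapq.nsmallest(k, (pair for lst in candidate_lists for pair in lst))
    votes = Counter(label for _, label in merged)
    return votes.most_common(1)[0][0]


def knn_classify(partitions, query, k=3):
    """Run the map step on every split, then a single reduce step."""
    local_results = [map_partition(p, query, k) for p in partitions]
    return reduce_candidates(local_results, k)


# Toy training data, split into two partitions as a cluster might store it.
partitions = [
    [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((5.0, 5.0), "b")],
    [((0.9, 1.1), "a"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b")],
]
```

Calling `knn_classify(partitions, (1.0, 1.0), k=3)` classifies the query point against both splits. Each map call returns at most k candidates, so the reduce step handles only a small merged list rather than the full data set, which is the property that makes this decomposition attractive on large inputs.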

Kaushik Roy | Prajesh P. Anchalia

[1] Tom White, et al. Hadoop: The Definitive Guide, 2009.

[2] Geoffrey C. Fox, et al. Twister: a runtime for iterative MapReduce, 2010, HPDC '10.

[3] Benjamin Reed, et al. The life and times of a zookeeper, 2009, PODC '09.

[4] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.

[5] Anjan K. Koundinya, et al. MapReduce Design of K-Means Clustering Algorithm, 2013, 2013 International Conference on Information Science and Applications (ICISA).

[6] Pete Wyckoff, et al. Hive - A Warehousing Solution Over a Map-Reduce Framework, 2009, Proc. VLDB Endow.

[7] Christopher Olston, et al. Building a High-Level Dataflow System on top of MapReduce: The Pig Experience, 2009, Proc. VLDB Endow.

[8] Sanjay Ghemawat, et al. The Google file system, 2003.