A New Data Mining Algorithm based on MapReduce and Hadoop

The goal of data mining is to discover hidden, useful information in large databases. Mining frequent patterns from transaction databases is an important problem in data mining. As the database size increases, so do the computation time and the memory required. Based on this, we use the MapReduce programming model, which supports parallel processing, to analyze the large-scale network. All experiments were run on Hadoop, deployed on a cluster of commodity servers. Empirical evaluations under various simulation conditions show that the proposed algorithms deliver excellent performance with respect to scalability and execution time.
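To make the idea of frequent pattern mining with MapReduce concrete, the following is a minimal single-process sketch in Python. It is not the algorithm proposed in this paper, whose details are not given in the abstract. It only illustrates, under simple assumptions, how support counting for k-item sets can be split into a map step (emit each candidate itemset with a count of 1) and a reduce step (sum the counts and keep itemsets that reach a minimum support). The function names, the toy transactions, and the `min_support` threshold are all hypothetical, and a real deployment would run the two phases as a Hadoop job rather than in one process.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transaction, k):
    """Map step: emit (itemset, 1) for every k-item subset of one transaction."""
    items = sorted(set(transaction))  # canonical order so equal itemsets share a key
    for itemset in combinations(items, k):
        yield itemset, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(itemset, counts, min_support):
    """Reduce step: sum the counts and keep the itemset only if it is frequent."""
    total = sum(counts)
    return (itemset, total) if total >= min_support else None


def frequent_itemsets(transactions, k, min_support):
    """Run one map/shuffle/reduce pass and return {itemset: support}."""
    pairs = (pair for t in transactions for pair in map_phase(t, k))
    result = {}
    for itemset, counts in shuffle(pairs).items():
        out = reduce_phase(itemset, counts, min_support)
        if out is not None:
            result[out[0]] = out[1]
    return result


# Hypothetical toy transaction database, for illustration only.
transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]

if __name__ == "__main__":
    for itemset, support in sorted(frequent_itemsets(transactions, 2, 3).items()):
        print(itemset, support)
```

Because each transaction is mapped independently and each itemset is reduced independently, both phases can be spread across cluster nodes, which is the property the abstract relies on when the database grows.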
