An Overlay-Based Data Mining Architecture Tolerant to Physical Network Disruptions

Management schemes for highly scalable big data mining have not been well studied, even though big data mining yields much valuable information. An overlay-based parallel data mining architecture, which performs fully distributed data management and processing over an overlay network, can achieve high scalability. However, such an architecture cannot provide data mining services during a physical network disruption caused by router or communication line breakdowns, because numerous nodes are removed from the overlay network at once. To cope with this issue, this paper proposes an overlay network construction scheme based on node location in the physical network, and a distributed task allocation scheme using overlay network technology. The numerical analysis indicates that the proposed schemes considerably outperform conventional schemes in terms of service availability under physical network disruption.
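The following toy sketch illustrates why node location can matter when an overlay is built. It is not the paper's construction scheme: the ring topology, the area names, and the helper functions `build_ring_overlay` and `survivors_connected` are all assumptions made for illustration. It models a physical disruption as the simultaneous loss of every node in one area and checks whether the surviving overlay nodes can still reach each other.

```python
from collections import deque

# Hypothetical physical areas and the overlay nodes located in each one.
AREAS = {
    "A": ["a0", "a1", "a2"],
    "B": ["b0", "b1", "b2"],
    "C": ["c0", "c1", "c2"],
}

def build_ring_overlay(ordered_nodes):
    """Link each node to its successor in the given order, closing the ring."""
    adj = {n: set() for n in ordered_nodes}
    for i, node in enumerate(ordered_nodes):
        succ = ordered_nodes[(i + 1) % len(ordered_nodes)]
        if succ != node:
            adj[node].add(succ)
            adj[succ].add(node)
    return adj

def survivors_connected(adj, failed):
    """Return True if all non-failed nodes form one connected component."""
    alive = [n for n in adj if n not in failed]
    if not alive:
        return False
    seen = {alive[0]}
    queue = deque([alive[0]])
    while queue:  # breadth-first search restricted to surviving nodes
        cur = queue.popleft()
        for nb in adj[cur]:
            if nb not in failed and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return len(seen) == len(alive)

# Location-aware order: nodes of the same area are adjacent on the ring.
location_aware = build_ring_overlay(AREAS["A"] + AREAS["B"] + AREAS["C"])

# Location-oblivious order: areas are interleaved around the ring.
interleaved = [AREAS[a][i] for i in range(3) for a in ("A", "B", "C")]
location_oblivious = build_ring_overlay(interleaved)

# Model a physical disruption that removes every node in area B.
failed_area = set(AREAS["B"])
aware_ok = survivors_connected(location_aware, failed_area)
oblivious_ok = survivors_connected(location_oblivious, failed_area)
print("location-aware overlay still connected:", aware_ok)
print("location-oblivious overlay still connected:", oblivious_ok)
```

In this toy ring, grouping nodes by area means a single-area failure removes one contiguous arc and leaves the rest connected, whereas the interleaved ordering splits the survivors into isolated fragments. A real overlay would have a richer topology and a more realistic failure model.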
