Improving Resource Efficiency in Data Centers Using Reputation-based Resource Selection

Sustainable Computing: Informatics and Systems, 2012 (Article in Press)

Data centers consume large amounts of energy, but not efficiently: much of that energy is wasted. The waste occurs at several levels, namely the infrastructure, machine, and system levels. The first two levels have improved significantly in the last few years, but little effort has gone into the system level, especially the resource waste caused by failures in a data center. In this paper, we attack the problem proactively with a reputation-based resource selection scheme that reduces the number of tasks that must be resubmitted after failing during execution. To capture the characteristics of resources, we introduce Opera, an OPEn ReputAtion model. Opera has two important novelties. First, it represents the reputation of a resource (entity) as a vector rather than a single value, capturing the resource's heterogeneity across different points of view. Second, it introduces a just-in-time feature that reflects real-time system status, which, to our knowledge, conventional reputation systems have never considered. To demonstrate Opera's effectiveness, we implemented the Opera model and integrated it with the scheduler of Hadoop, a popular open-source framework for data-intensive computing. Experimental results show that Opera enables the scheduler to select appropriate nodes for tasks based on different criteria. Under failures and heavy workload, this reduces the number of re-executed tasks by up to 59% and the execution time of Hadoop jobs by up to 32%.
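The abstract names two ideas: a reputation stored as a vector instead of a single number, and a just-in-time component that reflects live system status, which together let a scheduler pick nodes "based on different criteria". The following is a minimal sketch of that idea under loudly stated assumptions: the two components (a `reliability` score from a smoothed success ratio and an `availability` score from current load), the `NodeReputation` class, and the weighted-sum `select_node` helper are all hypothetical illustrations. They are not Opera's actual formulas and not part of Hadoop's scheduler API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeReputation:
    """Hypothetical per-node reputation vector (illustrative, not Opera's definition)."""
    name: str
    completed: int = 0          # historical: tasks finished without failure
    failed: int = 0             # historical: tasks that failed and were re-executed
    current_load: float = 0.0   # just-in-time: live load in [0, 1], refreshed from status reports

    def record(self, success: bool) -> None:
        """Update the historical component after a task finishes or fails."""
        if success:
            self.completed += 1
        else:
            self.failed += 1

    def vector(self) -> Dict[str, float]:
        """Return the reputation as separate components instead of one scalar."""
        total = self.completed + self.failed
        # Assumed Laplace-smoothed success ratio: an unseen node starts at 0.5.
        reliability = (self.completed + 1) / (total + 2)
        # Assumed just-in-time component: spare capacity right now.
        availability = 1.0 - self.current_load
        return {"reliability": reliability, "availability": availability}


def score(rep: NodeReputation, weights: Dict[str, float]) -> float:
    """Collapse the vector into a scalar only at selection time, per criterion."""
    vec = rep.vector()
    return sum(w * vec[k] for k, w in weights.items())


def select_node(nodes: List[NodeReputation], weights: Dict[str, float]) -> NodeReputation:
    """Pick the node with the highest weighted score for the given criterion."""
    return max(nodes, key=lambda n: score(n, weights))


if __name__ == "__main__":
    nodes = [
        NodeReputation("a", completed=9, failed=1, current_load=0.9),
        NodeReputation("b", completed=5, failed=5, current_load=0.1),
        NodeReputation("c", completed=8, failed=0, current_load=0.4),
    ]
    print(select_node(nodes, {"reliability": 1.0}).name)   # c
    print(select_node(nodes, {"availability": 1.0}).name)  # b
```

Changing the weights changes the selection criterion: weighting only reliability favors node `c`, which has never failed, while weighting only availability favors the lightly loaded node `b`. Keeping the components separate until selection time is what lets one reputation record serve several criteria, which a single scalar reputation cannot do.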
