Enabling Greater Reproducibility of Experiments in Distributed Environments with Low-Reliability Nodes

Experiment reproducibility, essential for verifying the effectiveness and efficiency of scientific contributions, is particularly challenging in large-scale distributed systems. Unplanned failures, whether at the nodes that compose the system or in the communication between them, can make it difficult to achieve statistical significance in the results or to verify their validity. To address this problem, we propose EASYEXP, a fault-tolerant architecture that ensures the reproducibility of experiments on unreliable distributed testbeds. In EASYEXP, nodes in the experiment environment “interpret” workers, executing the actions expected of them according to a predefined schedule. When a node fails, it is replaced by another functional node, which keeps the execution context of the worker that the failed node was interpreting. Our results show that EASYEXP maintains lower variation (standard deviation of 1.6%) and higher precision (95.7%) across multiple runs of the same experiment than runs performed in the traditional way (deviation of 25% and accuracy of only 72%).
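
The replacement mechanism described above can be illustrated with a minimal Python sketch. It is not the EASYEXP implementation: the names `Worker`, `Node`, `NodeFailure`, `run_worker`, and the `fails_at` parameter are hypothetical, and node failures are simulated deterministically. The sketch only shows the core idea that the worker's execution context (here, the index of its next scheduled action) lives outside the node, so a spare node can resume where the failed one stopped.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class NodeFailure(Exception):
    """Raised when a (simulated) node fails while executing an action."""


@dataclass
class Worker:
    """Logical participant in the experiment (hypothetical model).

    Its execution context is kept here, independent of any node.
    """
    name: str
    schedule: List[str]      # predefined, ordered list of actions
    next_action: int = 0     # execution context: index of the next action
    log: List[str] = field(default_factory=list)


@dataclass
class Node:
    """Testbed node; `fails_at` simulates a crash once the node has
    already executed that many actions (assumption for illustration)."""
    name: str
    fails_at: Optional[int] = None
    executed: int = 0

    def execute(self, action: str) -> str:
        if self.fails_at is not None and self.executed == self.fails_at:
            raise NodeFailure(self.name)
        self.executed += 1
        return f"{action}@{self.name}"


def run_worker(worker: Worker, node: Node, spares: List[Node]) -> List[str]:
    """Run the worker's schedule, replacing the node whenever it fails."""
    while worker.next_action < len(worker.schedule):
        action = worker.schedule[worker.next_action]
        try:
            result = node.execute(action)
        except NodeFailure:
            if not spares:
                raise RuntimeError("no functional node left")
            # Swap in a spare; the worker's context is untouched, so the
            # same action is retried on the replacement node.
            node = spares.pop(0)
            continue
        worker.log.append(result)
        worker.next_action += 1
    return worker.log


worker = Worker("w1", ["join", "send", "leave"])
log = run_worker(worker, Node("n1", fails_at=1), [Node("n2")])
print(log)  # ['join@n1', 'send@n2', 'leave@n2']
```

In this toy run, node `n1` fails after one action and `n2` takes over at the second action, so every scheduled action is executed exactly once and in order despite the failure.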