RABIT: A Reliable Allreduce and Broadcast Interface
Allreduce is an abstraction commonly used for solving machine learning problems. It is an operation where every node starts with a local value and ends up with an aggregate global result. MPI provides an Allreduce implementation. Though it has been widely adopted, it is somewhat limited: it lacks fault tolerance and cannot easily run on existing systems. In this work, we propose RABIT, an Allreduce library suitable for distributed machine learning algorithms that overcomes the aforementioned drawbacks; it is fault-tolerant and can easily run on top of existing systems. We compare RABIT with existing solutions and show that it performs competitively.
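To illustrate the Allreduce semantics described above, here is a minimal sketch in Python that simulates the operation across a set of nodes. This is a hypothetical illustration of the abstraction only, not RABIT's actual API; the function name `allreduce` and the sum-reduction are assumptions for the example.

```python
# Hypothetical sketch of Allreduce semantics (not RABIT's real API):
# every node contributes a local value, and afterwards every node
# holds the same global aggregate (here, a sum).

def allreduce(local_values, op=sum):
    """Simulate Allreduce: combine all node-local values and hand
    every node a copy of the global result."""
    global_result = op(local_values)
    return [global_result for _ in local_values]

# Example: four nodes each hold a local partial result (e.g. a
# per-node gradient sum in distributed training).
node_locals = [1.0, 2.0, 3.0, 4.0]
node_results = allreduce(node_locals)
# Every node now sees the same aggregate value, 10.0.
assert all(r == 10.0 for r in node_results)
```

In a real distributed setting the combination happens over the network (e.g. via a reduction tree), but the contract is the same: each node passes in its local value and receives the identical global result.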