On Convergence of Average-Reward Off-Policy Control Algorithms in Weakly-Communicating MDPs