Multi-agent Q-learning for autonomous D2D communication