Towards Evaluation of Multi-party Dialogue Systems