We describe an unsupervised domain adaptation framework for images by a transform to an abstract intermediate domain and ensemble classifiers seeking a consensus. The intermediate domain can be thought as a latent domain where both the source and target domains can be transferred easily. The proposed framework aligns both domains to the intermediate domain, which greatly improves the adaptation performance when the source and target domains are notably dissimilar. In addition, we propose an ensemble model trained by confusing multiple classifiers and letting them make a consensus alternately to enhance the adaptation performance for ambiguous samples. To estimate the hidden intermediate domain and the unknown labels of the target domain simultaneously, we develop a training algorithm using a double-structured architecture. We validate the proposed framework in hard adaptation scenarios with real-world datasets from simple synthetic domains to complex real-world domains. The proposed algorithm outperforms the previous state-of-the-art algorithms on various environments.