Despite promising preliminary results, existing multi-view learning methods based on graph convolutional networks (GCNs) directly use the graph structure as the view descriptor, which may limit their effectiveness on multimedia data. The main reason is that, in real multimedia applications, the graph structure may contain outliers. Moreover, these methods fail to exploit the information embedded in the inaccurate clustering labels they produce, which leads to inferior clustering results. These observations motivate us to ask whether a better GCN-based framework exists for multi-view clustering. To this end, we propose an end-to-end self-supervised graph convolutional network for multi-view clustering (SGCMC). Specifically, SGCMC constructs a new view descriptor for graph-structured data by mapping the raw node content into the complex space via the Euler transformation, which not only suppresses outliers but also reveals non-linear patterns embedded in the data. Meanwhile, SGCMC uses the clustering labels to guide the learning of the latent representation and the coefficient matrix, and the coefficient matrix is in turn used to perform the subsequent node clustering. In this way, clustering and representation learning are seamlessly connected, with the aim of achieving better clustering results. Extensive experiments indicate that SGCMC outperforms state-of-the-art methods.
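To make the Euler transformation concrete, here is a minimal sketch. It assumes the element-wise form used in Euler PCA, z = exp(iαπx) / √2, applied to node features already scaled to [0, 1]; the function names and the default α = 1.9 are hypothetical illustrations, not SGCMC's actual implementation. The sketch shows that every transformed entry has the same magnitude. It also shows that the induced squared distance equals Σ_k (1 − cos(απ(x_k − y_k))), so each feature contributes at most 2. This boundedness is one way to see why the mapping can damp the influence of extreme feature values.

```python
import numpy as np


def euler_transform(X: np.ndarray, alpha: float = 1.9) -> np.ndarray:
    """Element-wise Euler representation z = exp(i * alpha * pi * x) / sqrt(2).

    Assumption: X is already scaled to [0, 1]; alpha = 1.9 is a
    hypothetical default, not a value taken from SGCMC.
    """
    return np.exp(1j * alpha * np.pi * X) / np.sqrt(2.0)


def euler_sq_dist(x: np.ndarray, y: np.ndarray, alpha: float = 1.9) -> float:
    """Squared Euclidean distance between two nodes in the complex space.

    Equals sum_k (1 - cos(alpha * pi * (x_k - y_k))), so each feature
    contributes at most 2 regardless of how far apart the raw values are.
    """
    diff = euler_transform(x, alpha) - euler_transform(y, alpha)
    return float(np.sum(np.abs(diff) ** 2))


# Toy node-content matrix (hypothetical): 4 nodes, 3 features in [0, 1].
X = np.array([
    [0.10, 0.20, 0.30],
    [0.12, 0.18, 0.33],
    [0.90, 0.80, 0.70],
    [0.00, 1.00, 0.50],
])
Z = euler_transform(X)
print(np.abs(Z))  # every entry has magnitude 1/sqrt(2)
print(euler_sq_dist(X[0], X[1]), euler_sq_dist(X[0], X[2]))
```

Nearby nodes (rows 0 and 1) stay close in the complex space, while distant ones (rows 0 and 2) are separated by a cosine-based dissimilarity rather than a raw squared difference. This is the non-linear, bounded behaviour that the abstract attributes to the Euler-based view descriptor.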