Since CapsNet [1] demonstrated strong results in image recognition, the capsule concept has attracted considerable attention. A capsule network interprets an object through the geometric arrangement of its parts. We argue that this idea can be transferred to hyperspectral images (HSIs). In a hyperspectral data cube, each pixel spectrum can be regarded as a continuous curve representing the inherent properties of that pixel. In the spatial domain, different positions exhibit different spatial distributions, and there is usually a specific structural relationship between adjacently distributed categories. Based on these structural characteristics of HSI data, and building on the stacked capsule autoencoder, we propose a model for unsupervised HSI classification. In our model, a ConvLSTM is employed to discover the part capsules of an HSI, and a Set Transformer encodes the relations among all parts and infers the object capsules. The decoders of both phases use Gaussian mixture models to reconstruct specific information. Experimental results on the Pavia Center dataset demonstrate the exceptional performance of our model.