In this paper, we propose a Structure-adaptive Neighborhood Preserving Hashing (SNPH) method for unsupervised scalable video search. Unlike most existing hashing methods which equally encode an entire video into a binary feature vector, we propose a neighborhood attention mechanism which encodes the neighborhood-relevant content of a video to better preserve the neighborhood relationships among videos. Motivated by the fact that a video usually contains multiple shots and each shot depicts a different activity, we further develop a structure-adaptive encoder to model the hierarchical structure of the video. Specifically, the encoder adaptively divides each video into multiple segments via detecting temporal boundaries across frames and encodes these segments as a compact binary vector to capture rich structural information. We integrate the neighborhood attention mechanism into the structure-adaptive encoder to learn hash functions that jointly preserve the neighborhood relationships among videos and exploit the hierarchical structure in a video. Experimental results on three widely used benchmark datasets show that our proposed method consistently outperforms state-of-the-art unsupervised video hashing methods.