We develop a deep learning framework based on the deep image prior (DIP) and attention networks for 3-D seismic data enhancement. First, the noisy 3-D data are divided into overlapping patches. Second, the patches are passed to a DIP network with a U-Net architecture: the encoder extracts the significant latent features from the input patches, and the decoder reconstructs the patches from these features. An attention network scales the features extracted by both the encoder and the decoder. Third, the attention output of the encoder is concatenated with that of the decoder to obtain high-order features, guiding the network to extract the information most relevant to the seismic signal and discard the rest. Finally, the 3-D seismic data are reconstructed from the output patches of the DIP network. The proposed algorithm is iterative and unsupervised, and it requires no labeled data. We evaluate it on several synthetic and field data examples. The results show that the proposed algorithm enhances 3-D seismic data by attenuating random noise while preserving the seismic signal with minimal leakage. It also denoises well across various types of events, e.g., linear and hyperbolic events, events with low and high dominant frequencies, and weak-amplitude events. In addition, the proposed method outperforms the predictive filtering (PF) and damped rank-reduction (DRR) methods. To better understand how the DIP network works, we analyze the weighting matrices in the encoder and decoder in detail. We attribute the denoising ability of the DIP network to the progressive improvement of the extracted basis features from the encoder layers to the decoder layers of the deep network.
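The first and last steps of the workflow, dividing the 3-D volume into overlapping patches and rebuilding the volume from the processed patches, can be illustrated with a minimal NumPy sketch. The abstract does not specify the patch size, the stride, or how overlapping samples are merged, so the values below and the choice to average overlaps are assumptions made only for illustration. The function names `patch_starts`, `extract_patches`, and `reconstruct` are hypothetical, and the DIP network itself is omitted: the patches are passed through unchanged.

```python
import numpy as np

def patch_starts(n, size, stride):
    """Start indices along one axis; a final patch is added so the axis end is covered."""
    if size > n or not 0 < stride <= size:
        raise ValueError("need size <= n and 0 < stride <= size")
    starts = list(range(0, n - size + 1, stride))
    if starts[-1] != n - size:
        starts.append(n - size)  # shift the last patch to end at the boundary
    return starts

def extract_patches(volume, size, stride):
    """Split a 3-D volume into overlapping patches and record each patch corner."""
    corners = [(i, j, k)
               for i in patch_starts(volume.shape[0], size[0], stride[0])
               for j in patch_starts(volume.shape[1], size[1], stride[1])
               for k in patch_starts(volume.shape[2], size[2], stride[2])]
    patches = np.stack([volume[i:i + size[0], j:j + size[1], k:k + size[2]]
                        for i, j, k in corners])
    return patches, corners

def reconstruct(patches, corners, shape):
    """Rebuild the volume by averaging overlapping samples (an assumed merge rule)."""
    total = np.zeros(shape, dtype=float)
    count = np.zeros(shape, dtype=float)
    for patch, (i, j, k) in zip(patches, corners):
        s0, s1, s2 = patch.shape
        total[i:i + s0, j:j + s1, k:k + s2] += patch
        count[i:i + s0, j:j + s1, k:k + s2] += 1.0
    return total / count

# Toy volume standing in for noisy 3-D seismic data (illustrative sizes only).
rng = np.random.default_rng(0)
volume = rng.standard_normal((20, 18, 16))
patches, corners = extract_patches(volume, size=(8, 8, 8), stride=(4, 4, 4))

# A denoiser would process `patches` here; this sketch leaves them unchanged.
restored = reconstruct(patches, corners, volume.shape)
```

With the patches left unprocessed, `restored` matches `volume` up to floating-point rounding, which confirms that the split and merge steps are consistent with each other. In an actual workflow the network output would replace `patches` before reconstruction.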