Recently, deep learning models have shown convincing performance in removing haze from a single satellite image, a task that has attracted increasing attention in the field of remote sensing (RS). Unfortunately, these models still struggle to recover the desired fine spatial details from hazy images. In this letter, we make a first attempt to address this issue with an end-to-end hybrid high-resolution learning network framework, termed H2RL-Net. Its novel feature extraction architecture guarantees spatially precise outputs through a main high-resolution branch, while a complementary set of multiresolution convolution streams collects semantically richer features. To improve representation learning, H2RL-Net is built primarily on the parallel cross-scale fusion (PCF) module, which progressively aggregates information from multiple scales at each resolution level and thereby enables both top-down and bottom-up information exchange. We also introduce a channel feature refinement (CFR) block, which dynamically recalibrates the channelwise features to produce better dehazed results. Experimental results show that the proposed framework delivers significant improvements over baseline methods on both synthetic and real-world hazy RS images across various scenes.
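The abstract does not specify the internals of the PCF module or the CFR block, so the NumPy sketch below only illustrates the two ideas under stated assumptions. CFR is modeled as a squeeze-and-excitation-style channel gate (global average pooling, a two-layer bottleneck, sigmoid scaling). PCF is reduced to a parameter-free resample-and-sum across parallel streams, using average pooling for bottom-up paths and nearest-neighbour upsampling for top-down paths. All function names and weight shapes are hypothetical, and the learned convolutions a real network would place on each path are omitted.

```python
import numpy as np


def channel_feature_refinement(x, w1, w2):
    """SE-style channel recalibration (an assumed stand-in for CFR).

    x:  feature map of shape (C, H, W)
    w1: hypothetical reduction weights, shape (C // r, C)
    w2: hypothetical expansion weights, shape (C, C // r)
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    z = x.mean(axis=(1, 2))                  # (C,)
    # Excitation: bottleneck MLP with ReLU, then a sigmoid gate in (0, 1).
    s = np.maximum(w1 @ z, 0.0)              # (C // r,)
    g = 1.0 / (1.0 + np.exp(-(w2 @ s)))      # (C,)
    # Recalibrate: rescale each channel by its gate value.
    return x * g[:, None, None]


def downsample2(x):
    """2x average pooling over the spatial axes (H and W must be even)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def upsample2(x):
    """2x nearest-neighbour upsampling over the spatial axes."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def resample(x, src_level, dst_level):
    # Level 0 is the full-resolution branch; each deeper level halves H and W.
    while src_level < dst_level:
        x, src_level = downsample2(x), src_level + 1
    while src_level > dst_level:
        x, src_level = upsample2(x), src_level - 1
    return x


def parallel_cross_scale_fusion(streams):
    """Fuse every stream into every resolution level by resample-and-sum.

    streams[k] has shape (C, H / 2**k, W / 2**k). Coarser streams reach finer
    levels by upsampling (top-down); finer streams reach coarser levels by
    pooling (bottom-up).
    """
    return [sum(resample(s, j, i) for j, s in enumerate(streams))
            for i in range(len(streams))]


# Usage: three parallel streams, with the high-resolution branch at level 0.
rng = np.random.default_rng(0)
C, r = 8, 4
streams = [rng.standard_normal((C, 16 >> k, 16 >> k)) for k in range(3)]
fused = parallel_cross_scale_fusion(streams)
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
refined = channel_feature_refinement(fused[0], w1, w2)
print([f.shape for f in fused], refined.shape)
# prints: [(8, 16, 16), (8, 8, 8), (8, 4, 4)] (8, 16, 16)
```

Each fused output keeps the resolution of its own stream, so the level-0 output stays spatially precise while still receiving coarser, more semantic context. Because the gate lies in (0, 1), the CFR sketch can only attenuate channels relative to one another; it never changes the spatial layout.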