Abstract
Remote sensing images often suffer from the degradation effects of atmospheric haze, which can significantly impair the quality and utility of the acquired data. A novel dehazing method leveraging generative adversarial networks is proposed to address this challenge. It integrates a generator network, designed to enhance the clarity and detail of hazy images, with a discriminator network that distinguishes between dehazed and real clear images. Initially, a dense residual block is designed to extract primary features. Subsequently, a wavelet transform block is designed to capture high- and low-frequency features. Additionally, a global and local attention block is proposed to reduce the interference of redundant features and increase the weight of important features. PixelShuffle is used as the upsampling operation, allowing for finer control of image details during upsampling. Finally, these modules are integrated to construct the generator network for image dehazing. Moreover, an improved discriminator network is proposed by adding a noise module to the conventional discriminator, enhancing the network's robustness. A novel loss function is introduced by incorporating a color loss and an SSIM loss into the traditional loss functions, aiming to improve color accuracy and the assessment of visual distortion. The proposed approach attains the highest PSNR and SSIM scores compared with current leading methods, and the dehazing technique successfully maintains color fidelity and detail, producing significantly clearer remote sensing images.
Introduction
Remote sensing technology has become essential in various fields, such as environmental monitoring, urban planning, agriculture, and disaster management1. However, atmospheric conditions often degrade the quality of remote sensing images, with haze being a major factor in this deterioration. Haze diminishes the clarity and visibility of objects within images, affecting the precision of vision-based tasks like object detection, classification, and segmentation.
Recently, significant advancements have been made in image haze removal technology, resulting in various proposed methods2,3. These dehazing techniques can generally be divided into two main categories: prior-based and learning-based methods. Prior-based methods rely on statistical regularities between hazy and clear images that researchers have systematically generalized; the dehazing algorithm is then constructed from these regularities. The second category, learning-based methods, directly or indirectly establishes mapping relationships from extensive datasets of hazy and clear images. Prior-based methods leverage prior knowledge of clean images to estimate the transmission map and global atmospheric light. These approaches often utilize atmospheric scattering models and manually crafted priors. He et al. developed a single-image dehazing algorithm utilizing the dark-channel prior4, which includes four steps: calculating the dark channel, estimating the atmospheric light, recovering the transmittance, and dehazing. The algorithm uses dark channel information to estimate the transmittance of the image, which the haze model then inverts to obtain a clear image. However, noise may be misidentified as dark channel pixels in low-light conditions, degrading the dehazing effect. Bi et al. introduced a novel dehazing algorithm for single remote sensing images, leveraging a low-rank and sparse prior5. This approach uses the atmospheric scattering model to decompose the dark channel of a hazy image into two components: a low-rank atmospheric veil and a sparse dark channel representing direct attenuation. The method estimates the atmospheric veil layer through the PCP-ADMM algorithm and an adaptive thresholding method, uses adaptive guided filtering for refinement, and ultimately recovers a clear haze-free image. Ning et al. proposed a dehazing method based on a robust light-dark prior6. The method first removes haze using a robust dark-channel prior, then applies a light channel prior to eliminate shadows, and introduces a cube-root-mean enhancement-based stable state search criterion to optimize the choice of patch size. Despite this progress, the method has limited effectiveness in dense haze. In summary, prior-based approaches estimate the transmission map and global atmospheric light from predefined assumptions and then use an atmospheric scattering model to recover clear images. While these methods have demonstrated effective haze removal, their performance in real-world conditions remains uncertain. The optimal selection of prior knowledge, its alignment with image statistics, and its impact on dehazing efficacy are still subjects of ongoing research and debate7.
With the advancement of deep learning, numerous learning-based methods have emerged and been applied extensively to image enhancement and dehazing8. Jiang et al. proposed KFA-Net9, a remote sensing image dehazing network for non-uniform hazy weather. The network performs dehazing by fusing channel attention and pixel attention mechanisms, effectively recovering image details under thin haze. However, its performance degrades slightly in thick haze scenes, which can blur image texture details. Du et al. proposed an end-to-end asymmetric U-Net dehazing network10 that realizes the joint optimization of physical parameters. They designed an attention mechanism that significantly improves dehazing performance by jointly optimizing the physical parameters in the network. Bie et al. proposed a Gaussian and physics-guided dehazing network11. The network utilizes a global attention mechanism to extract features from different haze distributions and uses a Gaussian process in the intermediate latent space to learn from the fully labeled dehazing data. While this method effectively restores clear images, the outcomes are significantly influenced by manual parameter settings. Sun et al. proposed a remote sensing image dehazing network named partial Siamese with multiscale bi-codec networks (PSMB), which extracts image features at different scales and excels at handling large-scale complex backgrounds and detail-rich remote sensing images12. Li et al. proposed a physics-aware dehazing network based on residual learning and the atmospheric scattering model13. The network utilizes multi-scale gated convolution with a haze extraction unit for dehazing. However, the network is limited by its training dataset.
Recent advancements in deep learning have introduced powerful tools for image processing tasks, with generative adversarial networks (GANs) emerging as a particularly promising approach. GANs are composed of two neural networks, a generator and a discriminator, which are trained concurrently through adversarial learning. This setup has demonstrated remarkable success in generating high-quality images and has been applied to tasks such as image super-resolution, style transfer, and image enhancement14,15. GANs have also shown superior performance in remote sensing and medical image enhancement16,17. Researchers have likewise used GANs to address the challenge of remote sensing image dehazing. Akhtar et al. proposed a single-image dehazing approach employing a patch GAN model built on the U-Net architecture18. This network removes haze by combining VGG19, ResNet, EfficientNet, and a modified MobileNet, though color distortion remains an issue in the dehazed images. Tan et al. proposed GAN-UD, which consists of a frequency-spatial attention generator and a discriminator19. The network can remove thin clouds, but a small amount remains as the cloud thickness increases. Xu et al. developed an enhanced CycleGAN network integrating an adaptive dark channel prior20. The network first employs the Wave-Vit semantic segmentation model to adapt the dark channel prior (DCP) for recovering transmittance and atmospheric light; the physically derived scattering coefficients are then used to optimize the hazing process; finally, an enhanced CycleGAN architecture is designed using the atmospheric scattering model. The method can remove haze, but the physical model's limitations may lead to localized over-enhancement. Zhang et al. proposed a guided generative adversarial dehazing network21. The network adds a guided module to the generator to recover image details. However, the discriminator is biased when judging special images, such as predominantly white or dim images. Meng et al. introduced DedustGAN, an unpaired learning algorithm for image dedusting based on GANs and Retinex22. It combines GANs with Retinex to obtain physically based initial dedusting results through a learning approach, and its unpaired learning avoids the limitations of paired training data. Zhong et al. introduced a dehazing method for remote sensing images using CycleGAN23. The network captures image changes under different meteorological conditions through multi-temporal resources, improving the adaptability of the dehazing model to non-uniform haze. However, the dehazed images are overexposed when there is too much background light. Sun et al. introduced an approach involving adaptive fine-grained channel attention and an unsupervised bidirectional contrastive reconstruction network24. This method removes haze through an unsupervised bidirectional contrastive reconstruction framework and an adaptive fine-grained channel attention mechanism. However, the parameter estimation network still faces challenges in accurately removing haze.
Despite the success of GANs in image dehazing, dehazed images often suffer from significant loss of detail and color distortion, which degrades overall image quality. To better restore detail in dehazed remote sensing images and reduce color loss, we propose a dense residual wavelet dehazing network based on GANs for remote sensing images. Our approach leverages the strengths of GANs while incorporating several innovative components designed to enhance dehazing performance. Specifically, we introduce dense residual blocks for robust feature extraction, wavelet transform blocks to capture both high- and low-frequency features, and global and local attention mechanisms to emphasize important features and reduce redundancy. Additionally, we employ PixelShuffle for upsampling to achieve finer control over image details, and we enhance the discriminator network with a noise module to improve its robustness against various distortions. Furthermore, we add a color loss function and a structural similarity index (SSIM) loss function to the traditional loss functions to better measure color fidelity and structural similarity. Extensive experiments show that our proposed method surpasses state-of-the-art techniques in both peak signal-to-noise ratio (PSNR) and SSIM, effectively maintaining color and detail information in dehazed images.
The structure of this paper is as follows: Section I introduces current remote sensing image dehazing methods. Section II describes the architecture and components of the proposed wavelet-based GANs. Section III presents our experimental findings and compares them with existing methodologies. Lastly, Section IV concludes the paper and explores potential future research directions.
Method
The wavelet-based GANs method for remote sensing image dehazing consists of a generative network, an adversarial network, and loss functions. Each is described in detail below.
Architecture of proposed generative network
To mitigate the impact of haze on the quality of remote sensing images, we have designed a wavelet-based GANs for dehazing, as shown in Fig. 1. It mainly consists of dense residual blocks, wavelet transform blocks, and global and local attention blocks. It includes an encoder part and a decoder part.
In the encoder, we begin feature extraction from the hazy remote sensing image with a 3 \(\times\) 3 convolution layer followed by batch normalization and a LeakyReLU activation function; the output feature map has 32 channels. We then use three hybrid modules with the same architecture to further extract features, producing output feature maps with 64, 128, and 256 channels, respectively. Each hybrid module comprises a parallel module composed of a dense residual block and a wavelet transform block, followed by a global and local attention block and a downsampling block. The dense residual block extracts features from the spatial domain of the image, while the wavelet transform block extracts features from the wavelet domain; both share the same input feature map. We concatenate the output feature maps from the dense residual block and the wavelet transform block, and this fused feature map serves as the input of the global and local attention block, which strengthens the network's emphasis on crucial features and mitigates the interference of redundant features. Downsampling then reduces the size of the attention block's output feature map. At the end of the encoder, we incorporate a residual module, whose residual mapping consists of three dense residual blocks, to deepen the network and extract more complex feature information.

The output feature map of the encoder serves as the input of the decoder. In the decoder, three cascaded hybrid modules with identical architecture restore the resolution of the feature maps to the original input size while extracting additional features. Each hybrid module consists of a PixelShuffle operation, a dense residual block, a wavelet transform block, and a global and local attention block. We employ PixelShuffle as the upsampling operation; it upsamples by rearranging the pixels of the feature maps, allowing finer control of image details during upsampling. Compared with conventional upsampling operations, PixelShuffle better preserves image details and texture information, thereby achieving higher-quality reconstruction. The wavelet transform block again extracts features from the wavelet domain; the input feature map of the wavelet transform block in each decoder module corresponds to the input feature map of the corresponding wavelet transform block in the encoder.
The dense residual block is likewise used to extract features from the spatial domain of the image; the output feature map of the PixelShuffle operation serves as its input. After merging the output feature maps of the dense residual block and the wavelet transform block with a concatenation operation, the resulting fused feature map is used as the input of the global and local attention block. To better preserve shallow feature information, we concatenate the output features of the global and local attention block with the corresponding downsampled features generated in the encoder. At the end of the decoder, a 3 \(\times\) 3 convolution layer paired with a Tanh activation function reconstructs the remote sensing image from the extracted features.
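For illustration, the following PyTorch sketch shows the PixelShuffle-based upsampling step used in the decoder hybrid modules: a convolution expands the channel dimension and PixelShuffle rearranges the extra channels into a higher-resolution feature map. The layer arrangement, kernel size, and channel counts are illustrative assumptions rather than the exact configuration of our network.

```python
import torch
import torch.nn as nn

class PixelShuffleUpsample(nn.Module):
    """Upsample a feature map by rearranging channels into spatial positions."""
    def __init__(self, in_channels, out_channels, scale=2):
        super().__init__()
        # Expand channels so PixelShuffle can fold them into a larger spatial map.
        self.conv = nn.Conv2d(in_channels, out_channels * scale ** 2,
                              kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)  # (B, C*s^2, H, W) -> (B, C, H*s, W*s)
        self.act = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x):
        return self.act(self.shuffle(self.conv(x)))

# Example: a 256-channel 32x32 encoder feature becomes a 128-channel 64x64 map.
x = torch.randn(1, 256, 32, 32)
print(PixelShuffleUpsample(256, 128)(x).shape)  # torch.Size([1, 128, 64, 64])
```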
Architecture of proposed dense residual block
The dense residual block utilized in the generative network is illustrated in Fig. 2. It consists of four parts. The first part is a residual block. The second and third parts each consist of a 1 \(\times\) 1 convolution, a batch normalization (BN) layer, a LeakyReLU activation function, and a residual block. The fourth part comprises a 1 \(\times\) 1 convolution layer followed by a BN layer and a LeakyReLU activation function. Within the first three parts, we introduce skip connections from each part to all subsequent parts to further improve the information flow between parts. Concatenation operations merge the feature maps from different parts, and a 1 \(\times\) 1 convolution reduces the number of channels, thereby facilitating feature fusion. The combination of the LeakyReLU function and BN enhances the network's training stability while improving its ability to capture nonlinear representations.
The residual block is mainly responsible for feature extraction. It consists of three feature extraction modules with the same architecture. Each module has a 3 \(\times\) 3 convolution layer and a dilated convolution layer, followed by BN and a LeakyReLU function. This strengthens the nonlinear processing capability and improves the model's ability to learn complex textures while mitigating overfitting and gradient vanishing. We employ three dilated convolutions in the residual block, with dilation rates of 1, 2, and 3, respectively; dilated convolution expands the receptive field and captures additional lower-frequency information. The output of the second feature extraction module is fused with the output of the first module to retain critical information, and the output of the third module is fused with the original input to preserve the initial information. Finally, a 1 \(\times\) 1 convolution compresses the channels and integrates the features to further enhance the learning of key information.
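A minimal PyTorch sketch of the dilated residual block described above is given below. For brevity, each stage here uses a single dilated 3 \(\times\) 3 convolution (the block in Fig. 2 pairs a standard and a dilated convolution per stage), and the stage-to-stage fusion is shown as element-wise addition; these simplifications, together with the channel width, are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class DilatedResidualBlock(nn.Module):
    """Three conv stages with dilation rates 1, 2, 3 (each with BN and LeakyReLU),
    skip fusion between stages, and a final 1x1 convolution for channel mixing."""
    def __init__(self, channels):
        super().__init__()
        self.stages = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=3, padding=d, dilation=d),
                nn.BatchNorm2d(channels),
                nn.LeakyReLU(0.2, inplace=True),
            )
            for d in (1, 2, 3)
        ])
        self.fuse = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        out1 = self.stages[0](x)
        out2 = self.stages[1](out1) + out1  # retain information from the first stage
        out3 = self.stages[2](out2) + x     # preserve the original input
        return self.fuse(out3)

x = torch.randn(1, 64, 64, 64)
print(DilatedResidualBlock(64)(x).shape)  # torch.Size([1, 64, 64, 64])
```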
Architecture of proposed wavelet transform block
We propose the wavelet transform block in the generative network to extract features at different frequencies. The proposed wavelet transform block is shown in Fig. 3. It uses the wavelet transform to decompose the input features into four sub-bands, LL, LH, HL, and HH, which are expressed as:
where \(\downarrow _2\) denotes a downsampling operation that keeps one element out of every two, \(*\) denotes the convolution operation, T denotes the transpose operation, and L and H represent the low-pass and high-pass filters of the wavelet transform, respectively; the remaining term is the input feature map. The LL sub-band in (1) contains low-frequency information, including the contours of the image and the overall luminance. The HL, LH, and HH sub-bands contain high-frequency information: HL captures high-frequency information in the horizontal direction, LH in the vertical direction, and HH in the diagonal direction. We fuse the HL, LH, and HH sub-bands by concatenation. Finally, a 1 \(\times\) 1 convolution and linear interpolation extract features and restore the feature map size. The proposed wavelet transform block thus has two output feature maps: the low-frequency features serve as its output in the encoder network, and the high-frequency features serve as its output in the decoder network in Fig. 1.
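The PyTorch sketch below illustrates the idea behind the wavelet transform block: a one-level Haar decomposition into LL, LH, HL, and HH sub-bands implemented as a depthwise stride-2 convolution, concatenation of the high-frequency sub-bands, and restoration of the original spatial size by interpolation. The Haar filters, the 1 \(\times\) 1 fusion convolutions, and the bilinear interpolation are illustrative assumptions; the block in Fig. 3 may use a different wavelet basis and layer arrangement.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HaarWaveletBlock(nn.Module):
    """One-level Haar decomposition, high-frequency fusion, and size restoration."""
    def __init__(self, channels):
        super().__init__()
        ll = torch.tensor([[0.5, 0.5], [0.5, 0.5]])
        lh = torch.tensor([[-0.5, -0.5], [0.5, 0.5]])
        hl = torch.tensor([[-0.5, 0.5], [-0.5, 0.5]])
        hh = torch.tensor([[0.5, -0.5], [-0.5, 0.5]])
        self.register_buffer("kernels", torch.stack([ll, lh, hl, hh]).unsqueeze(1))
        self.fuse_low = nn.Conv2d(channels, channels, kernel_size=1)
        self.fuse_high = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        # A depthwise 2x2 stride-2 convolution realizes filtering plus downsampling.
        k = self.kernels.repeat(c, 1, 1, 1)              # (4c, 1, 2, 2)
        bands = F.conv2d(x, k, stride=2, groups=c)       # (b, 4c, h/2, w/2)
        bands = bands.view(b, c, 4, h // 2, w // 2)
        ll = bands[:, :, 0]                                       # low-frequency sub-band
        high = bands[:, :, 1:].reshape(b, 3 * c, h // 2, w // 2)  # LH, HL, HH concatenated
        low_out = F.interpolate(self.fuse_low(ll), size=(h, w),
                                mode="bilinear", align_corners=False)
        high_out = F.interpolate(self.fuse_high(high), size=(h, w),
                                 mode="bilinear", align_corners=False)
        return low_out, high_out  # encoder uses low_out, decoder uses high_out

low, high = HaarWaveletBlock(32)(torch.randn(1, 32, 64, 64))
print(low.shape, high.shape)  # both torch.Size([1, 32, 64, 64])
```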
Architecture of proposed global and local attention block
To strengthen the network's emphasis on crucial features and mitigate the interference of redundant features, we design the global and local attention block shown in Fig. 4. It consists of a global attention block and a local attention block. Global attention captures overall contextual information, enabling the network to learn image features on a broader scale, while the local attention mechanism focuses on details such as edges and textures, complementing global attention. The global attention block adopts a residual architecture whose residual mapping consists of global average pooling, a 1 \(\times\) 1 convolution, a LeakyReLU function, and a sigmoid function. The local attention block also adopts a residual architecture, with a residual mapping composed of local average pooling, a 1 \(\times\) 1 convolution, a 3 \(\times\) 3 convolution, a LeakyReLU function, a 1 \(\times\) 1 convolution, and a sigmoid function. Global and local average pooling extract global and local features, respectively, and the sigmoid function generates feature weights from the extracted features. The output features of the global and local attention modules are then fused to combine global context and detail information, and a final 3 \(\times\) 3 convolution integrates the two. The global and local attention block ensures that the final reconstructed image retains important global information while restoring local details, improving the quality of dehazed images.
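A minimal PyTorch sketch of the global and local attention block is shown below. The residual attention is interpreted here as multiplying the input by the learned weights and adding the input back, and the two branches are fused by addition before the final 3 \(\times\) 3 convolution; these interpretations, and the pooling window size, are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class GlobalLocalAttention(nn.Module):
    """Global attention from global average pooling plus a local branch built on
    local average pooling, fused by a 3x3 convolution."""
    def __init__(self, channels):
        super().__init__()
        self.global_branch = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                            # global average pooling
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Sigmoid(),
        )
        self.local_branch = nn.Sequential(
            nn.AvgPool2d(kernel_size=3, stride=1, padding=1),   # local average pooling
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        self.fuse = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        g = x * self.global_branch(x) + x   # residual global attention
        l = x * self.local_branch(x) + x    # residual local attention
        return self.fuse(g + l)

x = torch.randn(1, 128, 32, 32)
print(GlobalLocalAttention(128)(x).shape)  # torch.Size([1, 128, 32, 32])
```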
Architecture of proposed adversarial network
The GANs consist of a generative network and an adversarial network. The generative network removes haze from remote sensing images, whereas the adversarial network determines whether a given remote sensing image is a dehazed image or an original clear image; it plays a crucial role in improving the performance of the generative network. The proposed adversarial network is shown in Fig. 5. It consists of a noise module, four convolutional blocks, and a 1 \(\times\) 1 convolution. The four convolutional blocks share the same architecture but differ in channel numbers. Noise is first added to the input image to enhance the network's robustness to random perturbations in the input data. Each convolutional block includes a 3 \(\times\) 3 convolution, a BN layer, and a LeakyReLU activation function, and the output feature maps of the four blocks have 32, 64, 128, and 256 channels, respectively. Finally, a 1 \(\times\) 1 convolution combines the features and determines whether the input image is generated or real.
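The following PyTorch sketch outlines the adversarial network described above: noise injection, four convolution blocks with 32, 64, 128, and 256 output channels, and a 1 \(\times\) 1 convolution head. The Gaussian noise level, the stride-2 downsampling, and the patch-style output are illustrative assumptions.

```python
import torch
import torch.nn as nn

class NoisyDiscriminator(nn.Module):
    """Noise module followed by four Conv-BN-LeakyReLU blocks and a 1x1 conv head."""
    def __init__(self, in_channels=3, noise_std=0.05):
        super().__init__()
        self.noise_std = noise_std
        layers, prev = [], in_channels
        for ch in (32, 64, 128, 256):
            layers += [
                nn.Conv2d(prev, ch, kernel_size=3, stride=2, padding=1),
                nn.BatchNorm2d(ch),
                nn.LeakyReLU(0.2, inplace=True),
            ]
            prev = ch
        self.features = nn.Sequential(*layers)
        self.head = nn.Conv2d(prev, 1, kernel_size=1)  # per-patch real/fake score map

    def forward(self, x):
        if self.training:
            x = x + self.noise_std * torch.randn_like(x)  # noise module for robustness
        return self.head(self.features(x))

d = NoisyDiscriminator()
print(d(torch.randn(2, 3, 256, 256)).shape)  # torch.Size([2, 1, 16, 16])
```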
Improved loss function
To better measure the performance of the generative and adversarial networks, we propose an improved loss function for our GANs by introducing a LAB color space loss and an SSIM loss into the conventional loss function. The complete loss function is expressed as:
where \(L_{ad}\) is the adversarial loss, \(L_{per}\) is the perceptual loss, \(L_{col}\) is the LAB color loss, and \(L_{SSIM}\) is the SSIM loss. The perceptual loss is commonly used to measure perceptual differences between the generated and target images. Unlike traditional pixel-level losses, it leverages the feature extraction layers of a pre-trained convolutional neural network (e.g., a VGG network) to compute the differences between images in feature space. The adversarial loss is expressed as follows:
where \(D(\cdot)\) and \(G(\cdot)\) represent the adversarial network and generative network, respectively, and x and z represent the haze-free and hazy images, respectively. The perceptual loss is expressed as follows:
where \(\phi _i\) represents the feature map extracted from the i-th layer of the convolutional neural network \(\phi\).
The LAB loss function is a perceptual loss that operates in the LAB color space and is designed to measure the perceptual difference between images. It leverages the perceptually uniform properties of the LAB color space to align better with human visual perception than the RGB space. The LAB color loss is expressed as follows:
where L, A, and B are the three channels of the LAB color space, and \(lab\_img1_c\) and \(lab\_img2_c\) are the values of channel c of the two images in the LAB color space. SSIM loss is a perceptual loss function commonly used to evaluate the similarity between two images. It considers the luminance, contrast, and structural similarity of the images, aiming to mimic human visual perception more closely than traditional pixel-wise losses such as mean squared error (MSE) or mean absolute error (MAE). The SSIM loss is expressed as follows:
where \(\mu _x\) and \(\mu _y\) are the mean values of x and y, \(\sigma _x ^2\) and \(\sigma _y ^2\) are the variances of x and y, and \(\sigma _{xy}\) is the covariance of x and y.
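To make the two added terms concrete, the PyTorch sketch below shows one way to compute the SSIM loss (with a uniform local window, a common simplification of the Gaussian-weighted formulation) and the LAB color loss (using the rgb_to_lab conversion from the kornia library, assumed to be available, on RGB inputs scaled to [0, 1]). The loss weights in total_loss are placeholders, not the values used in our experiments.

```python
import torch
import torch.nn.functional as F
import kornia

def ssim_loss(x, y, c1=0.01 ** 2, c2=0.03 ** 2, window=11):
    """1 - mean SSIM over local windows of the two image batches."""
    pad = window // 2
    mu_x = F.avg_pool2d(x, window, stride=1, padding=pad)
    mu_y = F.avg_pool2d(y, window, stride=1, padding=pad)
    sigma_x = F.avg_pool2d(x * x, window, stride=1, padding=pad) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, window, stride=1, padding=pad) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, window, stride=1, padding=pad) - mu_x * mu_y
    ssim = ((2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2))
    return 1.0 - ssim.mean()

def lab_color_loss(x, y):
    """Mean absolute difference between the L, A, B channels of two RGB batches."""
    return (kornia.color.rgb_to_lab(x) - kornia.color.rgb_to_lab(y)).abs().mean()

def total_loss(l_ad, l_per, dehazed, clear, w_col=1.0, w_ssim=1.0):
    """Adversarial + perceptual + weighted color and SSIM terms (weights illustrative)."""
    return (l_ad + l_per
            + w_col * lab_color_loss(dehazed, clear)
            + w_ssim * ssim_loss(dehazed, clear))
```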
Simulation and discussion
The SateHaze1K dataset25 and the RICEx dataset are used for training and evaluating the dehazing performance of the proposed algorithm. The StateHaze1k dataset has three subsets representing three distinct levels of haze concentration: the StateHaze1k thin, moderate, and thick datasets, each containing 400 paired images. The RICEx dataset contains 400 clear images selected from RICE226 and downloaded from the Copernicus Data Space Ecosystem27 via Sentinel Hub's EO Browser. We use the atmospheric scattering model to add haze to the clear images, obtaining 400 paired images with different spatial resolutions. The images in RICE2 are derived from Landsat, and the images downloaded from the Copernicus Data Space Ecosystem are derived from Sentinel-2. We select 300 paired images from each subset of the StateHaze1k dataset and 300 paired images from the RICEx dataset as training images, and use the rest as test images. PSNR and SSIM are used for quantitative comparison. In our experiments, the network is trained for 200 epochs: a learning rate of 0.0002 is used for the first 100 epochs and then linearly decreased to 0 over the next 100 epochs. We use the Adam optimizer to optimize the network and its parameters, with \(\beta _1\) and \(\beta _2\) set to 0.9 and 0.999, respectively. The experiments run on Ubuntu 18.04, with PyTorch as the deep learning framework and an NVIDIA GeForce GTX 1080 Ti GPU.
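A minimal sketch of the optimizer and learning-rate schedule described above (constant 2e-4 for the first 100 epochs, then linear decay to zero over the remaining 100) is given below; generator is a stand-in module, and the training step itself is omitted.

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

generator = torch.nn.Conv2d(3, 3, 3, padding=1)  # placeholder for the actual network
optimizer = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.9, 0.999))

def linear_decay(epoch, total=200, hold=100):
    # Keep the base learning rate for `hold` epochs, then decay linearly to zero.
    return 1.0 if epoch < hold else max(0.0, 1.0 - (epoch - hold) / (total - hold))

scheduler = LambdaLR(optimizer, lr_lambda=linear_decay)

for epoch in range(200):
    # ... one training pass over the paired hazy/clear images ...
    scheduler.step()
```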
Subjective analysis of dehazing performance
We randomly selected three low-density haze images from the StateHaze1k thin test set to compare our proposed method with the EPDN method8, PSMB method12, MSGAN method16, DedustGAN method22, and DW-GAN28. The hazy, dehazed, haze-free, and locally magnified images are shown in Fig. 6. In the first image, the result produced by EPDN has lower brightness than the haze-free image. In the locally magnified regions, the image generated by MSGAN exhibits significant distortion (marked by the red circles), whereas the images generated by DedustGAN, DW-GAN, PSMB, and our proposed method show no obvious distortion. The locally magnified section of the second image shows that the images generated by EPDN and MSGAN contain more residual haze, and the images produced by DedustGAN and DW-GAN exhibit local color distortion (marked by the red rectangles). Although the images generated by our method and the PSMB method also exhibit some distortion (marked by the red rectangles), the distortion is minimal and the results are closer to the haze-free image. In the third image, the EPDN result exhibits noticeably reduced brightness compared with the haze-free remote sensing image, and the outputs of MSGAN and DW-GAN retain a subtle layer of haze, reducing overall image quality. In the locally magnified regions, the images generated by EPDN and MSGAN exhibit noticeable detail distortion (indicated by the red rectangles), while the images produced by the other methods show no significant distortion.
To evaluate the algorithm's capacity to remove haze from remote sensing images with varying haze densities, we randomly chose three remote sensing images with moderate haze density from the StateHaze1k moderate dataset as test images. The dehazed, hazy, and haze-free images are shown in Fig. 7. In the locally magnified section of the first image, the images produced by EPDN, MSGAN, and DW-GAN exhibit significant color distortion (marked by red rectangles) and minor detail distortion (marked by red circles), and the images produced by DedustGAN and PSMB exhibit noticeable color aberration (marked by red rectangles). In the locally magnified section of the second image, the image generated by EPDN exhibits significant color distortion (marked by red rectangles), the images generated by DedustGAN and DW-GAN have minor color distortion (marked by red rectangles), and the images generated by MSGAN and DW-GAN contain residual haze. The images produced by our proposed approach and the PSMB method show no significant distortion and are closer to the haze-free image. In the locally magnified section of the third image, the images produced by EPDN, MSGAN, and DedustGAN exhibit significant color and detail distortion. Although the images generated by our proposed method, PSMB, and DW-GAN also exhibit noticeable color distortion, the degree of distortion is relatively lower, and the images generated by our method and DW-GAN are closer to the third haze-free image.
To evaluate the algorithm's performance on remote sensing images with high haze density, we randomly selected three such images from the StateHaze1k thick dataset as test images. Figure 8 displays the hazy images, the dehazed images, and the haze-free images. In the locally magnified section of the first image, the images produced by MSGAN, DedustGAN, and DW-GAN display noticeable color aberration (marked by red circles), and the images generated by MSGAN and DW-GAN contain some residual haze. The images produced by EPDN and PSMB exhibit minor color distortion (marked by red circles), whereas the image generated by our proposed algorithm shows no noticeable distortion. In the locally magnified section of the second image, the image produced by EPDN suffers from detail distortion (marked by red circles), the images generated by MSGAN and DW-GAN contain unremoved haze (marked by red rectangles) that reduces clarity, the image produced by DedustGAN exhibits minor color distortion, and the image produced by PSMB has minor detail distortion (marked by red rectangles). In the locally magnified section of the third image, the images generated by the other methods exhibit significant detail distortion, while the images generated by our method and DedustGAN have relatively less distortion. In summary, the images generated by our proposed algorithm better restore color and detail, resulting in clearer images.
To evaluate the algorithm's performance on remote sensing images with different spatial resolutions, we selected two RICEx test images as input images; the results are shown in Fig. 9. The first image is captured by Landsat and the second by Sentinel-2. In the locally magnified section of the first image, the images generated by EPDN and MSGAN exhibit noticeable haze residue, resulting in low clarity, and the images generated by EPDN and DedustGAN exhibit noticeable color distortion. The image generated by the PSMB method suffers from underexposure, while the images generated by DW-GAN and DedustGAN have excessively high contrast, with colors in certain areas appearing too vibrant. In the locally magnified section of the second image, haze residue remains in the images generated by EPDN and MSGAN, the image generated by DedustGAN is excessively bright, and the image generated by the PSMB method again suffers from underexposure. Although the image generated by DW-GAN appears clearer than the others, the image generated by our method is closer to the haze-free image.
Objective analysis of dehazing performance
To objectively analyze the dehazing performance of the algorithm, we processed all test images in the StateHaze1k thin, moderate, and thick datasets and the RICEx dataset, and evaluated the quality of the dehazed images using the PSNR and SSIM metrics. The results are shown in Table 1. In the StateHaze1k thin dataset, the average PSNR values for dehazed images obtained by EPDN, MSGAN, DedustGAN, DW-GAN, PSMB, and our proposed method are 24.161, 24.254, 26.103, 26.354, 26.711, and 28.204, respectively, and the corresponding average SSIM values are 0.901, 0.907, 0.903, 0.908, 0.914, and 0.927. In the StateHaze1k moderate dataset, the average PSNR values are 25.735, 26.583, 27.473, 28.916, 28.121, and 30.643, and the average SSIM values are 0.897, 0.909, 0.913, 0.932, 0.934, and 0.955. In the StateHaze1k thick dataset, the average PSNR values are 24.515, 24.921, 25.266, 26.929, 26.543, and 28.541, and the average SSIM values are 0.903, 0.912, 0.894, 0.922, 0.920, and 0.929. In the RICEx dataset, the average PSNR values are 29.251, 29.894, 32.741, 33.103, 33.554, and 35.212, and the average SSIM values are 0.951, 0.953, 0.962, 0.968, 0.957, and 0.982. In the StateHaze1k thin dataset, our method achieves the highest PSNR and SSIM values, followed by PSMB and DW-GAN. In the StateHaze1k moderate dataset, our method achieves the highest PSNR, followed by DW-GAN and PSMB, and the highest SSIM, followed by PSMB and DW-GAN. In the StateHaze1k thick dataset, our method achieves the highest PSNR and SSIM values, followed by DW-GAN and PSMB. In the RICEx dataset, our method achieves the highest PSNR and SSIM values, followed by PSMB and DW-GAN. In summary, the dehazed images generated by our proposed algorithm consistently exhibit the highest PSNR and SSIM values across all datasets with varying haze densities and different spatial resolutions, indicating superior dehazing capability for remote sensing images compared with the other methods. We also tested the average inference time for one \(512\times 512\) image; the results are shown in Table 2. Our model has a longer inference time than the others, mainly because of its complexity; nonetheless, it demonstrates strong dehazing capability for remote sensing images.
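For reference, the evaluation metrics can be computed per image pair as in the sketch below, which uses the PSNR and SSIM implementations from scikit-image (function names follow recent scikit-image releases; inputs are assumed to be uint8 RGB arrays).

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(dehazed, clear):
    """PSNR and SSIM for one dehazed image against its haze-free ground truth."""
    psnr = peak_signal_noise_ratio(clear, dehazed, data_range=255)
    ssim = structural_similarity(clear, dehazed, channel_axis=-1, data_range=255)
    return psnr, ssim
```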
Ablation experiments
To assess the effectiveness of each proposed module, ablation experiments were conducted by removing each proposed module from the complete method. Four experiments were performed on the StateHaze1k thin dataset. The "no_DRB" method replaces our proposed dense residual block with conventional convolution modules. The "no_WTB" and "no_GALB" methods remove the proposed wavelet transform block and the proposed global and local attention block, respectively. The "no_LS" method removes the SSIM loss and color loss from the complete loss function. Table 3 displays the evaluation results. The "no_DRB" method produces the lowest PSNR and SSIM values, with the "no_WTB" method next; the "no_LS" method produces the highest PSNR and SSIM values, with the "no_GALB" method second. Lower PSNR and SSIM values indicate a more significant performance degradation when a module is removed, implying that the removed module is crucial to the algorithm's performance. Based on this analysis, the proposed dense residual block contributes the most to the algorithm's performance, followed by the proposed wavelet transform block and the global and local attention block, while the improved loss function has the smallest impact.
Conclusion
This paper introduces a new wavelet-based GANs designed specifically for dehazing remote sensing images. Our network integrates several key components: dense residual blocks for primary feature extraction, wavelet transform blocks for capturing high- and low-frequency features, and global and local attention blocks for emphasizing significant features while suppressing redundant ones. Additionally, we utilized PixelShuffle for upsampling to allow finer control over image details and introduced SSIM and color loss functions to enhance the perceptual quality of the dehazed images. Through extensive experiments and comparative analysis, our method demonstrated superior performance in both objective and subjective evaluations. The results on the StateHaze1k thin, moderate, and thick datasets and the RICEx dataset showed that our method consistently achieved the highest PSNR and SSIM values, signifying substantial improvements over current state-of-the-art methods. The objective metrics were corroborated by subjective visual inspection, where our dehazed images exhibited better clarity, color fidelity, and detail preservation. The ablation studies further confirmed the effectiveness of each component: the dense residual block contributed the most to performance improvement, followed by the wavelet transform block and the global and local attention blocks, and the SSIM and color loss functions also provided noticeable, though smaller, gains. In summary, our wavelet-based GANs offers a robust solution for remote sensing image dehazing, addressing common challenges such as loss of detail, haze residue, and color distortion. The proposed method improves the visual quality of the images and enhances their usability for various remote sensing applications.
Although our model achieves better dehazing results, its relative complexity leads to longer dehazing times. Therefore, in the future, we will consider ways to reduce the complexity of the dense residual modules to strike a balance between dehazing performance and processing time. Additionally, our model is only suitable for conventional remote sensing images and has limited generalization capability. In complex remote sensing images, such as those containing significant cloud cover, snow, or large reflective surfaces, these elements are often misclassified as haze, resulting in severe distortion in the dehazed images. To address this issue, we will explore ways to improve dehazing for remote sensing images in complex environments.
Data availability
The SateHaze1K dataset can be obtained through the website https://aistudio.baidu.com/projectdetail/4385363. The RICE2 dataset can be obtained through the website https://github.com/BUPTLdy/RICE_DATASET. The images downloaded from Copernicus Data Space Ecosystem can be obtained through the website https://dataspace.copernicus.eu/explore-data/data-collections/sentinel-data based on Sentinel Hub’s EO Browser. Based on Sentinel Hub’s EO Browser, users can visualise, compare and analyse, and download all this data for a variety of applications, from environmental monitoring and disaster management to urban planning and agriculture. You can access the Browser without downloading at: https://dataspace.copernicus.eu/browser/. The version of Sentinel Hub’s EO Browser is v1.13.7. There are more introductions about the Browser at https://documentation.dataspace.copernicus.eu/Applications/Browser.html.
References
Mello, F. A. O., Dematte, J. A. M., Bellinaso, H., Poppiel, R. & Rizzo, R. Remote sensing imagery detects hydromorphic soils hidden under agriculture system. Sci. Rep. 13, 10897. https://doi.org/10.1038/s41598-023-36219-9 (2023).
Shen, H., Zhong, T., Jia, Y. F. & Wu, C. M. Remote sensing image dehazing using generative adversarial network with texture and color space enhancement. Sci. Rep. 14, 12382. https://doi.org/10.1038/s41598-024-63259-6 (2024).
Jin, J. & Yan, H. Remote sensing image dehazing algorithm based on wavelet coefficient weighting. In IOP Conference Series: Earth and Environmental Science, vol. 384. https://doi.org/10.1088/1755-1315/384/1/012159 (2019).
He, K., Sun, J. & Tang, X. Single image haze removal using dark channel prior. IEEE Trans. Pattern Anal. Mach. Intell. 33, 2341–2353. https://doi.org/10.1109/TPAMI.2010.168 (2010).
Bi, G., Si, G. & Zhao, Y. Haze removal for a single remote sensing image using low-rank and sparse prior. IEEE Trans. Geosci. Remote Sens. 60, 1–13. https://doi.org/10.1109/TGRS.2021.3135975 (2021).
Ning, J., Zhou, Y., Liao, X. & Duo, B. Single remote sensing image dehazing using robust light-dark prior. Remote Sens. 15, 938. https://doi.org/10.3390/rs15040938 (2023).
Liang, S., Gao, T., Chen, T. & Cheng, P. A remote sensing image dehazing method based on heterogeneous priors. IEEE Trans. Geosci. Remote Sens. 62, 1–13. https://doi.org/10.1109/TGRS.2024.3379744 (2024).
Qu, Y., Chen, Y. & Huang, J. Enhanced pix2pix dehazing network. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 8160–8168. https://doi.org/10.1109/CVPR.2019.00835 (2019).
Jiang, B., Wang, J. & Wu, Y. A dehazing method for remote sensing image under nonuniform hazy weather based on deep learning network. IEEE Trans. Geosci. Remote Sens. 61, 1–17. https://doi.org/10.1109/TGRS.2023.3261545 (2023).
Du, Y. et al. Dehazing network: Asymmetric unet based on physical model. IEEE Trans. Geosci. Remote Sens. 62, 1–12. https://doi.org/10.1109/TGRS.2024.3359217 (2024).
Bie, Y., Yang, S. & Huang, Y. Single remote sensing image dehazing using Gaussian and physics-guided process. IEEE Geosci. Remote Sens. Lett. 19, 1–5. https://doi.org/10.1109/LGRS.2022.3177257 (2022).
Sun, H., Luo, Z. & Du, R. Partial siamese with multiscale bi-codec networks for remote sensing image haze removal. IEEE Trans. Geosci. Remote Sens. 61, 1–16. https://doi.org/10.1109/TGRS.2023.3321307 (2023).
Li, Z., He, J. & Yuan, Q. Phdnet: A novel physic-aware dehazing network for remote sensing images. Inf. Fusion 106, 102277. https://doi.org/10.1016/j.inffus.2024.102277 (2024).
Zhang, D., Tang, N. & Qu, Y. Joint motion deblurring and super-resolution for single image using diffusion model and gan. IEEE Signal Process. Lett. 31, 736–740. https://doi.org/10.1109/LSP.2024.3370491 (2024).
Jia, Y., Yu, W. & Chen, G. Nighttime road scene image enhancement based on cycle-consistent generative adversarial network. Sci. Rep. 14, 14375. https://doi.org/10.1038/s41598-024-65270-3 (2024).
Xu, Z., Wu, K. & Li, H. Cloudy image arithmetic: A cloudy scene synthesis paradigm with an application to deep-learning-based thin cloud removal. IEEE Trans. Geosci. Remote Sens. 60, 1–16. https://doi.org/10.1109/TGRS.2021.3122253 (2022).
Xu, Z., Wu, K. & Huang, L. A review of generative adversarial networks (GANs) and its applications in a wide variety of disciplines: From medical to remote sensing. IEEE Access 12, 18330–18357. https://doi.org/10.1109/ACCESS.2023.3346273 (2024).
Akhtar, M. S., Ali, A. & Chaudhuri, S. S. Mobile-UNet GAN: A single-image dehazing model. Signal. Image Video Process. 18, 275–283. https://doi.org/10.1007/s11760-023-02752-3 (2024).
Tan, Z. C., Du, X. F. & Man, W. Unsupervised remote sensing image thin cloud removal method based on contrastive learning. IET Image Process. 18, 1844–1861. https://doi.org/10.1049/ipr2.13067 (2024).
Xu, Y., Zhang, H., He, F., Guo, J. & Wang, Z. Enhanced cycleGAN network with adaptive dark channel prior for unpaired single-image dehazing. Entropy 25, 856. https://doi.org/10.3390/e25060856 (2023).
Zhang, J., Dong, Q. & Song, W. GGADN: Guided generative adversarial dehazing network. Soft Comput. 27, 1731–1741. https://doi.org/10.1007/s00500-021-06049-w (2023).
Meng, X., Huang, J. & Li, Z. DedustGAN: Unpaired learning for image dedusting based on retinex with GANs. Expert Syst. Appl. 243, 122844. https://doi.org/10.1016/j.eswa.2023.122844 (2024).
Zhong, M. et al. A remote sensing image defogging method based on improved CycleGAN network. In IEEE International Conference on Computer Vision, Image and Deep Learning (CVIDL), 113–116. https://doi.org/10.1109/CVIDL58838.2023.10166126 (2023).
Sun, H., Wen, Y. & Feng, H. Unsupervised bidirectional contrastive reconstruction and adaptive fine-grained channel attention networks for image dehazing. Neural Netw. 176, 106314. https://doi.org/10.1016/j.neunet.2024.106314 (2024).
Huang, B., Zhi, L. & Yang, C. Single satellite optical imagery dehazing using SAR image prior based on conditional generative adversarial networks. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 1806–1813. https://doi.org/10.1109/WACV45572.2020.9093471 (2020).
Lin, D., Xu, G., Wang, X., Wang, Y. et al. A remote sensing image dataset for cloud removal. arXiv:1901.00600 (2019).
Copernicus Data Space Ecosystem. https://dataspace.copernicus.eu/explore-data/data-collections/sentinel-data (2023).
Fu, M., Liu, H., Yu, Y., Chen, J. & Wang, K. DW-GAN: A discrete wavelet transform GAN for nonhomogeneous dehazing. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 203–212. https://doi.org/10.1109/CVPRW53098.2021.00029 (2021).
Acknowledgements
The work is supported by the Research Foundation of the Education Bureau of Jilin Province (JJKH20210042KJ, JJKH20220054KJ, JJKH20240084KJ, JJKH20240085KJ) and the Technology Development Program of Jilin Province (20210203169SF).
Author information
Authors and Affiliations
Contributions
G.C. and Y.J. were responsible for experimental conceptualization and design, and were the main contributors to writing the manuscript; Y.Y. verified the experimental design; S.F., D.L., and T.W., analyzed and explained the experimental data. All authors reviewed the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Chen, G., Jia, Y., Yin, Y. et al. Remote sensing image dehazing using a wavelet-based generative adversarial networks. Sci Rep 15, 3634 (2025). https://doi.org/10.1038/s41598-025-87240-z