Multiview angle UAV infrared image simulation with segmented model and object detection for traffic surveillance

Aibibu, Tuerniyazi; Lan, Jinhui; Zeng, Yiliang; Hu, Jinghao; Yong, Zhuo

doi:10.1038/s41598-025-89585-x

Article
Open access
Published: 12 February 2025

Scientific Reports volume 15, Article number: 5254 (2025)

Abstract

With the rapid development of infrared (IR) imaging UAV technology, infrared aerial image processing has been applied in many fields. However, real aerial images are often difficult to obtain because of flight limitations, acquisition costs and other factors, so it is necessary to simulate UAV infrared images by computer. This paper proposes an improved infrared aerial image simulation method based on the open-source AirSim. Compared with the original AirSim infrared image simulation method, the quality of the simulated infrared image is improved via 3-dimensional segmented model processing. Infrared aerial images of a traffic scene at different viewing angles are simulated via the proposed method, and an infrared traffic scene simulation dataset (IR-TSS) containing seven types of objects is constructed. We propose EfficientNCSP-Net for the IR-TSS dataset and compare it with popular methods. The experimental results show that the proposed EfficientNCSP-Net achieves an mAP₅₀ greater than 96% for object detection on the IR-TSS dataset, which is better than those of the existing methods. This paper not only contributes to research on infrared image simulation of traffic scenes but also has reference value for other aerial image simulation fields.

Introduction

With the rapid development of integrated circuit technology and artificial intelligence technology, infrared imaging technology has played an important role in many application fields, such as traffic^1,2, rescue^3,4, security^5,6, agriculture^7,8 and environmental protection^9,10. In particular, the combination of drone technology and infrared imaging technology has extended these applications to more fields. UAV infrared imaging is favored by many researchers in traffic monitoring because of its flexibility and strong anti-interference ability. Many studies have addressed infrared image object detection in traffic scenarios, but real UAVs are subject to flight restrictions in certain traffic scenarios, and frequent flights increase research costs, which makes data collection difficult. Therefore, it is necessary to study simulation methods for UAV infrared images in traffic scenarios. Collecting aerial infrared image data via simulation can reduce the cost of data collection and improve the efficiency of research and development. A schematic diagram of the significance of IR image simulation is shown in Fig. 1.

According to Fig. 1, UAV infrared image simulation technology has application value not only in traffic monitoring but also in national defense and civil applications, such as wildlife protection, search and rescue, and vehicle detection. Many researchers in different fields have studied infrared image simulation and obtained useful results. Current infrared image simulation methods can be summarized in four groups. The first uses professional commercial software for infrared image simulation, such as SE-Workbench-IR¹¹, JRM¹², and Presagis Vega Prime¹³. Commercial software gives a better simulation effect, but its high cost limits its use, and researchers without solid financial support cannot access it. The second simulates infrared scenes through deep learning generative adversarial networks (GANs)¹⁴. However, these methods require many training samples to complete model training, and obtaining traffic scene data from different angles with a GAN-based method is inconvenient. The third combines 3D modeling software with Ansys¹⁵ software for infrared image simulation¹⁶. Ansys is a comprehensive commercial simulation package that can simulate subtle temperature changes on an object’s surface, but it requires a paid license and is not designed specifically for IR image simulation, so many parameters must be set up during the simulation process, which makes the simulation inefficient. The fourth is UAV-view infrared image simulation based on AirSim¹⁷ or AirGen¹⁸, which uses the Unreal Engine¹⁹ as the basic platform and generates simulated infrared images through the open-source plug-in AirSim from Microsoft Corporation¹⁷. 
The advantage of this method is that Unreal Engine and AirSim do not require users to purchase licenses, so they are accessible to many people and have potential value for follow-up research and extension. The disadvantage is that the quality of AirSim-based UAV-view infrared simulation images is relatively poor: the method is suitable for simulating infrared images of long-distance, small-scale target scenes, but for short-distance objects there is a gap between the simulation results and real infrared images, so detail and quality need further improvement. Considering the advantages and disadvantages of the methods above, this paper proposes an improved UAV infrared image simulation method based on AirSim, with which the IR-TSS dataset was developed and a corresponding object detection model, EfficientNCSP-Net, was designed for UAV infrared images. To verify the effectiveness of our infrared image simulation algorithm and the proposed object detection algorithm, we carried out comparative experiments and ablation experiments using the IR-TSS dataset, the DroneVehicle dataset, and the HIT-UAV dataset. The experimental results show that the images generated by our proposed infrared image simulation method contain more object detail than those generated by the existing AirSim-based infrared image simulation method, and the proposed object detection method achieves a traffic object detection accuracy of 96.2% on the IR-TSS dataset, which is higher than that of the existing mainstream methods.

This paper is organized as follows. The second section reviews recent research by other authors on infrared image simulation and describes its advantages and disadvantages; the third section gives a detailed description of the improved infrared image simulation method based on AirSim; the fourth section introduces the process of creating the IR-TSS dataset and its features; the fifth section presents the object detection model EfficientNCSP-Net for UAV infrared images; the sixth section reports the comparative experiments and the ablation experiment; the seventh section discusses the results in light of the theory of the methods and the experimental results; finally, we summarize the paper and give the conclusion.

Related work

Researchers in the field of infrared imaging have carried out many studies on infrared image simulation with different methods, according to their own needs and laboratory conditions, and have achieved relatively good results. For example, in the simulation of infrared images via commercially available software, Yang et al.¹³ used Vega Prime’s Ondulus IR software to simulate infrared images of outdoor scenes. The infrared image of each object is simulated by physical modeling, but the method requires the Ondulus IR software from Vega Prime, which has a high cost. Yu et al.¹² introduced a method to obtain infrared simulation images of moving ships on the sea surface via JRM simulation software. IR simulation images of complex sea surface objects can be obtained effectively via this method, but JRM is expensive. Glover et al.²⁰ generated infrared images of aircraft via the CounterSim software from Chemring Countermeasures Ltd. Although this method simulates aircraft infrared images well, the commercial CounterSim software is costly, which makes the method expensive to adopt. Pszczel et al.²¹ studied a method for simulating infrared images of tropical marine environments via VIRSuite software. The method uses tropical climate information to mathematically model the marine environment and generate infrared simulation images, but it relies on VIRSuite software, which is generally difficult to access. Lin et al.²² presented a method to simulate infrared images of ships on the sea surface via SE-Workbench. This method generates infrared simulation images of a ship by constructing an infrared radiation model of the sea surface and analyzing the simulation results. However, this method relies on SE-Workbench software, and related research cannot be completed without substantial financial support. 
In terms of infrared image simulation based on generative adversarial networks, Wang et al.²³ introduced a recurrent generative adversarial network for infrared image simulation of traffic scenes. In this method, a new infrared image is simulated from the input visible and infrared images. Lyu et al.²⁴ proposed an infrared image simulation method with an improved generative adversarial network. In this method, a new generative adversarial network model is constructed to generate simulated infrared images, which are used for infrared data augmentation. Özkanoğlu et al.²⁵ presented a generative adversarial network model for simulating infrared images from visible images. This method addresses the difficulty of finding infrared image datasets for deep learning tasks. Wu et al.²⁶ introduced an atmospheric infrared image simulation method based on generative adversarial networks. The method simulates atmospheric infrared images in real time through generative adversarial networks based on the physical mechanism of infrared atmospheric transmission. These GAN-based methods yield simulated infrared images of relatively good quality; however, large amounts of data are needed for network training, and it is not easy to obtain simulated infrared images at an arbitrary viewing angle. In terms of infrared image simulation based on Ansys or other 3D modeling software, Gong et al.¹⁶ introduced a simulation method for aircraft infrared images based on Ansys and 3D Max software²⁷. The method uses 3D Max software to construct a three-dimensional model of the aircraft, and the temperature distribution of the aircraft is modeled in Ansys software so that the IR image of the aircraft can be simulated. Huang et al.²⁸ introduced an infrared image simulation method that combines real images with simulated models. 
In this method, real infrared images are used to build the background of the scene, and the 3D model of the aircraft and the infrared radiation texture image are constructed via Unity 3D software²⁹, thus obtaining the infrared simulation image of the specified scene. Zyśk et al.³⁰ introduced a method for simulating infrared images via SKA RFS software. In this method, the infrared image of a flying object is simulated in SKA RFS software based on real test infrared images and parameters, and the simulated infrared image is input into the object recognition module to recognize objects. Although these methods are relatively flexible and can produce simulated infrared images of objects from different viewpoints, they require purchasing key 3D modeling or infrared radiation modeling software, and many parameters need to be set up and adjusted, which leads to a relatively high cost and low simulation efficiency. In terms of infrared image simulation based on open-source software such as AirSim and AirGen, Shah et al.¹⁷ introduced AirSim, an open-source simulation plug-in for unmanned aerial vehicles and intelligent driving that is based on the Unreal Engine. AirSim can simulate the imaging information of various sensors, such as visible light, infrared and LiDAR³¹, for UAVs and intelligent vehicles, but its infrared image simulation function is relatively weak, and the image quality is rather low. Jansen et al.³² improved AirSim for industrial applications and constructed Cosys-AirSim. Cosys-AirSim adds 3D models of industrial robots, shelves, etc., on top of the original operation mechanism of AirSim, but its infrared image simulation performance is not further improved. Vemprala et al.¹⁸ developed the machine intelligence R&D (research and development) platform Grid. The platform includes the AirGen module, which provides UAV aerial image simulation functions via AirSim. 
Although AirGen’s visible aerial image simulation and LiDAR point cloud simulation functions have improved, its infrared aerial image simulation functions have not, and the quality of the infrared images generated by AirGen is still low. Considering the advantages and disadvantages of the infrared image simulation methods mentioned above, we find that the infrared aerial image simulation method based on open-source software has clear potential application value over the other methods. First, the simulation cost is low, and there is no need to spend much money buying simulation software. Second, there is no need to collect a large amount of real infrared imaging data or to train deep learning networks, which saves time and manual costs. Third, open-source software such as AirSim can simulate visible light images, LiDAR images and others in addition to infrared images, which is beneficial for carrying out joint simulation experiments with multiple sensors. However, the disadvantage of the open-source AirSim simulation software is that its simulated infrared images are of low quality and lack detailed object information. The proposed method constructs a 3D model of each object in the scene via segmentation modeling and then imports the models into the Unreal Engine to construct a 3D traffic scene. Furthermore, the gray value of each object in the scene is calculated via the mathematical model of infrared radiation, and a certain amount of noise is added to generate the infrared simulation image. Then, the IR traffic scene simulation dataset (IR-TSS) and the object detection method EfficientNCSP-net are constructed. Finally, the object detection results on the simulated IR images are compared and analyzed via popular object detection models such as YOLOv10³³ and RT-DETR³⁴.

The contributions of this paper are as follows. (1) Based on the original AirSim infrared image simulation method, this paper provides an improved method for generating infrared images, which improves the quality of the infrared images and increases the detailed content of the objects in the IR images. (2) We collect simulated traffic scene infrared images and construct an infrared traffic scene simulation dataset (IR-TSS) containing multiple targets, which includes seven different types of traffic scene targets, such as people, bicycles, cars and buses. (3) The model training process of popular algorithms such as YOLOv10 and RT-DETR on IR-TSS is analyzed, and the object detection effects of different algorithms on the IR-TSS dataset are compared. (4) With the improved infrared image simulation method introduced in this paper, it is possible to obtain high-quality infrared simulation images with different viewpoints at a lower cost, which has certain reference significance in simulating infrared images for UAVs and intelligent self-driving fields.

Methods

This paper presents an improved AirSim-based simulation method for UAV infrared images. Based on the original AirSim infrared image simulation mechanism, we construct a 3D model for the traffic scene via segmentation modeling and then import the 3D model into Unreal Engine to build the required traffic scene. After calculating the grayscale value of the surface of the object via the infrared radiation computation model and adding a certain amount of noise, we capture the simulation infrared image of the traffic scene. A block diagram of this infrared image simulation method is shown in Fig. 2.

According to Fig. 2, the infrared aerial image simulation method proposed in this paper consists of four main parts: construction of segmented 3D models, construction of the traffic scene, calculation of the infrared radiation value of the objects, and addition of noise.

(1) Construction of segmented 3D models.

In this method, the traffic scene consists of many 3D models, so 3D modeling of different traffic objects is required for constructing traffic scenes. Since AirSim simulates infrared images using the ID information of objects in the scene, to provide different ID numbers for different regions of objects, this paper segments the objects into different regional modules according to their material and temperature distributions.

The 3D modeling and segmentation process of the traffic scene is performed via 3D Max software. First, according to the requirements for building typical traffic scenes, we constructed 3D models of 7 types of objects: people, cars, bicycles, motorbikes, vans, buses, and trucks. On this basis, each model is segmented into several major areas based on material and temperature distributions. For example, a car is divided into the engine area, the glass area, the tire area, and the body area; a person is divided into the eye and mouth areas, the exposed skin area, and the clothing area; and so on. The 3D model of the car with tire segmentation is shown in Fig. 3.
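The region-to-ID assignment described above can be sketched as follows. This is a minimal illustration, not the authors' code: the mesh-name patterns and ID values are hypothetical placeholders, and the function only assumes a client object that exposes AirSim's `simSetSegmentationObjectID(mesh_name, object_id, is_name_regex)` call.

```python
from typing import Dict, List, Tuple

# Hypothetical mesh-name patterns for the regions of one segmented car model.
# The region names follow the text; the regexes and ID values are placeholders.
CAR_REGIONS: Dict[str, int] = {
    r"car_.*_engine": 10,
    r"car_.*_glass": 11,
    r"car_.*_tire": 12,
    r"car_.*_body": 13,
}


def assign_region_ids(client, regions: Dict[str, int]) -> List[Tuple[str, bool]]:
    """Give every segmented region its own object ID.

    `client` is expected to behave like an AirSim client, whose
    simSetSegmentationObjectID call returns True when a mesh matched.
    """
    results = []
    for pattern, object_id in regions.items():
        # AirSim object IDs are limited to the range 0..255.
        if not 0 <= object_id <= 255:
            raise ValueError(f"object ID {object_id} is outside 0..255")
        matched = client.simSetSegmentationObjectID(pattern, object_id, True)
        results.append((pattern, bool(matched)))
    return results
```

With a running simulator, `client` would typically be created with `airsim.MultirotorClient()` followed by `client.confirmConnection()`; a pattern that reports `False` indicates that no mesh in the scene matched the assumed naming scheme.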

(2) Construction of the traffic scene.

After seven different segmented 3D models were completed in 3D Max software, the traffic scene was constructed by importing them into Unreal Engine. To import AirSim into Unreal Engine, the AirSim source code and the Unreal Engine project code need to be launched through Microsoft Visual Studio. After launching the Unreal Engine scene project, we can import the segmented 3D model into the traffic scene. The 3D traffic scene constructed in the Unreal Engine is shown in Fig. 4.

(3) Model of thermal infrared radiation.

To obtain the infrared simulation image of the traffic scene, we need to calculate the infrared radiation of the object’s surface in the traffic scene. According to the infrared imaging mechanism, the thermal infrared radiation emitted by an object is related mainly to the infrared emissivity of the object’s surface and surface temperature. We set the corresponding temperatures and average emissivity for the different segmentation regions of each object in the traffic scene, which are shown in Table 1.

Table 1 Temperature and emissivity settings for traffic objects.

According to Table 1, the infrared radiation of traffic objects can be calculated via Planck’s law using the parameters of an object’s surface emissivity and surface temperature. Planck’s law is formulated³⁷ as follows:

$$L\left(T,\varepsilon_{avg},R_{\lambda}\right)=\varepsilon_{avg}\int_{\lambda=8\,\mu m}^{\lambda=14\,\mu m}R_{\lambda}\left(\frac{2hc^{2}}{\lambda^{5}}\,\frac{1}{\exp\left(\frac{hc}{kT\lambda}\right)-1}\right)d\lambda \tag{1}$$

where $L$ is the infrared radiation, $\varepsilon_{avg}$ is the average emissivity, $T$ is the temperature, $\lambda$ is the wavelength, $R_{\lambda}$ is the average peak reaction rate of the camera, $h$ is Planck’s constant, $k$ is Boltzmann’s constant, and $c$ is the speed of light.

The infrared radiation of the object surface in the traffic scene is calculated via Planck’s law and normalized to obtain integrated infrared radiation values. This integrated radiation value is used by AirSim to render infrared images of objects in traffic scenes.
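The band integral in Eq. (1) and the subsequent normalization can be sketched numerically as follows. This is an illustrative sketch under stated assumptions: the camera response is taken as a constant, the integral uses a simple trapezoidal rule, and the normalization (scaling by the largest radiance in the scene to a 0 to 255 gray level) is one plausible choice, since the exact normalization is not specified in the text. The function names are hypothetical.

```python
import math

H = 6.62607015e-34  # Planck constant, J*s
C = 2.99792458e8    # speed of light, m/s
K = 1.380649e-23    # Boltzmann constant, J/K


def planck_spectral_radiance(wavelength_m, temperature_k):
    """Blackbody spectral radiance 2hc^2 / lambda^5 / (exp(hc / (k T lambda)) - 1)."""
    prefactor = 2.0 * H * C ** 2 / wavelength_m ** 5
    return prefactor / math.expm1(H * C / (wavelength_m * K * temperature_k))


def band_radiance(temperature_k, emissivity, response=1.0,
                  lam_min=8e-6, lam_max=14e-6, steps=2000):
    """Eq. (1) by the trapezoidal rule, assuming a constant camera response."""
    dlam = (lam_max - lam_min) / steps
    total = 0.0
    for i in range(steps + 1):
        lam = lam_min + i * dlam
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * response * planck_spectral_radiance(lam, temperature_k)
    return emissivity * total * dlam


def normalize_to_gray(radiances):
    """Assumed normalization: scale each radiance by the scene maximum to 0..255."""
    peak = max(radiances.values())
    return {name: round(255 * value / peak) for name, value in radiances.items()}


# Placeholder (temperature in K, emissivity) pairs, NOT the values of Table 1.
regions = {"engine": (350.0, 0.90), "body": (300.0, 0.90), "glass": (295.0, 0.85)}
gray_levels = normalize_to_gray(
    {name: band_radiance(t, e) for name, (t, e) in regions.items()}
)
```

Because the radiance grows with both temperature and emissivity, the hottest region receives the brightest gray level under this normalization.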

(4) Noise.

By adding noise, the infrared simulation image can be made more like the real infrared image. Therefore, noise is added to the simulation image via AirSim infrared image simulation in this paper. According to Bondi et al.³⁸, infrared images are affected by noise such as Johnson noise, flicker noise, and thermal noise during imaging inside the camera, whereas infrared rays are affected by air humidity, fog, and dust during atmospheric transmission. Hence, we add a certain amount of Gaussian noise and pepper noise into the simulated image to represent the noise in the real infrared image. The comparison results of the simulated infrared images with and without noise are shown in Fig. 5.

According to Fig. 5, the gray values fluctuate between neighboring pixels after noise is added; thus, the simulation results are more similar to the infrared images from real traffic scenes.
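The noise step can be illustrated with a short sketch that adds Gaussian noise and pepper noise to a grayscale image stored as a list of rows. The standard deviation and pepper probability below are assumed placeholder values, since the noise parameters are not given in the text, and the function name is hypothetical.

```python
import random


def add_ir_noise(image, sigma=5.0, pepper_prob=0.01, seed=None):
    """Return a noisy copy of an 8-bit grayscale image (list of rows of ints).

    Each pixel is set to 0 with probability `pepper_prob` (pepper noise);
    otherwise zero-mean Gaussian noise with standard deviation `sigma`
    is added and the result is clipped to 0..255.
    """
    rng = random.Random(seed)
    noisy = []
    for row in image:
        new_row = []
        for value in row:
            if rng.random() < pepper_prob:
                new_row.append(0)
            else:
                perturbed = value + rng.gauss(0.0, sigma)
                new_row.append(int(min(255, max(0, round(perturbed)))))
        noisy.append(new_row)
    return noisy
```

Passing a fixed `seed` makes the noise reproducible, which is convenient when comparing simulated images with and without noise.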

IR-TSS dataset

To make the simulated infrared images generated by improved AirSim useful in processing infrared images and designing target detection algorithms, this paper constructs an Infrared Traffic Scene Simulation dataset (IR-TSS) by collecting a large number of simulated images. To ensure the diversity of imaging scales and angles for different traffic targets, the UAV captured simulated image data at 40 m, 70 m, 90 m, and 120 m heights in the Unreal Engine traffic scene at 90° and 60° viewing angles. The results of the 3D visualization of the flight trajectory during the acquisition of simulated infrared images by the UAV are shown in Fig. 6.
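The acquisition settings above amount to eight height and angle combinations, which can be enumerated as in the sketch below. Two assumptions are made here and are not stated in the text: the viewing angle is taken to be measured downward from the horizontal (so 90° is a nadir view), and altitudes are converted to AirSim's NED convention, in which a positive height corresponds to a negative z coordinate.

```python
from itertools import product

HEIGHTS_M = (40, 70, 90, 120)   # flight heights from the text
VIEW_ANGLES_DEG = (90, 60)      # viewing angles from the text


def capture_plan(heights=HEIGHTS_M, angles=VIEW_ANGLES_DEG):
    """List every height/angle combination used for image acquisition."""
    plan = []
    for height, angle in product(heights, angles):
        plan.append({
            "height_m": height,
            "z_ned_m": -float(height),           # NED frame: up is negative z
            "view_angle_deg": angle,
            "camera_pitch_deg": -float(angle),   # assumed: pitch below horizontal
        })
    return plan
```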

We collected more than 5500 infrared simulation images of the traffic scene according to the UAV flight trajectory in Fig. 6, and 3750 valid images were selected from them to construct the IR-TSS dataset. The dataset contains 7 categories of objects (person, car, van, motorcycle, bicycle, bus, and truck) and more than 11,000 object instances. The distributions of the number and size of object instances for the IR-TSS dataset are shown in Fig. 7.

According to Fig. 7a, Car has the maximum number of instances in this dataset, which is nearly 3000, and Truck has the minimum number of instances, which is more than 600. In Fig. 7b, the horizontal axis represents different categories, and the vertical axis represents the ratio of the pixel area of the object instance to the area of the whole image, which is calculated as follows:

$$E=\frac{\left(h\times H\right)\times\left(w\times W\right)}{H\times W}\times 100 \tag{2}$$

where E is the percentage of the image occupied by the object, h and w are the height and width of the object label expressed as fractions of the image height and width (so that h × H and w × W are the label size in pixels), H is the height of the sample image, and W is the width of the sample image.
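As a worked illustration of Eq. (2), the sketch below computes E for one label. It assumes that labels are stored as normalized YOLO-style text lines (`class x_center y_center width height`), which is consistent with the form of the equation but is not stated explicitly in the text; the function names are hypothetical.

```python
def area_percentage(h_norm, w_norm, img_h, img_w):
    """Eq. (2): percentage of the image area covered by one label."""
    box_h = h_norm * img_h   # label height in pixels
    box_w = w_norm * img_w   # label width in pixels
    return (box_h * box_w) / (img_h * img_w) * 100.0


def area_from_label_line(line, img_h, img_w):
    """Parse an assumed 'class xc yc w h' label line and apply Eq. (2)."""
    _cls, _xc, _yc, w_norm, h_norm = line.split()
    return area_percentage(float(h_norm), float(w_norm), img_h, img_w)
```

For example, a label with normalized width 0.2 and height 0.1 in a 640 × 512 image covers 2% of the image; the image size cancels out, so E reduces to h × w × 100.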

According to the box plot in Fig. 7b, no object instance occupies more than 10% of the total sample image area, and the object instance size varies considerably, which indicates the diversity of instance scales in the dataset.

The IR-TSS dataset not only has high application value in the field of traffic monitoring but can also be applied to object search in urban areas and road object tracking. It can also potentially be used to study how to distinguish objects with similar image features, such as motorcycles and bicycles, and to analyze the effects of different infrared imaging angles on object detection.

Detection

To evaluate how well the different object categories in the IR-TSS dataset can be detected in a deep learning object detection task, this paper designs an object detection method (EfficientNCSP-net) for the IR-TSS dataset. The object detection results of this algorithm on the IR-TSS dataset are compared with those of popular algorithms such as YOLOv10, YOLOv9, and RT-DETR³⁴ to verify the effectiveness of EfficientNCSP-net. A schematic diagram of the EfficientNCSP-Net network structure is shown in Fig. 8.

The EfficientNCSP-Net presented in this paper consists of three parts: the backbone, neck, and head. The backbone uses CSPdarknet⁴⁰ from YOLOv8. The neck uses an improved PAFPN⁴¹ structure: it introduces YOLOv9’s RepNCSPELAN4 structure into the original PAFPN structure, which helps ensure that the gradient is not lost during network training and retains more information about the object’s features. The head is based on the detection head of YOLOv8 and uses Efficient-iou⁴² to construct a new loss function, which lays the foundation for further improving the detection accuracy. The results of road object detection for the different methods are presented in the experimental section of this paper.
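To make the role of the Efficient-iou term more concrete, the sketch below computes the commonly published EIoU loss for a single pair of boxes: one minus the IoU, plus a center-distance penalty and separate width and height penalties, each normalized by the smallest enclosing box. This is a reference illustration of that published form for boxes given as `(x1, y1, x2, y2)`; the exact loss used in EfficientNCSP-Net is not specified in the text and may differ.

```python
def eiou_loss(box_p, box_g, eps=1e-9):
    """EIoU loss between a predicted and a ground-truth box (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = box_p
    gx1, gy1, gx2, gy2 = box_g
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union.
    inter_w = max(0.0, min(px2, gx2) - max(px1, gx1))
    inter_h = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = inter_w * inter_h
    union = pw * ph + gw * gh - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)

    # Squared distance between the two box centers.
    center_dist2 = ((px1 + px2) / 2 - (gx1 + gx2) / 2) ** 2 \
        + ((py1 + py2) / 2 - (gy1 + gy2) / 2) ** 2

    return (1.0 - iou
            + center_dist2 / (cw ** 2 + ch ** 2 + eps)
            + (pw - gw) ** 2 / (cw ** 2 + eps)
            + (ph - gh) ** 2 / (ch ** 2 + eps))
```

The loss is close to zero for identical boxes and exceeds one for disjoint boxes, because the distance and size penalties remain active even when the IoU is zero.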

Experiments

To verify that the method proposed in this paper has several advantages in AirSim-based infrared image simulation and target detection, we carried out a simulation result comparison experiment and a target detection analysis experiment based on traffic scene simulation images.

(1) Comparative experiment of simulation results.

To prove the advantages of the proposed infrared image simulation method over existing AirSim infrared image simulation methods, comparative experiments are carried out in this paper. The infrared image simulation result of the original AirSim method is compared with the infrared image simulation result of the improved method, and the comparison result is shown in Fig. 9.

According to Fig. 9, the infrared image simulated by the original AirSim method does not have detailed object information, which is not suitable for the infrared simulation of short-distance objects. However, the method proposed in this paper generates infrared images with more information on the object, which has more research and optimization value.

(2) Comparative object detection experiments on IR-TSS dataset.

To verify the effectiveness of the object detection method EfficientNCSP-Net on the simulation dataset, this paper uses the detection algorithm EfficientNCSP-Net on the simulation dataset IR-TSS for the detection of objects such as person, bicycles, vehicles, etc. The target detection result metrics⁴³ of this algorithm on the IR-TSS dataset are compared with those of popular algorithms such as YOLOv10³³, YOLOv9⁴⁴, and RT-DETR³⁴. The object detection results of different algorithms on the IR-TSS dataset are shown in Table 2 (input image size 640 × 512).

Table 2 Object detection results of different algorithms on the IR-TSS dataset.

According to Table 2, our proposed EfficientNCSP-net achieves the highest mAP₅₀ on the IR-TSS dataset, 96.2%, which is greater than those of other algorithms such as YOLOv8 and YOLOv9. Except for van and bus, the AP₅₀ values of our algorithm for the individual categories are not lower than those of the other algorithms. The AP₅₀ values of the van and bus categories are lower because their infrared image features are not easily distinguishable in the dataset. EfficientNCSP-net also has a smaller parameter size and lower latency. The visualization results of the EfficientNCSP-net algorithm for object detection are shown in Fig. 10.

According to Fig. 10, the EfficientNCSP-net object detection algorithm can detect objects with different viewing angles and scales on the IR-TSS dataset.
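For readers unfamiliar with the AP₅₀ and mAP₅₀ metrics reported above, the following sketch shows one common way to compute them. It assumes that each detection has already been marked as a true positive when it matches an unmatched ground-truth box of the same class with IoU ≥ 0.5, and it uses all-point interpolation of the precision-recall curve; the evaluation code actually used for the tables may differ in detail. Function names are hypothetical.

```python
def average_precision(detections, num_gt):
    """AP from (score, is_true_positive) pairs and the ground-truth count."""
    if num_gt <= 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _score, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))

    # Pad the curve, then make precision monotonically non-increasing.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])

    # Area under the interpolated precision-recall curve.
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))


def mean_average_precision(per_class_ap):
    """mAP is the unweighted mean of the per-class AP values."""
    return sum(per_class_ap.values()) / len(per_class_ap)
```

For instance, three detections ranked as true positive, false positive, true positive against two ground-truth objects give an AP of 5/6.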

(3) Comparative object detection experiments on DroneVehicle dataset.

To verify the target detection results of the EfficientNCSP-net algorithm on real infrared datasets, we carried out comparative experiments with different algorithms on the DroneVehicle dataset⁴⁵. The DroneVehicle dataset is a visible and infrared dual-mode dataset containing five types of targets: cars, vans, buses, trucks and freight cars. Because only infrared data are needed in the comparative experiment, we removed the visible data from the DroneVehicle dataset and carried out comparative experiments on the remaining infrared data via the EfficientNCSP-net, YOLOv10, YOLOv9, YOLOv8, and RT-DETR algorithms. The curves of mAP₅₀ during training of the different algorithms on the DroneVehicle dataset are shown in Fig. 11.

According to Fig. 11, the EfficientNCSP-net algorithm has the best training process, whereas the RT-DETR algorithm has the lowest mAP₅₀ during the training process. We carried out further validation experiments based on training models and obtained the detection accuracy of different algorithms on the DroneVehicle dataset. The comparative results are shown in Table 3 (input image size 840 × 712).

Table 3 Object detection results of different algorithms on the DroneVehicle dataset.

According to Table 3, the mAP₅₀ of the EfficientNCSP-net algorithm on the DroneVehicle dataset is the highest at 83.4%, whereas the mAP₅₀ of the RT-DETR algorithm is the lowest at 77.2%. Our proposed method therefore detects traffic targets more efficiently, achieving the highest mAP₅₀ with the smallest number of parameters and the lowest latency. To view the target detection ability of the EfficientNCSP-net algorithm on the DroneVehicle dataset more directly, we visualized its target detection results, and the visualization results are shown in Fig. 12.

According to Fig. 12, the EfficientNCSP-net algorithm can better detect objects such as cars and buses in real infrared aerial images, but due to the small differences among the features of objects such as vans, trucks, and freight_cars, the process of detecting the objects via our algorithm is challenging.

(4) Comparative object detection experiments on HIT-UAV dataset.

To verify the effectiveness of our algorithm on real datasets, we carried out comparative experiments on the HIT-UAV dataset⁴⁶. The HIT-UAV dataset includes five categories (person, bicycle, car, other vehicle, and don’t care), and its infrared images contain atmospheric noise. Because the “don’t care” category has a small number of instances, no corresponding object category and no fixed object features, it is removed from the dataset in this paper when the HIT-UAV dataset is used for the comparative experiments. We carried out comparative experiments with RT-DETR, YOLOv11 and other algorithms on the HIT-UAV dataset, and the results of the experiments are shown in Table 4 (input image size 640 × 512).

Table 4 Object detection results of different algorithms on the HIT-UAV dataset.


According to Table 4, the accuracy of the proposed algorithm reaches 93.9% on the HIT-UAV dataset, higher than that of the other algorithms, and its AP₅₀ values for the individual categories are no lower than those of the other algorithms. The algorithm also has an advantage over the other algorithms in parameter size and latency.

To show the detection performance of the algorithm on the HIT-UAV dataset more clearly, we visualize the object detection results in Fig. 13.

According to Fig. 13, the algorithm accurately detects the different objects in the HIT-UAV dataset and distinguishes between the different traffic objects.

(5) Ablation experiments.

To verify how much EfficientNCSP-net’s CSPdarknet, improved PAFPN and Effic-IoU modules each contribute to the improvement in detection mAP₅₀, we remove or replace individual network modules in ablation experiments. The results of the ablation experiments are shown in Table 5.

Table 5 Ablation experiment results of the EfficientNCSP-net algorithm on the IR-TSS dataset.


According to Table 5, the mAP₅₀ of the full EfficientNCSP-net network is 96.2%. It decreases to 95.4% when CSPdarknet is removed, to 95.9% when the improved PAFPN is removed, and to 96.0% when Effic-IoU is removed. The ablation results therefore show that CSPdarknet contributes the most to detection performance, whereas Effic-IoU contributes the least.
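As a worked check of this ranking, the drop in mAP₅₀ for each removed module can be computed directly from the Table 5 values quoted above. The variable names are illustrative only.

```python
# mAP50 values (%) quoted from Table 5.
full_map50 = 96.2
ablated = {
    "CSPdarknet": 95.4,
    "improved PAFPN": 95.9,
    "Effic-IoU": 96.0,
}

# Drop in mAP50 when each module is removed, rounded to one decimal place.
drops = {name: round(full_map50 - value, 1) for name, value in ablated.items()}

# Modules ordered from largest to smallest contribution.
ranking = sorted(drops, key=drops.get, reverse=True)

for name in ranking:
    print(f"{name}: -{drops[name]} percentage points")
```

The drops are 0.8, 0.3 and 0.2 percentage points respectively, which matches the ordering stated in the text.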

Discussion

Based on the simulation method presented in this paper and the experimental results, we briefly discuss the main difficulties and key points of this research.

(1) Although the infrared image simulation method proposed in this paper improves on the original AirSim method, its output still falls short of real infrared images of traffic scenes. Future work should address these weaknesses using the principles of thermal infrared radiation distribution on object surfaces and the mechanism of thermal infrared transmission in the atmosphere. The weaknesses can be addressed mainly in two ways. First, to simulate the gradual change in grayscale value within a given part of an object in the infrared image (e.g. a tire or the glass of a car), it is necessary to calculate the change in thermal radiation of that part and to analyze the cause of heat generation, the heating time, and other factors. Second, it is necessary to find the relationship between the thermal radiation of neighboring parts of an object, which involves factors such as heat generation time and heat conduction, so as to avoid abrupt jumps in the grayscale values of pixels at the boundaries between neighboring parts in the simulated image. We also need to combine the results of practical experiments to determine the mathematical relationship among these factors for calculating the corresponding gray values in the simulated image. A schematic comparison illustrating the shortcomings of the simulated infrared image is shown in Fig. 14.

According to Fig. 14, the proposed infrared image simulation method cannot yet simulate the gradual change in grayscale value within a given part of a target, and it also lacks the grayscale variations caused by the infrared radiation relationship between different parts of the object. These issues are the focus of our future research.

(2) Because the infrared images in the HIT-UAV dataset contain strong environmental noise and the instances in the person category are small, noise points are sometimes detected as persons and persons are sometimes treated as noise, which reduces the mAP₅₀ of the detection results. The algorithm’s small-target detection ability needs to be improved in future research.

(3) To understand which areas EfficientNCSP-Net focuses on during road object detection, we output the feature map of object detection through the class activation map (CAM)⁴⁷, which reflects the positional areas that are important to the algorithm in object detection. The CAMs produced by EfficientNCSP-net for some simulated images are shown in Fig. 15. Hotter colors in the CAM represent higher gradient information and mark regions that are important in the object detection process, whereas cooler colors represent lower gradient information and mark regions that contribute weakly to the detection.

According to Fig. 15, the target regions in the CAM are hotter than the other regions, and regions without targets are relatively cooler. This indicates that regions containing targets carry higher gradient information, so EfficientNCSP-Net focuses on the image features of the objects during object detection on the IR-TSS dataset.
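The weighting idea behind gradient-based CAMs can be illustrated with a minimal sketch. The code below implements plain Grad-CAM rather than the Grad-CAM++ variant of the cited reference, and it is not the authors' implementation; the function name `grad_cam` and the toy activation and gradient maps are assumptions made for illustration. Each channel is weighted by the spatial mean of its gradient, the weighted channels are summed, and negative values are clipped so that only regions supporting the detection stay hot.

```python
def grad_cam(activations, gradients):
    """Plain Grad-CAM on K channel maps, each an H x W nested list.

    activations: feature maps of one layer.
    gradients: gradient of the detection score with respect to those maps.
    Returns an H x W map scaled to [0, 1] (all zeros if nothing is positive).
    """
    k = len(activations)
    h = len(activations[0])
    w = len(activations[0][0])

    # Channel weight = spatial average of that channel's gradient.
    weights = [sum(sum(row) for row in g) / (h * w) for g in gradients]

    # Weighted sum over channels, followed by ReLU.
    cam = [
        [
            max(0.0, sum(weights[c] * activations[c][i][j] for c in range(k)))
            for j in range(w)
        ]
        for i in range(h)
    ]

    # Normalize so the hottest location has value 1.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


# Toy example: channel 0 supports the detection, channel 1 opposes it.
acts = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
grads = [[[1, 1], [1, 1]], [[-1, -1], [-1, -1]]]
heatmap = grad_cam(acts, grads)
```

In the toy example only the top-left cell stays hot, because the channel that fires there has a positive average gradient, while the bottom-right cell is suppressed by the ReLU.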

Conclusion

To improve the quality of infrared image simulation in open-source AirSim, this paper proposes an improved infrared image simulation method for AirSim in traffic scenes. In this method, 3D models are built via segmented modeling and used to construct a 3D traffic scene in the Unreal Engine. According to the features of the actual scene, we set the temperature and surface emissivity of the different objects in the simulated scene, calculate the integrated radiation of the objects through Planck’s law, and then import the integrated surface radiation into the background database of the AirSim software to simulate infrared images of the traffic scene with higher image quality. We collect a large amount of image data with this simulation method and construct an infrared traffic scene simulation dataset (IR-TSS). Seven types of targets in the dataset, such as people, bicycles, motorcycles, cars, and buses, are detected with the method proposed in this paper and with other typical object detection methods. Comparative and ablation experiments were performed on the IR-TSS, DroneVehicle and HIT-UAV datasets. The experimental results show that the proposed infrared image simulation method has an advantage in image quality over the existing AirSim simulation, and that the mAP₅₀ of the proposed detection algorithm EfficientNCSP-Net reaches 96.2% on the IR-TSS dataset, higher than that of existing popular methods such as YOLOv8, YOLOv9, and RT-DETR. The method not only has potential application value in the simulation of infrared images of traffic scenes but also has reference value for infrared image simulation in other fields, and it can provide data support for research on multi-angle object detection.
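The radiance step in this pipeline can be sketched as follows. This is a minimal illustration of integrating Planck's law over a spectral band and scaling by emissivity, not the authors' code: the 8–14 µm band, the part names, and the temperature and emissivity values are assumptions chosen for the example, and the step that writes values into AirSim is not shown.

```python
import math

H = 6.62607015e-34   # Planck constant, J*s
C = 2.99792458e8     # speed of light, m/s
K_B = 1.380649e-23   # Boltzmann constant, J/K


def planck_radiance(wavelength_m, temp_k):
    """Blackbody spectral radiance B(lambda, T) in W / (m^2 * sr * m)."""
    x = H * C / (wavelength_m * K_B * temp_k)
    if x > 700.0:
        # exp(x) would overflow; the radiance is effectively zero here.
        return 0.0
    return 2.0 * H * C**2 / (wavelength_m**5 * math.expm1(x))


def band_radiance(temp_k, emissivity, lo_m=8e-6, hi_m=14e-6, steps=2000):
    """Gray-body radiance over [lo_m, hi_m] in W / (m^2 * sr), trapezoidal rule."""
    dx = (hi_m - lo_m) / steps
    total = 0.5 * (planck_radiance(lo_m, temp_k) + planck_radiance(hi_m, temp_k))
    for i in range(1, steps):
        total += planck_radiance(lo_m + i * dx, temp_k)
    return emissivity * total * dx


# Hypothetical scene parts: (temperature in K, emissivity). Illustrative values only.
parts = {"road": (300.0, 0.95), "car_body": (305.0, 0.90), "tire": (315.0, 0.94)}
radiance = {name: band_radiance(t, e) for name, (t, e) in parts.items()}
```

Hotter or more emissive parts yield a larger band radiance, which a simulator can then map to brighter gray levels in the rendered infrared image.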

Data availability

The IR-TSS dataset mentioned in this paper is openly and freely available at https://www.scidb.cn/en/s/Jbeume. The DroneVehicle dataset used in this study is freely available at https://github.com/VisDrone/DroneVehicle. The HIT-UAV dataset used in this study is freely available at https://github.com/suojiashun/HIT-UAV-Infrared-Thermal-Dataset.

References

Fernández, J., Cañas, J. M., Fernández, V. & Paniego, S. Robust real-time traffic surveillance with deep learning. Comput. Intell. Neurosci. 2021, 4632353. https://doi.org/10.1155/2021/4632353 (2021).
Chaudhuri, A. Smart traffic management of vehicles using faster R-CNN based deep learning method. Sci. Rep. 14, 231110099. https://doi.org/10.1038/s41598-024-60596-4 (2024).
McGee, J., Mathew, S. J. & Gonzalez, F. Unmanned aerial vehicle and artificial intelligence for thermal target detection in search and rescue applications. In 2020 International Conference on Unmanned Aircraft Systems (ICUAS), 883–891. https://doi.org/10.1109/ICUAS48674.2020.9213849 (2020).
Rizk, M., Slim, F. & Charara, J. Toward ai-assisted uav for human detection in search and rescue missions. In 2021 International Conference on Decision Aid Sciences and Application (DASA), 781–786. https://doi.org/10.1109/DASA53625.9682412 (2021).
WeeLiam Khor, M. R. F. C. & Chen, Y. K. Automated detection and classification of concealed objects using infrared thermography and convolutional neural networks. Sci. Rep. 14, 83534. https://doi.org/10.1038/s41598-024-56636-8 (2024).
Oseni Ayodeji, J. H. L. P. T. Z. V. A. & Nour, M. Security and privacy for artificial intelligence: opportunities and challenges. arXiv e-prints arXiv:2102.04661. https://doi.org/10.48550/arXiv.2102.04661 (2021).
Aydin, G. D. & Ozer, S. Infrared detection technologies in smart agriculture: A review. In 2023 International Aegean Conference on Electrical Machines and Power Electronics (ACEMP) 2023 International Conference on Optimization of Electrical and Electronic Equipment (OPTIM), 1–8. https://doi.org/10.1109/ACEMP-OPTIM57845.2023.10287033 (2023).
Mana, A. et al. Sustainable Ai-based production agriculture: exploring Ai applications and implications in agricultural practices. Smart Agric. Technol. 7, 100416. https://doi.org/10.1016/j.atech.2024.100416 (2024).
Konya, A. & Nematzadeh, P. Recent applications of ai to environmental disciplines: a review. Sci. Total Environ. 906, 167705. https://doi.org/10.1016/j.scitotenv.2023.167705 (2024).
Daniele Silvestro, T. S. A. A. & Goria, S. Improving biodiversity protection through artificial intelligence. Nat. Sustain. 5, 415–424. https://doi.org/10.1038/s41893-022-00851-6 (2022).
Goff, A. L., Latger, J. & Cathala, T. Evolution of SE-Workbench-EO to generate synthetic EO/IR image data sets for machine learning. In Automatic Target Recognition XXXII, vol. 12096 (eds. Hammoud, R. I., Overman, T. L., Mahalanobis, A. & Jaskie, K.) 120960J. https://doi.org/10.1117/12.2632231 (International Society for Optics and Photonics SPIE, 2022).
Yu, C., Zhang, H. & Zheng, G. Research on infrared imaging simulation technology of ocean scene. In AOPC 2021: Optical Sensing and Imaging Technology, vol. 12065 (eds Jiang, Y., Lv, Q., Liu, D., Zhang, D. & Xue, B.) 1206513. https://doi.org/10.1117/12.2605284 (International Society for Optics and Photonics SPIE, 2021).
Yang, X., Zhang, H. & Liao, S. Infrared image simulation technology based on Vega Prime. In SPIE, International Conference on Image Processing and Intelligent Control (IPIC 2021), vol. 11928 (eds. Wu, F. & Cen, F.) 119280J https://doi.org/10.1117/12.2611383 (International Society for Optics and Photonics, 2021).
Aggarwal, A., Mittal, M. & Battineni, G. Generative adversarial network: an overview of theory and applications. Int. J. Inf. Manag. Data Insights. 1, 100004. https://doi.org/10.1016/j.jjimei.2020.100004 (2021).
Afaq, M. & Ahmad, R. A mobile inspection robot design analysis in ansys simulation for extreme weather conditions. In 11th International Conference on Control, Mechatronics and Automation (ICCMA), 208–214. https://doi.org/10.1109/ICCMA59762.2023.10375060 (2023).
Gong, W., Zhang, T., Liu, J. & Zhang, Y. Infrared sequence simulation method for aerial moving platform target scene. In 2022 International Conference on Artificial Intelligence and Computer Information Technology (AICIT), 1–4. https://doi.org/10.1109/AICIT55386.2022.9930171 (2022).
Shah, S., Dey, D., Lovett, C. & Kapoor, A. AirSim: High-fidelity visual and physical simulation for autonomous vehicles. arXiv e-prints arXiv:1705.05065. https://doi.org/10.48550/arXiv.1705.05065 (2017).
Vemprala, S., Chen, S., Shukla, A., Narayanan, D. & Kapoor, A. Grid: A platform for general robot intelligence development. arXiv:2310.00887 (2023).
Qiu, W. & Yuille, A. UnrealCV: Connecting computer vision to Unreal Engine. arXiv:1609.01326 (2016).
Glover, A., Richardson, M. A. & Barlow, N. Modeling fast jet infrared countermeasures: pseudo-imaging seekers with an ultraviolet guard band. J. Def. Model. Simul. 19, 363–373. https://doi.org/10.1177/1548512920938876 (2022).
Pszczel, M. et al. Validation of target and background modeling in midwave infrared band for tropical maritime environment (Conference Presentation). In Target and Background Signatures IV, vol. 10794 (eds. Stein, K. U. & Schleijpen, R.) 1079409. https://doi.org/10.1117/12.2325520 (International Society for Optics and Photonics SPIE, 2018).
Lin, J., Ma, J., Wu, K. & Wu, J. Research of ship scene simulation based on SE-Workbench-EO. In AOPC 2017: Optical Sensing and Imaging Technology and Applications, vol. 10462 (eds Jiang, Y., Gong, H., Chen, W. & Li, J.) 104620V. https://doi.org/10.1117/12.2282902 (International Society for Optics and Photonics SPIE, 2017).
Wang, P., Sun, H., Bai, X., Guo, S. & Jin, D. Traffic thermal infrared texture generation based on siamese semantic cyclegan. Infrared Phys. Technol. 116, 103748. https://doi.org/10.1016/j.infrared.2021.103748 (2021).
Lyu, X. et al. An improved infrared simulation method based on generative adversarial networks. Infrared Phys. Technol. 140, 105424. https://doi.org/10.1016/j.infrared.2024.105424 (2024).
Özkanoğlu, M. A. & Ozer, S. InfraGAN: A GAN architecture to transfer visible images to infrared domain. Pattern Recognit. Lett. 155, 69–76. https://doi.org/10.1016/j.patrec.2022.01.026 (2022).
Wu, X., Zhang, C., Huang, M., Yang, C. & Ding, G. Quantitative atmospheric rendering for real-time infrared scene simulation. Infrared Phys. Technol. 114, 103610. https://doi.org/10.1016/j.infrared.2020.103610 (2021).
Wu, T., Zhang, Y., Yang, J., Liu, J. & Zhang, H. Production method of light field image simulation based on 3ds max 3d reconstruction. In 2nd International Conference on Computer Science, Electronic Information Engineering and Intelligent Control Technology (CEI), 150–153. https://doi.org/10.1109/CEI57409.2022.9950222 (2022).
Zhijian, H., Bingwei, H. & Shujin, S. An infrared sequence image generating method for target detection and tracking. Front. Comput. Neurosci. 16. https://doi.org/10.3389/fncom.2022.930827 (2022).
Singh, S. & Kaur, A. Game development using unity game engine. In 3rd International Conference on Computing, Analytics and Networks (ICAN), 1–6. https://doi.org/10.1109/ICAN56228.2022.10007155 (2022).
Zyśk, K. et al. Generation of artificial infrared camera images for visual navigation simulation. Pap. ESA GNC-ICATT 2023 (2023).
Bastos, D., Monteiro, P. P., Oliveira, A. S. R. & Drummond, M. V. An overview of lidar requirements and techniques for autonomous driving. In 2021 Telecoms Conference (ConfTELE), 1–6. https://doi.org/10.1109/ConfTELE50222.2021.9435580 (2021).
Jansen, W. et al. Cosys-airsim: a real-time simulation framework expanded for complex industrial applications. arXiv:2303.13381 (2023).
Wang, A. et al. Yolov10: real-time end-to-end object detection. arXiv:2405.14458 (2024).
Zhao, Y. et al. Detrs beat yolos on real-time object detection. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 16965–16974. https://doi.org/10.1109/CVPR52733.2024.01605 (2024).
Autodesk. 3ds max 2020. https://www.autodesk.com/.
Epic Games. Unreal engine 4.27.2. https://www.unrealengine.com/en-US.
Bondi, E. et al. Airsim-w: A simulation environment for wildlife conservation with uavs. In Proceedings of the 1st ACM SIGCAS Conference on Computing and Sustainable Societies, COMPASS ’18. https://doi.org/10.1145/3209811.3209880 (Association for Computing Machinery, New York, NY, USA, 2018).
Bondi, E. et al. Birdsai: A dataset for detection and tracking in aerial thermal infrared videos. In IEEE Winter Conference on Applications of Computer Vision (WACV), 1736–1745. https://doi.org/10.1109/WACV45572.2020.9093284 (2020).
Microsoft AI Research. AirSim 1.8.1-windows. GitHub, microsoft/AirSim at v1.8.1-windows.
Wang, C. Y. et al. Cspnet: A new backbone that can enhance learning capability of cnn. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 1571–1580. https://doi.org/10.1109/CVPRW50498.2020.00203 (2020).
Liu, S., Qi, L., Qin, H., Shi, J. & Jia, J. Path aggregation network for instance segmentation. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8759–8768. https://doi.org/10.1109/CVPR.2018.00913 (2018).
Chen, X., Lian, Q., Chen, X. & Shang, J. Surface crack detection method for coal rock based on improved yolov5. Appl. Sci. 12. https://doi.org/10.3390/app12199695 (2022).
Aibibu, T., Lan, J., Zeng, Y., Lu, W. & Gu, N. Feature-enhanced attention and dual-gelan net (feadg-net) for uav infrared small object detection in traffic surveillance. Drones 8. https://doi.org/10.3390/drones8070304 (2024).
Wang, C. Y., Yeh, I. H. & Liao, H. Y. M. Yolov9: learning what you want to learn using programmable gradient information. arXiv:2402.13616 (2024).
Sun, Y., Cao, B., Zhu, P. & Hu, Q. Drone-based rgb-infrared cross-modality vehicle detection via uncertainty-aware learning. IEEE Trans. Circuits Syst. Video Technol. 32, 6700–6713. https://doi.org/10.1109/TCSVT.2022.3168279 (2022).
Suo, J. et al. HIT-UAV: a high-altitude infrared thermal dataset for unmanned aerial vehicle-based object detection. Sci. Data. 10, 220403245. https://doi.org/10.1038/s41597-023-02066-6 (2023).
Chattopadhay, A., Sarkar, A., Howlader, P. & Balasubramanian, V. N. Grad-cam++: Generalized gradient-based visual explanations for deep convolutional networks. In IEEE Winter Conference on Applications of Computer Vision (WACV), 839–847. https://doi.org/10.1109/WACV.2018.00097 (2018).


Acknowledgements

We would like to thank the editor and reviewers for their reviews, which improved the content of this paper.

Funding

This work was funded in part by the 14th Five-Year Plan Funding of China, Grant number 50916040401, in part by the Fundamental Research Program, Grant number 514010503-201, in part by the National Natural Science Foundation of China, Grant number 62476024, and in part by the National Key Laboratory of Unmanned Aerial Vehicle Technology in NPU (WR2024132).

Author information

Tuerniyazi Aibibu and Jinhui Lan contributed equally to this work.

Authors and Affiliations

Department of Instrument Science and Technology, School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing, 100083, China
Tuerniyazi Aibibu, Jinhui Lan, Yiliang Zeng, Jinghao Hu & Zhuo Yong
Xinjiang Vocational and Technical College of Communications, Urumqi, 831401, China
Tuerniyazi Aibibu
Beijing Engineering Research Center of Industrial Spectrum Imaging, School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing, 100083, China
Jinhui Lan, Yiliang Zeng, Jinghao Hu & Zhuo Yong

Authors

Tuerniyazi Aibibu
Jinhui Lan
Yiliang Zeng
Jinghao Hu
Zhuo Yong

Contributions

T.A. and J.L. are co-first authors with equal contributions; methodology, T.A.; investigation, Z.Y. and J.H.; conceptualization, T.A. and J.L.; writing, T.A. and Y.Z.; validation, Y.Z.; supervision, J.L. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Jinhui Lan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Aibibu, T., Lan, J., Zeng, Y. et al. Multiview angle UAV infrared image simulation with segmented model and object detection for traffic surveillance. Sci Rep 15, 5254 (2025). https://doi.org/10.1038/s41598-025-89585-x


Received: 29 October 2024
Accepted: 06 February 2025
Published: 12 February 2025
DOI: https://doi.org/10.1038/s41598-025-89585-x