Introduction

With the rapid development of integrated circuit and artificial intelligence technology, infrared imaging has come to play an important role in many application fields, such as traffic1,2, rescue3,4, security5,6, agriculture7,8 and environmental protection9,10. In particular, combining drone technology with infrared imaging has extended its applications to even more fields. UAV infrared imaging is therefore favored by many researchers in traffic monitoring because of its flexibility and strong resistance to interference. Many studies have addressed object detection in infrared images of traffic scenes. However, real UAVs are subject to flight restrictions in certain traffic scenarios, and frequent flights raise research costs, which makes data collection difficult. It is therefore necessary to study methods for simulating UAV infrared images of traffic scenes. Collecting aerial infrared image data through simulation can reduce the cost of data collection and improve the efficiency of research and development. A schematic diagram of the significance of IR image simulation is shown in Fig. 1.

Fig. 1

Schematic diagram of the significance of IR image simulation.

As Fig. 1 shows, UAV infrared image simulation not only has application value in traffic monitoring but also has high potential in national defense and civil applications, such as wildlife protection, search and rescue, and vehicle detection. Researchers in many fields have studied infrared image simulation and obtained useful results. Current infrared image simulation methods fall into four groups. The first uses professional commercial infrared simulation software, such as SE-Workbench-IR11, JRM12, and Presagis Vega Prime13. Commercial software produces good simulation results, but its high cost limits its use, and researchers without solid financial support cannot access it. The second simulates infrared scenes with deep generative adversarial networks (GANs)14. However, these methods require many training samples, and obtaining traffic scene data from different viewing angles with GAN-based infrared simulation is inconvenient. The third combines 3D modeling software with Ansys15 for infrared image simulation16. Ansys is a comprehensive commercial simulation package that can simulate subtle temperature changes on an object’s surface, but it requires a paid license, is not designed specifically for IR image simulation, and needs many parameters to be set during the simulation, which makes the simulation inefficient. The fourth is UAV-view infrared image simulation based on AirSim17 or AirGen18, which uses Unreal Engine19 as the base platform and generates infrared simulation images through AirSim, an open-source plug-in from Microsoft17.
The advantage of this method is that Unreal Engine and AirSim do not require users to purchase licenses, so they are accessible to many people and have research value for follow-up work and extensions. The disadvantage is that the quality of AirSim-based UAV-view infrared simulation images is relatively poor: the method suits long-distance scenes with small targets, but for short-distance objects its simulated IR images differ noticeably from real infrared images and need further improvement in detail and quality. Considering the advantages and disadvantages of the methods above, this paper proposes an improved AirSim-based UAV infrared image simulation method, uses it to build the IR-TSS dataset, and designs a corresponding object detection model, EfficientNCSP-Net, for UAV infrared images. To verify the effectiveness of the simulation method and the proposed detection algorithm, we carried out comparative and ablation experiments on the IR-TSS, DroneVehicle, and HIT-UAV datasets. The results show that images generated by the proposed simulation method contain more object detail than those generated by the existing AirSim-based method, and that the proposed detection method achieves a traffic object detection accuracy (mAP50) of 96.2% on IR-TSS, higher than that of existing mainstream methods.

The rest of this paper is organized as follows. The second section reviews recent research on infrared image simulation and describes the advantages and disadvantages of each approach; the third section describes the improved AirSim-based infrared image simulation method in detail; the fourth section introduces the construction and characteristics of the IR-TSS dataset; the fifth section presents the object detection model EfficientNCSP-Net for UAV infrared images; the sixth section reports the comparative and ablation experiments; the seventh section discusses the results in light of the methods’ theory and the experimental results; and the final section concludes the paper.

Related work


Researchers in infrared imaging have studied infrared image simulation with different methods according to their own needs and laboratory conditions, and have achieved relatively good results. In simulation with commercial software, for example, Yang et al.13 used Vega Prime’s Ondulus IR software to simulate infrared images of outdoor scenes. The infrared images of objects are simulated by physical modeling, but the method requires the costly Ondulus IR software from Vega Prime. Yu et al.12 introduced a method to obtain infrared simulation images of moving ships on the sea surface with the JRM simulation software. This method can effectively produce IR simulation images of complex sea-surface objects, but JRM is expensive. A. Glover et al.20 generated infrared images of aircraft with the CounterSim software from Chemring Countermeasures Ltd. Although this method simulates aircraft infrared images well, the commercial CounterSim software is costly, which makes the method expensive to adopt. Pszczel et al.21 studied a method for simulating infrared images of tropical marine environments with the VIRSuite software. The method uses tropical climate information to model the marine environment mathematically and generate infrared simulation images, but it relies on VIRSuite, which is generally difficult to access. Lin et al.22 presented a method to simulate infrared images of ships on the sea surface with SE-Workbench. This method generates infrared simulation images of a ship by constructing an infrared radiation model of the sea surface and analyzing the simulation results. However, it relies on the SE-Workbench software, so related research cannot be carried out without substantial financial support.
For infrared image simulation based on generative adversarial networks, Wang et al.23 introduced a recurrent generative adversarial network for simulating infrared images of traffic scenes, in which a new infrared image is simulated from input visible and infrared images. Lyu et al.24 proposed an infrared image simulation method based on an improved generative adversarial network; the new network model generates simulated infrared images, which are used for infrared data augmentation. Özkanoğlu et al.25 presented a generative adversarial network model that simulates infrared images from visible images, addressing the scarcity of infrared image datasets for deep learning tasks. Wu et al.26 introduced a GAN-based atmospheric infrared image simulation method that simulates atmospheric infrared images in real time based on the physical mechanism of infrared atmospheric transmission. These GAN-based methods yield simulated infrared images of relatively good quality; however, they need large amounts of training data, and it is not easy to obtain simulated infrared images at arbitrary viewing angles. For infrared image simulation based on Ansys or other 3D modeling software, Gong et al.16 introduced an aircraft infrared image simulation method based on Ansys and 3D Max software27. The method uses 3D Max to construct a three-dimensional model of the aircraft and Ansys to model its temperature distribution, from which the IR image of the aircraft is simulated. Huang et al.28 introduced an infrared image simulation method that combines real images with simulated models.
In this method, real infrared images are used to build the scene background, and the 3D model of the aircraft and its infrared radiation texture image are constructed in Unity 3D software29, yielding the infrared simulation image of the specified scene. Zyśk et al.30 introduced a method for simulating infrared images with the SKA RFS software. In this method, the infrared image of a flying object is simulated in SKA RFS based on real test infrared images and parameters, and the simulated image is fed into an object recognition module to recognize objects. Although these methods are relatively flexible and can produce infrared simulation images of objects from different viewpoints, they require purchasing 3D modeling or infrared radiation modeling software, and many parameters must be set and adjusted, which leads to relatively high cost and low simulation efficiency. For infrared image simulation based on open-source software such as AirSim and AirGen, Shah et al.17 introduced AirSim, an open-source simulation plug-in for unmanned aerial vehicles and autonomous driving built on Unreal Engine. AirSim can simulate the output of various sensors for UAVs and intelligent vehicles, such as visible-light cameras, infrared cameras and LiDAR31, but its infrared image simulation function is relatively weak and the resulting image quality is rather low. W. Jansen et al.32 adapted AirSim for industrial applications as Cosys-AirSim, which adds 3D models of industrial robots, shelves, and other items on top of AirSim’s original operating mechanism but does not further improve infrared image simulation. Vemprala et al.18 developed Grid, a machine intelligence research and development (R&D) platform that includes the AirGen module, which provides UAV aerial image simulation via AirSim.
Although AirGen improves the simulation of visible aerial images and LiDAR point clouds, its infrared aerial image simulation has not improved, and the quality of the infrared images it generates is still low. Weighing the advantages and disadvantages of the methods above, we find that infrared aerial image simulation based on open-source software has clear potential advantages over the other approaches. First, the simulation cost is low, because no expensive software needs to be purchased. Second, there is no need to collect large amounts of real infrared data or to train deep learning networks, which saves time and labor. Third, open-source software such as AirSim can simulate visible-light images, LiDAR data and other sensor outputs in addition to infrared images, which facilitates collaborative multi-sensor simulation experiments. The disadvantage of AirSim, however, is that its simulated infrared images are of low quality and lack object detail. The proposed method constructs a 3D model of each object in the scene via segmentation modeling and imports the models into Unreal Engine to build a 3D traffic scene. The gray value of each object region is then calculated with a mathematical model of infrared radiation, and a certain amount of noise is added to generate the infrared simulation image. On this basis, the IR Traffic Scene Simulation dataset (IR-TSS) and the object detection method EfficientNCSP-Net are constructed. Finally, object detection results on the IR simulation images are compared and analyzed against popular detection models such as YOLOv1033 and RT-DETR34.

The contributions of this paper are as follows. (1) Building on the original AirSim infrared image simulation method, this paper provides an improved method for generating infrared images that raises image quality and increases the detail of objects in the IR images. (2) We collect simulated infrared images of traffic scenes and construct a multi-target infrared traffic scene simulation dataset (IR-TSS) covering seven types of traffic targets, such as people, bicycles, cars and buses. (3) We analyze the training process of popular algorithms such as YOLOv10 and RT-DETR on IR-TSS and compare the object detection performance of different algorithms on the dataset. (4) With the improved simulation method, high-quality infrared simulation images from different viewpoints can be obtained at lower cost, which may serve as a reference for infrared image simulation in UAV and intelligent self-driving applications.

Methods

This paper presents an improved AirSim-based simulation method for UAV infrared images. Building on AirSim’s original infrared simulation mechanism, we construct 3D models for the traffic scene via segmentation modeling and import them into Unreal Engine to build the required traffic scene. After calculating the gray value of each object surface with an infrared radiation model and adding a certain amount of noise, we capture the simulated infrared image of the traffic scene. A block diagram of this infrared image simulation method is shown in Fig. 2.

Fig. 2

The block diagram of this infrared image simulation method.

As Fig. 2 shows, the proposed infrared aerial image simulation method consists of four main parts: segmented 3D model construction, traffic scene construction, calculation of the objects’ infrared radiation values, and noise addition.

(1) Construction of segmented 3D models.

The traffic scene consists of many 3D models, so the different traffic objects must be modeled in 3D before the scene can be built. Because AirSim simulates infrared images using the ID information of objects in the scene, we segment each object into regional modules according to its material and temperature distribution, so that different regions of the same object can be assigned different ID numbers.

The 3D modeling and segmentation of the traffic scene objects are performed in 3D Max software. First, to meet the model requirements of typical traffic scenes, we constructed 3D models of 7 types of objects: people, cars, bicycles, motorbikes, vans, buses, and trucks. Each model is then segmented into several major areas based on material and temperature distribution. For example, a car is divided into the engine area, the glass area, the tire area, and the body area, and a person is divided into the eye and mouth area, the exposed skin area, and the clothing area. The 3D model of the car with tire segmentation is shown in Fig. 3.
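Because AirSim’s infrared rendering keys on object IDs, the segmented regions need distinct IDs once the models are in Unreal Engine. The sketch below shows one way to assign them from the AirSim Python client with `simSetSegmentationObjectID(mesh_name, object_id, is_name_regex)`, which gives an ID to every mesh whose name matches a regular expression. The `<object>_<region>` mesh-naming convention, the `REGION_IDS` table, its ID values, and the helper `assign_region_ids` are hypothetical placeholders, not the settings used in this paper; in an infrared pipeline the IDs would instead be derived from each region’s computed radiance.

```python
from typing import Dict, Protocol


class SegmentationClient(Protocol):
    """Any object exposing AirSim's simSetSegmentationObjectID(mesh_name, object_id, is_name_regex)."""

    def simSetSegmentationObjectID(self, mesh_name: str, object_id: int,
                                   is_name_regex: bool = False) -> bool: ...


# Hypothetical convention: segmented meshes exported from 3D Max are named
# "<object>_<region>" plus an optional instance suffix, e.g. "car_tire_03".
# The ID values are placeholders; AirSim segmentation IDs lie in 0..255.
REGION_IDS: Dict[str, int] = {
    "car_engine": 10,
    "car_glass": 11,
    "car_tire": 12,
    "car_body": 13,
    "person_face": 20,       # eye and mouth area
    "person_skin": 21,       # exposed skin area
    "person_clothing": 22,
}


def assign_region_ids(client: SegmentationClient, region_ids: Dict[str, int]) -> Dict[str, bool]:
    """Give every mesh whose name starts with a region prefix that region's ID."""
    results: Dict[str, bool] = {}
    for region, object_id in region_ids.items():
        if not 0 <= object_id <= 255:
            raise ValueError(f"object ID for {region!r} must be in 0..255")
        pattern = region + r"[\w]*"          # regex matched against mesh names
        results[region] = client.simSetSegmentationObjectID(pattern, object_id, True)
    return results
```

With a running simulation this could be called as `assign_region_ids(airsim.VehicleClient(), REGION_IDS)`; typing `client` as a small protocol keeps the function testable without a simulator.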


Fig. 3

3D model of the car with tire segmentation35.

(2) Construction of the traffic scene.

After the seven segmented 3D models were completed in 3D Max, the traffic scene was constructed by importing them into Unreal Engine. To use AirSim in Unreal Engine, the AirSim source code and the Unreal Engine project code need to be launched through Microsoft Visual Studio. After the Unreal Engine scene project is launched, the segmented 3D models can be imported into the traffic scene. The 3D traffic scene constructed in Unreal Engine is shown in Fig. 4.

Fig. 4

The 3D traffic scene constructed in Unreal Engine36.

(3) Model of thermal infrared radiation.

To obtain the infrared simulation image of the traffic scene, we need to calculate the infrared radiation of the object surfaces in the scene. According to the infrared imaging mechanism, the thermal infrared radiation emitted by an object depends mainly on the emissivity and temperature of its surface. We set the temperature and average emissivity of the different segmented regions of each object in the traffic scene as shown in Table 1.

Table 1 Temperature and emissivity settings for traffic objects.

Using the surface emissivity and temperature values in Table 1, the infrared radiation of traffic objects can be calculated with Planck’s law, formulated37 as follows:

$$L\left(T,\varepsilon_{avg},R_{\lambda}\right)=\varepsilon_{avg}\int_{\lambda=8\,\mu m}^{\lambda=14\,\mu m}R_{\lambda}\left(\frac{2hc^{2}}{\lambda^{5}}\frac{1}{\exp\left(\frac{hc}{kT\lambda}\right)-1}\right)d\lambda$$
(1)

where \(L\) is the infrared radiation, \(\varepsilon_{avg}\) is the average emissivity, \(T\) is the temperature, \(\lambda\) is the wavelength (integrated over the 8 to 14 μm long-wave infrared band), \(R_{\lambda}\) is the average peak response of the camera, \(h\) is Planck’s constant, \(k\) is Boltzmann’s constant, and \(c\) is the speed of light.

The infrared radiation of each object surface in the traffic scene is calculated with Planck’s law and then normalized, and AirSim uses the resulting normalized, band-integrated radiation values to render the infrared images of objects in the traffic scene.
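Equation (1) and the normalization step can be sketched numerically as follows. This is a minimal illustration under stated assumptions: the camera response R_λ is taken as flat (1.0) across 8 to 14 μm, the band integral is evaluated with the trapezoidal rule, and the radiances are min-max normalized to integer gray levels 0 to 255. The function names `band_radiance` and `to_gray_levels` and the region temperatures and emissivities in the example are hypothetical; they are not the Table 1 settings or AirSim’s own implementation.

```python
import math
from typing import Callable, Dict, Tuple

PLANCK_H = 6.62607015e-34     # Planck's constant, J s
LIGHT_C = 2.99792458e8        # speed of light, m/s
BOLTZMANN_K = 1.380649e-23    # Boltzmann's constant, J/K


def band_radiance(temp_k: float, emissivity: float,
                  response: Callable[[float], float] = lambda lam: 1.0,
                  lam_min: float = 8e-6, lam_max: float = 14e-6, steps: int = 600) -> float:
    """Eq. (1): emissivity x integral over [lam_min, lam_max] of R(lambda) x Planck radiance.

    Wavelengths are in metres and the result is in W sr^-1 m^-2. The default flat
    response R(lambda) = 1 is an assumption, not a measured camera response.
    """
    dlam = (lam_max - lam_min) / steps
    total = 0.0
    for i in range(steps + 1):
        lam = lam_min + i * dlam
        planck = (2 * PLANCK_H * LIGHT_C ** 2 / lam ** 5) / math.expm1(
            PLANCK_H * LIGHT_C / (BOLTZMANN_K * temp_k * lam))
        weight = 0.5 if i in (0, steps) else 1.0          # trapezoidal-rule end weights
        total += weight * response(lam) * planck
    return emissivity * total * dlam


def to_gray_levels(regions: Dict[str, Tuple[float, float]]) -> Dict[str, int]:
    """Min-max normalise each region's band radiance to an integer gray level in 0..255.

    regions maps a region name to (temperature in K, average emissivity).
    """
    radiance = {name: band_radiance(t, e) for name, (t, e) in regions.items()}
    lo, hi = min(radiance.values()), max(radiance.values())
    span = (hi - lo) or 1.0                                # avoid division by zero
    return {name: round(255 * (r - lo) / span) for name, r in radiance.items()}


if __name__ == "__main__":
    # Placeholder regions, not the Table 1 settings.
    example = {"road": (300.0, 0.95), "tire": (320.0, 0.95), "glass": (300.0, 0.85)}
    print(to_gray_levels(example))
```

Min-max normalization keeps the relative ordering of regions (hotter or more emissive surfaces appear brighter) but depends on which regions are in the set; a fixed radiance range would give gray levels that are comparable across scenes.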

(4) Noise.

Adding noise makes the simulated infrared image closer to a real infrared image, so this paper adds noise to the images produced by AirSim infrared simulation. According to Bondi et al.38, infrared images are affected inside the camera by noise such as Johnson noise, flicker noise, and thermal noise during imaging, while infrared radiation is affected by air humidity, fog, and dust during atmospheric transmission. We therefore add a certain amount of Gaussian noise and pepper noise to the simulated image to represent the noise in real infrared images. Simulated infrared images with and without added noise are compared in Fig. 5.

Fig. 5

The comparison results of simulated infrared images with noise added and without noise39.

As Fig. 5 shows, the gray values fluctuate between neighboring pixels after noise is added, so the simulation results more closely resemble infrared images of real traffic scenes.
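The noise step can be sketched with NumPy as below: zero-mean Gaussian noise followed by pepper noise (a random fraction of pixels forced to 0) on an 8-bit grayscale image. The function name `add_ir_noise` and the defaults for `sigma` and `pepper_fraction` are hypothetical, since the text specifies only “a certain amount” of each noise type.

```python
from typing import Optional

import numpy as np


def add_ir_noise(img: np.ndarray, sigma: float = 4.0, pepper_fraction: float = 0.002,
                 seed: Optional[int] = None) -> np.ndarray:
    """Add zero-mean Gaussian noise, then pepper noise, to an 8-bit grayscale IR image.

    sigma: standard deviation of the Gaussian noise in gray levels (hypothetical default).
    pepper_fraction: expected fraction of pixels forced to 0 (hypothetical default).
    """
    rng = np.random.default_rng(seed)
    noisy = img.astype(np.float64) + rng.normal(0.0, sigma, size=img.shape)
    noisy = np.clip(noisy, 0, 255)
    pepper = rng.random(img.shape) < pepper_fraction   # boolean mask of "pepper" pixels
    noisy[pepper] = 0.0
    return np.rint(noisy).astype(np.uint8)


if __name__ == "__main__":
    flat = np.full((512, 640), 128, dtype=np.uint8)   # uniform 640 x 512 test frame
    out = add_ir_noise(flat, seed=0)
    print(out.mean(), out.std(), np.count_nonzero(out == 0))
```

Clipping before rounding keeps the result inside the valid 8-bit range; passing a `seed` makes a dataset reproducible.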

IR-TSS dataset

To make the simulated infrared images generated by the improved AirSim method useful for infrared image processing and for designing object detection algorithms, this paper constructs an Infrared Traffic Scene Simulation dataset (IR-TSS) from a large number of simulated images. To diversify the imaging scales and angles of the traffic targets, the UAV captured simulated images in the Unreal Engine traffic scene at heights of 40 m, 70 m, 90 m, and 120 m and at viewing angles of 90° and 60°. A 3D visualization of the UAV flight trajectory during the acquisition of simulated infrared images is shown in Fig. 6.
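To see why the four heights and two viewing angles diversify object scales, the sketch below estimates, under a simple pinhole model, the slant range to the image center and the approximate ground distance covered by one pixel for each capture setting. It assumes that the viewing angle is the depression angle below the horizon (90° = nadir) and uses a hypothetical 90° horizontal field of view and 640-pixel image width; the text does not state the camera parameters, and `capture_plan` is a hypothetical helper.

```python
import math
from itertools import product

HEIGHTS_M = (40, 70, 90, 120)      # capture heights given in the text
VIEW_ANGLES_DEG = (90, 60)         # assumed: depression angle below the horizon, 90 = nadir
HFOV_DEG = 90.0                    # hypothetical horizontal field of view
IMAGE_WIDTH_PX = 640               # hypothetical image width in pixels


def capture_plan():
    """Yield (height_m, angle_deg, slant_range_m, metres_per_pixel) for every capture setting."""
    for height, angle in product(HEIGHTS_M, VIEW_ANGLES_DEG):
        slant = height / math.sin(math.radians(angle))   # distance along the optical axis
        # Pinhole approximation at the image centre: scene width covered / number of pixels.
        m_per_px = 2 * slant * math.tan(math.radians(HFOV_DEG / 2)) / IMAGE_WIDTH_PX
        yield height, angle, slant, m_per_px


if __name__ == "__main__":
    for height, angle, slant, m_per_px in capture_plan():
        print(f"h={height:>3} m  angle={angle:>2} deg  range={slant:6.1f} m  ~{m_per_px * 100:5.1f} cm/px")
```

Under these assumptions each pixel covers three times as much ground at 120 m as at 40 m in the nadir view, so the same vehicle appears about three times narrower in the image, and the 60° setting adds further range and perspective variation.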

Fig. 6

3D visualization of the flight trajectory.

We collected more than 5500 simulated infrared images of the traffic scene along the UAV flight trajectory in Fig. 6 and selected 3750 valid images from them to construct the IR-TSS dataset. The dataset contains more than 11,000 object instances in 7 categories: person, car, van, motorcycle, bicycle, bus, and truck. The distributions of the number and size of object instances in IR-TSS are shown in Fig. 7.

Fig. 7

Distribution of the number and size of object instances for the IR-TSS dataset.

According to Fig. 7a, Car is the category with the most instances (nearly 3000), and Truck is the category with the fewest (more than 600). In Fig. 7b, the horizontal axis represents the categories, and the vertical axis represents the ratio of the pixel area of an object instance to the area of the whole image, calculated as follows:

$$E=\frac{\left(h\times H\right)\times\left(w\times W\right)}{H\times W}\times 100$$
(2)

where E is the percentage of the image area occupied by the object, h and w are the height and width of the object label normalized by the image height and width (so that h × H and w × W are the label height and width in pixels), H is the height of the sample image, and W is the width of the sample image.
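Equation (2) can be evaluated directly from a label. The sketch below assumes YOLO-style label lines (`class x_center y_center width height`, with width and height normalized to [0, 1]); that file format and the helper name `area_percentage` are assumptions, as the text does not name the label format. Converting the normalized size to pixels and dividing by the image area makes it explicit that H and W cancel, so E reduces to 100·h·w.

```python
def area_percentage(label_line: str, img_w: int, img_h: int) -> float:
    """Eq. (2): percentage of the image area covered by one labelled box.

    label_line: 'class x_center y_center width height', with width and height
    normalised to [0, 1] (assumed YOLO-style format).
    """
    _cls, _xc, _yc, w_norm, h_norm = label_line.split()
    box_h_px = float(h_norm) * img_h   # h x H in Eq. (2)
    box_w_px = float(w_norm) * img_w   # w x W in Eq. (2)
    return box_h_px * box_w_px / (img_h * img_w) * 100


if __name__ == "__main__":
    # A box 10% of the image wide and 20% high covers about 2% of a 640 x 512 image.
    print(area_percentage("2 0.5 0.5 0.1 0.2", 640, 512))
```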

The box plot in Fig. 7b shows that no object instance occupies more than 10% of the sample image area and that instance size varies considerably, which indicates the diversity of instance scales in the dataset.

The IR-TSS dataset has application value not only in traffic monitoring but also in object search in urban areas and road object tracking. It also has potential for studying how to distinguish objects with similar image features, such as motorcycles and bicycles, and for analyzing how different infrared imaging angles affect object detection.

Detection

To assess how well objects of different categories in the IR-TSS dataset can be detected in deep learning object detection tasks, this paper designs an object detection method, EfficientNCSP-Net, for the IR-TSS dataset. Its detection results on IR-TSS are compared with those of popular algorithms such as YOLOv10, YOLOv9, and RT-DETR34 to verify its effectiveness. A schematic diagram of the EfficientNCSP-Net network structure is shown in Fig. 8.

Fig. 8

Schematic diagram of the EfficientNCSP-Net network structure.

The EfficientNCSP-Net presented in this paper consists of three parts: the backbone, the neck, and the head. The backbone uses CSPdarknet40 from YOLOv8. The neck uses an improved PAFPN41 structure that introduces YOLOv9’s RepNCSPELAN4 module into the original PAFPN, which helps prevent gradient information from being lost during training and retains more information about object features. The head is based on the YOLOv8 detection head and uses Efficient-IoU42 to construct a new loss function, which lays the foundation for further improving detection accuracy. The road object detection results of the different methods are presented in the experimental section of this paper.
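The Efficient-IoU (EIoU) term in the head can be illustrated with the commonly used EIoU formulation: 1 - IoU, plus the squared distance between box centers normalized by the squared diagonal of the smallest enclosing box, plus separate width and height penalties normalized by that box’s squared width and height. The sketch below computes it for a single pair of boxes in plain Python. The text does not give EfficientNCSP-Net’s exact loss, so any weighting or focal term in the actual implementation may differ, and a training implementation would be vectorized in a framework such as PyTorch; `eiou_loss` is a hypothetical helper name.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2), with x2 > x1 and y2 > y1


def eiou_loss(pred: Box, gt: Box, eps: float = 1e-9) -> float:
    """EIoU loss = 1 - IoU + centre-distance term + width term + height term."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union.
    inter_w = max(0.0, min(px2, gx2) - max(px1, gx1))
    inter_h = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = inter_w * inter_h
    iou = inter / (pw * ph + gw * gh - inter + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)

    # Squared distance between box centres, normalised by the enclosing box's squared diagonal.
    dx = (px1 + px2 - gx1 - gx2) / 2
    dy = (py1 + py2 - gy1 - gy2) / 2
    centre_term = (dx ** 2 + dy ** 2) / (cw ** 2 + ch ** 2 + eps)

    # Separate width and height penalties, normalised by the enclosing box's width and height.
    width_term = (pw - gw) ** 2 / (cw ** 2 + eps)
    height_term = (ph - gh) ** 2 / (ch ** 2 + eps)
    return 1.0 - iou + centre_term + width_term + height_term
```

Compared with CIoU, which couples width and height through an aspect-ratio term, EIoU penalizes width and height errors directly, which is the property usually cited for faster box regression.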

Experiments

To verify that the method proposed in this paper has several advantages in AirSim-based infrared image simulation and target detection, we carried out a simulation result comparison experiment and a target detection analysis experiment based on traffic scene simulation images.

(1) Comparative experiment of simulation results.

To prove the advantages of the proposed infrared image simulation method over existing AirSim infrared image simulation methods, comparative experiments are carried out in this paper. The infrared image simulation result of the original AirSim method is compared with the infrared image simulation result of the improved method, and the comparison result is shown in Fig. 9.

Fig. 9

Comparison of infrared image simulation results39.

As Fig. 9 shows, the infrared image simulated by the original AirSim method lacks detailed object information and is therefore unsuitable for simulating short-distance objects. In contrast, the proposed method generates infrared images containing more object information, which makes them more valuable for further research and optimization.

(2) Comparative object detection experiments on IR-TSS dataset.

To verify the effectiveness of EfficientNCSP-Net on simulated data, we apply it to the IR-TSS dataset to detect objects such as persons, bicycles, and vehicles. Its detection metrics43 on IR-TSS are compared with those of popular algorithms such as YOLOv1033, YOLOv944, and RT-DETR34. The object detection results of the different algorithms on the IR-TSS dataset are shown in Table 2 (input image size 640 × 512).
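The AP50 and mAP50 metrics reported in Tables 2 to 4 can be sketched as follows for one class: predictions are sorted by confidence, greedily matched to unmatched ground-truth boxes at IoU ≥ 0.5, and the precision-recall curve is integrated with all-point interpolation (PASCAL VOC style); mAP50 is the mean of AP50 over classes. This is an illustrative implementation with hypothetical helper names. The evaluation toolkit behind the reported numbers may use a different interpolation (e.g., 101-point) and can give slightly different values.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: Sequence[Tuple[str, float, Box]],
                      gts: Sequence[Tuple[str, Box]], iou_thr: float = 0.5) -> float:
    """AP for one class. preds: (image_id, score, box); gts: (image_id, box)."""
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    hits: List[int] = []                       # 1 = true positive, 0 = false positive
    for image_id, _score, box in sorted(preds, key=lambda p: p[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, (gt_image, gt_box) in enumerate(gts):
            if gt_image == image_id and not matched[j]:
                overlap = iou(box, gt_box)
                if overlap > best_iou:
                    best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= iou_thr:
            matched[best_j] = True
            hits.append(1)
        else:
            hits.append(0)

    # Cumulative precision and recall after each ranked detection.
    recall: List[float] = []
    precision: List[float] = []
    tp = 0
    for rank, hit in enumerate(hits, start=1):
        tp += hit
        recall.append(tp / len(gts))
        precision.append(tp / rank)

    # All-point interpolation: make precision non-increasing, then sum the area under the curve.
    mrec = [0.0] + recall + [1.0]
    mpre = [0.0] + precision + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))
```

mAP50 would then be computed as the mean of `average_precision` over the per-class prediction and ground-truth lists.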

Table 2 Object detection results of different algorithms on the IR-TSS dataset.

According to Table 2, the proposed EfficientNCSP-Net achieves the highest mAP50 on the IR-TSS dataset, 96.2%, which is greater than that of other algorithms such as YOLOv8 and YOLOv9. Except for the van and bus categories, its per-category AP50 values are not lower than those of the other algorithms. The lower AP50 values for van and bus are likely because their infrared image features are hard to distinguish in the dataset. EfficientNCSP-Net also has a smaller parameter size and lower latency. The visualization results of EfficientNCSP-Net object detection are shown in Fig. 10.

Fig. 10

Visualization of the object detection results for the EfficientNCSP-net algorithm.

According to Fig. 10, EfficientNCSP-Net can detect objects at different viewing angles and scales in the IR-TSS dataset.

(3) Comparative object detection experiments on DroneVehicle dataset.

To verify the detection performance of EfficientNCSP-Net on real infrared data, we carried out comparative experiments with different algorithms on the DroneVehicle dataset45. DroneVehicle is a dual-modality visible and infrared dataset containing five types of targets: cars, vans, buses, trucks, and freight cars. Because only infrared data are needed in this comparison, we removed the visible images from DroneVehicle and carried out comparative experiments on the remaining infrared images with EfficientNCSP-Net, YOLOv10, YOLOv9, YOLOv8, and RT-DETR. The mAP50 curves of the different algorithms during training on DroneVehicle are shown in Fig. 11.

Fig. 11

The mAP50 curves of different algorithms on the DroneVehicle dataset.

According to Fig. 11, EfficientNCSP-Net shows the best mAP50 curve during training, whereas RT-DETR has the lowest mAP50. We then validated the trained models to obtain the detection accuracy of the different algorithms on DroneVehicle. The comparative results are shown in Table 3 (input image size 840 × 712).

Table 3 Object detection results of different algorithms on the DroneVehicle dataset.

According to Table 3, EfficientNCSP-Net achieves the highest mAP50 on DroneVehicle at 83.4%, whereas RT-DETR has the lowest at 77.2%. The proposed method therefore detects traffic targets more efficiently, achieving the highest mAP50 with the fewest parameters and the lowest latency. To show the detection ability of EfficientNCSP-Net on DroneVehicle more directly, we visualized its detection results, which are shown in Fig. 12.

Fig. 12

Visualization of the object detection results of our algorithm on the DroneVehicle dataset.

As Fig. 12 shows, EfficientNCSP-Net detects objects such as cars and buses well in real infrared aerial images, but because the features of vans, trucks, and freight cars differ only slightly, detecting these categories remains challenging for our algorithm.

(4) Comparative object detection experiments on HIT-UAV dataset.

To verify the effectiveness of our algorithm on another real dataset, we carried out comparative experiments on the HIT-UAV dataset46. HIT-UAV includes five object categories: person, bicycle, car, other vehicle, and don’t care, and its infrared images contain atmospheric noise. The “don’t care” category has few instances, does not correspond to a specific object category, and has no consistent object features, so it is removed from the dataset in our comparative experiments. We compared RT-DETR, YOLOv11, and other algorithms on HIT-UAV, and the results are shown in Table 4 (input image size 640 × 512).
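Removing the “don’t care” category can be sketched as a label-filtering step. The sketch assumes YOLO-format label files and that “don’t care” has class index 4; `DONT_CARE_ID`, `drop_class`, and `filter_label_dir` are hypothetical names, and the index should be checked against the dataset’s own class list. Lines of that class are dropped, and any higher class indices are shifted down so that the remaining classes stay contiguous.

```python
from pathlib import Path

DONT_CARE_ID = 4   # hypothetical index of "don't care"; check the dataset's class list


def drop_class(label_text: str, drop_id: int = DONT_CARE_ID) -> str:
    """Remove YOLO label lines of class `drop_id`; shift higher class IDs down by one."""
    kept = []
    for line in label_text.splitlines():
        parts = line.split()
        if not parts:
            continue                     # skip blank lines
        cls = int(parts[0])
        if cls == drop_id:
            continue                     # discard "don't care" boxes
        if cls > drop_id:
            cls -= 1                     # keep class indices contiguous
        kept.append(" ".join([str(cls)] + parts[1:]))
    return "\n".join(kept) + ("\n" if kept else "")


def filter_label_dir(src: Path, dst: Path, drop_id: int = DONT_CARE_ID) -> int:
    """Write filtered copies of every *.txt label file in `src` to `dst`; return the file count."""
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for label_file in sorted(src.glob("*.txt")):
        (dst / label_file.name).write_text(drop_class(label_file.read_text(), drop_id))
        count += 1
    return count
```

Images whose only boxes were “don’t care” end up with empty label files, which most YOLO-style trainers treat as background images.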

Table 4 Object detection results of different algorithms on the HIT-UAV dataset.

According to Table 4, the proposed algorithm reaches an accuracy of 93.9% on HIT-UAV, higher than that of the other algorithms, and its per-category AP50 values are no lower than those of the other algorithms. It also has an advantage over the other algorithms in parameter size and latency.

To show the detection performance of the algorithm on the HIT-UAV dataset more intuitively, we visualized its detection results, which are shown in Fig. 13.

Fig. 13

Visualization of the object detection results of our algorithm on the HIT-UAV dataset.

According to Fig. 13, the algorithm accurately detects the objects in the HIT-UAV dataset and distinguishes between the different traffic object categories.

(5) Ablation experiments.

To verify how much the CSPdarknet, improved PAFPN, and Effic-IoU modules of EfficientNCSP-net each contribute to the improvement in detection mAP50, we removed or replaced each module in turn and carried out ablation experiments. The results are shown in Table 5.

Table 5 Ablation experiment results of the EfficientNCSP-net algorithm on the IR-TSS dataset.

According to Table 5, EfficientNCSP-net reaches an mAP50 of 96.2% with all modules in place. Removing CSPdarknet lowers mAP50 to 95.4%, removing the improved PAFPN lowers it to 95.9%, and removing Effic-IoU lowers it to 96.0%. These results show that CSPdarknet contributes the most to detection performance, whereas Effic-IoU contributes the least.
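The ablation results in Table 5 can be summarized as the mAP50 drop caused by removing each module. The short sketch below uses only the values reported above; note that such a leave-one-out drop is a simple proxy for each module's contribution and does not capture interactions between modules.

```python
from typing import Dict, List, Tuple

FULL_MAP50 = 96.2  # mAP50 (%) with all modules enabled (Table 5)
ABLATED_MAP50 = {  # mAP50 (%) with one module removed (Table 5)
    "CSPdarknet": 95.4,
    "improved PAFPN": 95.9,
    "Effic-IoU": 96.0,
}

def leave_one_out_drops(full: float, ablated: Dict[str, float]) -> List[Tuple[str, float]]:
    """mAP50 drop in percentage points per removed module, largest first."""
    drops = {module: round(full - score, 1) for module, score in ablated.items()}
    return sorted(drops.items(), key=lambda kv: kv[1], reverse=True)

for module, drop in leave_one_out_drops(FULL_MAP50, ABLATED_MAP50):
    print(f"{module}: -{drop:.1f} pp")
```

With the reported values, the drops are 0.8, 0.3, and 0.2 percentage points for CSPdarknet, the improved PAFPN, and Effic-IoU respectively, which is the ranking stated in the text.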

Discussion

Based on the simulation methodology presented in this paper and the experimental results, we briefly discuss the difficulties and key points of this research.

(1) Although the proposed infrared image simulation method improves on the original AirSim method, its images still fall short of real infrared images of traffic scenes. In future work, the weaknesses of the simulation algorithm should be addressed on the basis of the principles governing the distribution of thermal infrared radiation on object surfaces and the mechanism of thermal infrared transmission through the atmosphere. Two main improvements are needed. First, to simulate the gradual change in grayscale value within a particular part of an object in the infrared image (e.g., the tires or glass of a car), the change in thermal radiation of that part must be calculated, taking into account the causes of heat generation, the heating time, and other factors. Second, the thermal radiation relationship between neighboring parts of an object must be determined to avoid abrupt jumps in the grayscale values of pixels at the boundaries between those parts in the simulated infrared image. This requires considering factors such as heat generation time and heat conduction between neighboring parts. The results of practical experiments must also be used to determine the mathematical relationship among these factors so that the corresponding grayscale values in the simulated image can be calculated. A schematic comparison illustrating the shortcomings of the simulated infrared images is shown in Fig. 14.

Fig. 14

Schematic comparison illustrating the shortcomings of the simulated image.

According to Fig. 14, the proposed infrared image simulation method cannot yet simulate the gradual change in grayscale value within certain parts of a target, nor does it capture the grayscale changes caused by the infrared radiation relationships between different parts of an object. These issues will be the focus of our future research.
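The second improvement, smoothing the grayscale transition between neighboring parts through heat conduction, could in principle be prototyped with a one-dimensional explicit finite-difference heat equation along a line of pixels that crosses a part boundary. The sketch below is hypothetical: the temperatures, the diffusion number `r`, the number of steps, and the linear temperature-to-grayscale mapping are illustrative assumptions, not the authors' planned model. The explicit scheme is stable only for `0 < r <= 0.5`, and the two endpoints are held at fixed temperatures.

```python
from typing import List

def diffuse(temps: List[float], r: float, steps: int) -> List[float]:
    """Explicit update T_i <- T_i + r*(T_{i-1} - 2*T_i + T_{i+1}); endpoints held fixed."""
    if not 0 < r <= 0.5:
        raise ValueError("explicit scheme requires 0 < r <= 0.5 for stability")
    t = list(temps)
    for _ in range(steps):
        nxt = t[:]
        for i in range(1, len(t) - 1):
            nxt[i] = t[i] + r * (t[i - 1] - 2 * t[i] + t[i + 1])
        t = nxt
    return t

def to_gray(temps: List[float], t_min: float, t_max: float) -> List[int]:
    """Hypothetical linear mapping of temperature (K) to 8-bit grayscale, clamped to 0..255."""
    scale = 255 / (t_max - t_min)
    return [max(0, min(255, round((x - t_min) * scale))) for x in temps]

# A line of pixels crossing from a car body at 300 K into a tire at 320 K (illustrative values).
profile = [300.0] * 5 + [320.0] * 5
before = to_gray(profile, 300.0, 320.0)
after = to_gray(diffuse(profile, r=0.25, steps=20), 300.0, 320.0)
print(before)  # abrupt 0 -> 255 jump at the part boundary
print(after)   # gradual transition after simulated heat conduction
```

A linear mapping from temperature is used only to keep the example short; a radiance-based mapping would be closer to the simulation method described in this paper.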

(2) The infrared images in the HIT-UAV dataset contain substantial environmental noise, and the instances in the person category are small. As a result, noise points are sometimes detected as persons and persons are sometimes mistaken for noise, which lowers the mAP50 of the detection results. The small-target detection ability of the algorithm therefore needs to be improved in future research.

(3) To understand which regions EfficientNCSP-net focuses on during road object detection, we output class activation maps (CAMs)47 of the detection process, which reveal the image regions the algorithm treats as important. The CAMs produced by EfficientNCSP-net for some simulated images are shown in Fig. 15. Hotter colors in a CAM represent higher gradient information and mark regions that are important to object detection, whereas cooler colors represent lower gradient information and mark regions that contribute little to it.

Fig. 15

Correspondence between the object detection CAM and the infrared image.

According to Fig. 15, target regions in the CAM are rendered in hotter colors than other regions, whereas regions without targets are relatively cooler. This indicates that target regions carry higher gradient information, so EfficientNCSP-net focuses on the image features of objects during detection on the IR-TSS dataset.
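The gradient-based description above matches the Grad-CAM formulation, in which each feature-map channel is weighted by the spatial average of the gradient of the target score with respect to that channel, and the weighted sum is passed through a ReLU. Whether the CAM used here follows exactly this formulation is an assumption. The minimal pure-Python sketch below takes the activations and gradients as given arrays (in practice they would come from a backward pass through the detector) and returns a map scaled to [0, 1], in which larger values correspond to hotter colors.

```python
from typing import List

Map = List[List[float]]  # H x W feature map

def grad_cam(activations: List[Map], gradients: List[Map]) -> Map:
    """ReLU(sum_k alpha_k * A_k), alpha_k = spatial mean of dScore/dA_k, scaled to [0, 1]."""
    k = len(activations)
    h, w = len(activations[0]), len(activations[0][0])
    # Channel weights: global-average-pooled gradients.
    alphas = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    # Weighted sum over channels, keeping only positive evidence.
    cam = [[max(0.0, sum(alphas[c] * activations[c][i][j] for c in range(k)))
            for j in range(w)] for i in range(h)]
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam

# Toy example: channel 0 fires on the target (top-left), channel 1 on background (bottom-right).
acts = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
print(grad_cam(acts, grads))  # → [[1.0, 0.0], [0.0, 0.0]]
```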

Conclusion

To improve the quality of infrared images simulated with the open-source AirSim, this paper proposes an improved AirSim-based infrared image simulation method for traffic scenes. In this method, 3D models are built through segmented modeling and used to construct a 3D traffic scene in the Unreal Engine. Based on the features of the actual scene, we set the temperature and surface emissivity of the different objects in the simulated scene, calculate the integrated radiation of each object with Planck's law, and then import the integrated surface radiation into the background database of AirSim to simulate infrared images of the traffic scene with higher image quality. Using this simulation method, we collected a large amount of image data and constructed an infrared image traffic simulated scene dataset (IR-TSS). Seven target categories from the dataset, including people, bicycles, motorcycles, cars, and buses, were detected with the method proposed in this paper and with other typical object detection methods. Comparative and ablation experiments were performed on the IR-TSS, DroneVehicle, and HIT-UAV datasets. The experimental results show that the proposed infrared image simulation method yields better image quality than the existing AirSim simulation, and that the proposed object detection algorithm EfficientNCSP-net reaches an mAP50 of 96.2% on the IR-TSS dataset, higher than that of popular methods such as YOLOv8, YOLOv9, and RT-DETR. The method not only has potential application value for simulating infrared images of traffic scenes but also offers a reference for infrared image simulation research in other fields, and it can provide data support for research on multi-angle object detection.
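The radiation calculation summarized above can be sketched as follows, assuming a gray body (emissivity constant across the band) and a long-wave infrared band of 8 to 14 μm. The band limits, the trapezoidal integration, the example temperatures and emissivities, and the linear radiance-to-grayscale stretch in `to_gray` are illustrative assumptions, not the exact settings used to build IR-TSS.

```python
import math
from typing import List

H = 6.62607015e-34   # Planck constant (J*s)
C = 2.99792458e8     # speed of light (m/s)
K_B = 1.380649e-23   # Boltzmann constant (J/K)

def planck_radiance(wavelength_m: float, temp_k: float) -> float:
    """Blackbody spectral radiance B(lambda, T) in W / (m^2 * sr * m)."""
    return (2.0 * H * C**2 / wavelength_m**5) / math.expm1(H * C / (wavelength_m * K_B * temp_k))

def band_radiance(temp_k: float, emissivity: float,
                  lo_um: float = 8.0, hi_um: float = 14.0, n: int = 600) -> float:
    """Gray-body band radiance eps * integral of B over [lo_um, hi_um] in W / (m^2 * sr), trapezoid rule."""
    lo, hi = lo_um * 1e-6, hi_um * 1e-6
    step = (hi - lo) / n
    vals = [planck_radiance(lo + i * step, temp_k) for i in range(n + 1)]
    return emissivity * step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

def to_gray(radiances: List[float]) -> List[int]:
    """Hypothetical linear stretch of scene radiances to 0..255."""
    lo, hi = min(radiances), max(radiances)
    if hi == lo:
        return [0] * len(radiances)
    return [round(255 * (r - lo) / (hi - lo)) for r in radiances]

# Illustrative (temperature K, emissivity) pairs; not values from the paper.
objects = {"asphalt": (305.0, 0.95), "car body": (300.0, 0.80), "person": (307.0, 0.98)}
radiance = {name: band_radiance(t, e) for name, (t, e) in objects.items()}
gray = dict(zip(radiance, to_gray(list(radiance.values()))))
print(gray)
```

Because the integrated radiance grows with both temperature and emissivity, two objects at the same temperature can still appear with different gray levels, which is the effect the emissivity setting is meant to capture.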