Background & Summary

Global temperatures have been on a steady upward trajectory since the late 19th century1,2,3,4, largely driven by human activities that have significantly increased the concentration of greenhouse gases in the atmosphere, with carbon dioxide (CO2) being the primary contributor to global warming. Reducing CO2 emissions has never been more important: the global XCO2 level reached 400 parts per million (ppm) around 2015 and 20165,6, and by July 2024 it had reached 421.1 ppm7. In response to mounting concerns, numerous strategies have been adopted to lessen the negative consequences of climate change. One well-known example is the C40 city network, which has established local-level measures to reduce CO2 emissions in cities8,9. However, precise data are required to evaluate the success of these initiatives. Although some cities have installed mobile and fixed ground-based measurement networks10, these networks often face limitations in spatial coverage and temporal consistency. Ground-based stations are generally sparse and only provide data at fixed locations, making it difficult to capture the full variability of CO2 across diverse regions. In contrast, satellite datasets with high temporal and spatial resolution offer comprehensive global coverage, enabling continuous monitoring of urban and remote areas. This allows for a more detailed understanding of CO2 dynamics, particularly in areas where ground-based measurements are unavailable. Moreover, satellite data provide consistent observations over time, which is critical for tracking short-term fluctuations and long-term trends in carbon emissions and atmospheric concentrations.

The development of remote sensing atmospheric sounding technology has increased the collection of regional and global CO2 data via satellite measurements11,12. Radiative transfer theory is employed to estimate the column-averaged carbon dioxide dry air mole fraction (XCO2) using atmospheric spectral data from spaceborne sensors13. However, significant challenges arise due to calibration issues and mismatches between retrieved and observed radiances, leading to biases in XCO2 estimates14. Various space agencies and satellites contribute to CO2 observation programs, yet they face spatial coverage limitations. For instance, NASA’s Orbiting Carbon Observatory-2 (OCO-2), launched in 2014, provides precise CO2 measurements using near-infrared observations but has a limited observational swath, posing challenges for the comprehensive spatial distribution15,16,17. Similarly, Japan’s GOSAT and ESA’s efforts have advanced CO2 monitoring, but all face challenges regarding spatial data coverage.

While the Copernicus Atmosphere Monitoring Service (CAMS) provides a more comprehensive dataset without gaps, encapsulating global XCO2 information18,19,20, its specific advantages and limitations compared to other datasets are often underemphasized. CAMS enhances data resolution and precision, especially in reconstructing daily XCO2 datasets21,22. However, key challenges remain for satellite observations, including gaps caused by cloud cover or technical issues, which the gap-free CAMS dataset does not share. To mitigate these gaps, researchers have integrated multisource datasets, such as OCO-2 and OCO-3, with machine learning techniques like neural networks and convolutional neural networks to improve spatial and temporal resolution23. Data reconstruction algorithms such as the Data Interpolating Convolutional Auto-Encoder (DINCAE)24,25 and Data Interpolating Empirical Orthogonal Functions (DINEOF)26,27,28, originally developed for sea surface temperature, are also being adapted to enhance the spatiotemporal coverage of missing CO2 data.

Earlier research explored various methods aimed at expanding the spatiotemporal coverage of satellite-based CO2 measurements. These studies mainly addressed the challenge of reconstructing missing data in carbon satellite datasets. Techniques such as the Baxter-King filter were employed following the application of algorithms like DINEOF and DINCAE to fill data gaps3,29,30,31. While these methods have been useful, gaps in their application remain. Specifically, DINEOF's reliance on Empirical Orthogonal Functions (EOFs) to estimate missing data through dominant spatial patterns may oversimplify complex temporal and spatial variability in CO2 measurements. The comparison of datasets, such as OCO-2 and GOSAT in the near-infrared (NIR) and shortwave infrared (SWIR) regions, helped generate the initial version of the XCO2 dataset32,33,34. However, merging datasets from different instruments still presents challenges, particularly regarding consistency and resolution.

While the DINCAE algorithm, using deep learning and a convolutional auto-encoder, has shown promise by efficiently reconstructing missing data from OCO-3 measurements, its application across multiple datasets has yet to be fully explored35. The integration of multisource data, including TCCON, GOSAT, OCO-2, and OCO-3, provides a step forward in reconstructing monthly XCO2 values, but the fusion of these datasets still faces limitations in accuracy and validation.

DINCAE, as a neural network comprising encoders and decoders, reduces the resolution of satellite data through convolutional layers and then reconstructs the information using interpolation layers24,36. This approach shows potential for filling gaps in satellite datasets but requires further validation, particularly when applied to complex geophysical processes. Similarly, while the DINEOF and DINCAE algorithms have improved the ability to reconstruct data from OCO-3 measurements, the accuracy and reliability of the reconstructed data must be compared against benchmark datasets like TCCON and GOSAT. TCCON, which began with the installation of equipment in Park Falls, Wisconsin, in 2004 and has since expanded to 23 globally operational devices, provides a crucial standard for validating satellite-derived CO2 data37. However, further comparison and validation efforts are essential to enhance the reliability of reconstructed datasets across different instruments and regions.

This study reconstructs a global CO2 dataset with comprehensive coverage and high spatiotemporal resolution by integrating multisource data from OCO-3, GOSAT, and CAMS, contributing to the analysis of CO2 dynamics and their climatic impacts. The reconstruction techniques DINEOF and DINCAE are employed to address data gaps due to sensor limitations and cloud cover. To enhance data reliability and underscore the effectiveness of these methods, the reconstructed dataset is compared with TCCON and other datasets. Furthermore, we demonstrate the critical role of post-reconstruction filtering in ensuring data integrity, advancing our understanding of the carbon cycle and informing climate policy. This study also reflects on the broader implications of NASA's OCO-3 mission for climate change analysis, advocating for enhanced satellite coverage to tackle global environmental challenges.

Methods

This study intends to reconstruct global daily gap-free XCO2 datasets with high resolution to enhance our understanding of CO2 dynamics, setting the stage for crafting robust climate models and actionable mitigation strategies, with validation and spatiotemporal pattern analysis. A key feature of this research is the employment of a high temporal resolution and consistent sampling method, which reinforces the reliability of the comparative analysis. Figure 1 graphically delineates the systematic methodology undertaken, beginning with the collation of TCCON, OCO-3, and GOSAT datasets, followed by data pre-processing which includes quality filtering and standardization to a 0.1-degree resolution. Data reconstruction is executed using DINEOF and DINCAE algorithms, where deep learning and empirical orthogonal functions merge disparate data into a coherent, gap-free dataset. This process culminates in post-processing steps, applying spatial median filtering and time series noise reduction, refining the data for enhanced accuracy, as demonstrated by the spatial and temporal trend comparisons with TCCON benchmarks.

Fig. 1

Detailed flowchart illustrating the comprehensive estimation methodology for achieving column average CO2 concentration levels.
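The spatial median filtering named in the post-processing stage can be illustrated with a minimal sketch that applies a NaN-aware moving median to one gridded layer. The 3 × 3 window, the function name `nan_median_filter`, and the use of NumPy are assumptions made for illustration; the window size applied in this study is not stated in the text.

```python
import warnings

import numpy as np


def nan_median_filter(grid: np.ndarray, size: int = 3) -> np.ndarray:
    """Moving median over a 2-D grid that ignores NaN cells.

    The window size is a hypothetical choice, not a value from the study.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd")
    pad = size // 2
    # Pad with NaN so that edge cells only use the neighbours that exist.
    padded = np.pad(grid.astype(float), pad, mode="constant",
                    constant_values=np.nan)
    # One size x size window per output cell: shape (rows, cols, size, size).
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    flat = windows.reshape(grid.shape[0], grid.shape[1], -1)
    with warnings.catch_warnings():
        # Windows that contain only NaN yield NaN; silence the warning.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmedian(flat, axis=-1)
```

In this sketch an isolated spike surrounded by consistent values is replaced by the neighbourhood median, while cells whose whole window is empty stay empty.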

Input datasets overview

Although the GOSAT and CAMS datasets both use data from the GOSAT satellite, their different processing methods ensure they provide unique insights. GOSAT offers direct satellite measurements, while CAMS includes reanalyzed data from multiple sources, enhancing the completeness and resolution of CO2 coverage. Combining these datasets compensates for missing information in sparse datasets like OCO-3, allowing for more accurate interpolation. To create daily maps of column-averaged CO2, we analyzed multiple datasets to validate the information and study each parameter. The datasets used are listed in Table 1, and a detailed description of each follows.

Table 1 An overview of the data from various sources used in this investigation.

GOSAT satellite dataset

The Greenhouse Gas Observing Satellite, also known as “IBUKI” or GOSAT, is essential for monitoring two key greenhouse gases: CO2 and CH438,39. Key instruments on board include the Fourier Transform Spectrometer (FTS) and Cloud-Aerosol Imager (CAI) of the Thermal and Near-infrared Sensor (TANSO)40,41. The FTS collects reflected short-wavelength infrared (SWIR) radiation in three bands: 0.76, 1.6, and 2.0 μm, allowing for near-surface CO2 retrievals39. It also detects thermal infrared radiation (TIR) in the range of 5.5–14.3 μm, using absorption bands near 14 μm to gather CO2 concentration data up to 2 km39,42. Dense clouds and aerosol interference are captured by CAI, supporting high-quality data filtering.

For this study, we used a bias-corrected monthly GOSAT Level-3 global XCO2 product, with 2.5° × 2.5° resolution, covering June 2018 to December 2023 (GOSAT data are accessible at https://data2.gosat.nies.go.jp). This dataset is derived from FTS SWIR Level-2 data, which are smoothed, extrapolated, and interpolated via the Kriging approach43,44 to fill gaps and improve spatiotemporal XCO2 coverage43.

OCO-3 satellite dataset

NASA’s OCO-3 satellite, launched in 2019, tracks atmospheric CO2 with a high spatial resolution of 2.5 km × 0.7 km45,46,47. Its FTS captures spectral data for accurate CO2 and CH4 column measurements, while TCCON data provides detailed validation of the total carbon column13. This study uses OCO-3’s monthly SWIR observations from August 2019 to November 2023. OCO-3’s orbital constraints can challenge data collection, but the L2FP algorithm mitigates this by filtering variables, including CO2 and H2O ratios, albedo, BRDF coefficients, RMS errors, and atmospheric factors48. We verified OCO-3 data against TCCON and atmospheric stations, focusing on the satellite’s spatial coverage from latitudes −52° to 52° and longitudes −180° to 180°.

TCCON in-situ dataset

The Total Carbon Column Observing Network (TCCON) consists of high-resolution ground-based Fourier Transform Spectrometers that measure column-averaged CO2 and other gases such as CH4 and CO. TCCON has been crucial for validating satellite data from OCO-2, GOSAT, and other missions49,50,51,52,53,54.

We use the extensive GGG2020 dataset55 from the following TCCON stations, which are distributed around the world: Burgos56, Caltech57, Darwin58, East Trout Lake59, Garmisch60, Harwell61, Hefei62, Izana63, Jet Propulsion Laboratory64,65, Karlsruhe66, Lamont67, Lauder68,69,70, Nicosia71, Orleans72, Paris, Park Falls73, Reunion Island74, Rikubetsu75, Saga76, Tsukuba77, Wollongong, and Xianghe78. The data were extracted for the period from 2018 to 2023.

CAMS dataset

The CAMS XCO2 dataset, generated by the ECMWF's Integrated Forecasting System (IFS) and 4DVar data assimilation system, offers high-resolution global atmospheric data, including greenhouse gases, with 0.75° spatial and three-hour temporal resolution79. This study focuses on CO2 column-mean molar fractions, and differences in spatial and temporal coverage between CAMS and OCO-3 are accounted for. CAMS XCO2 data are integrated for consistency and used for extrapolation80, improving our understanding of atmospheric CO2 dynamics. Prior studies using CAMS and OCO-2 data for XCO2 reconstruction via deep learning have shown high accuracy81.

Initial input datasets, preliminary insights

Terrestrial areas tend to exhibit larger errors than marine areas, with the GOSAT and CarbonTracker datasets showing particularly high inaccuracies82. OCO-2 and OCO-3 data, in turn, suffer from limited spatial and temporal coverage because of the large amount of missing information. Comparisons between TCCON and OCO-2, including the glint and nadir modes, have documented the statistical characteristics, uncertainties, and constraints related to temporal fluctuation and season11. In this research, the preliminary findings from applying the DINCAE and DINEOF algorithms to enhance the spatiotemporal coverage of OCO-3 data are encouraging. Careful data filtering was crucial to improve the accuracy and dependability of the reconstructed dataset and to prepare it for in-depth analysis and modeling. Together, the DINCAE and DINEOF algorithms and the data filtering methods significantly increased the spatiotemporal coverage and dependability of OCO-3 data. This advancement matters because progress in carbon cycle studies, climate modeling, and policy formulation depends on understanding CO2 dynamics and their broad consequences for climate change.

The OCO-3 dataset has data gaps ranging from 99.75% to 100% per day. The data were also filtered to retain only soundings with quality_flag equal to 0, which generally indicates reliable data. Each grid cell corresponds to a 0.1 × 0.1-degree area on Earth's surface. These gaps hinder our full comprehension of CO2 levels, potentially compromising the accuracy and completeness of global CO2 assessments and modeling efforts. Figure 2 presents a temporal analysis illustrating the variation in missing information within OCO-3 satellite data.

Fig. 2

Percentage of missing data per daily recording of OCO-3 satellite data.
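The daily gap percentage reported for OCO-3 can be approximated with a short sketch: soundings with quality_flag equal to 0 are binned onto a 0.1-degree grid, and the share of cells left empty is computed. The function name `daily_missing_percentage`, the argument layout, and the choice of a global grid as the denominator are assumptions for illustration, not the exact procedure used in this study.

```python
import numpy as np


def daily_missing_percentage(lat, lon, quality_flag, res=0.1):
    """Percentage of empty res-degree cells after keeping quality_flag == 0.

    Assumes a global grid (180/res rows by 360/res columns) as denominator.
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    good = np.asarray(quality_flag) == 0
    n_rows = int(round(180 / res))
    n_cols = int(round(360 / res))
    # Map each retained sounding to a (row, col) index on the global grid.
    rows = np.clip(np.floor((lat[good] + 90.0) / res).astype(int), 0, n_rows - 1)
    cols = np.clip(np.floor((lon[good] + 180.0) / res).astype(int), 0, n_cols - 1)
    # Several soundings in one cell count only once.
    filled = np.unique(rows * n_cols + cols).size
    return 100.0 * (1.0 - filled / (n_rows * n_cols))
```

Because a 0.1-degree global grid holds 6,480,000 cells, even tens of thousands of good soundings per day leave well over 99% of cells empty, which is in line with the range quoted above.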

Similarly, the GOSAT L3 product, with a coarser spatial resolution of 2.5 × 2.5 degrees, shows empty data in 38–65% of the grid cells each month. On Earth's surface, each such cell corresponds to an area of approximately 77,000 km2 at the equator (about 278 km × 278 km). The lack of data at this scale makes it difficult to fully understand CO2 dynamics, especially at local and regional scales. Figure 3 shows how the number of empty grid cells in the GOSAT L3 V03.05 product varies over time.

Fig. 3

Number of empty grid cells per image in GOSAT L3 data.

Thus, the extent of the data gaps differs between OCO-3 and GOSAT L3, but in both cases it limits the ability to obtain comprehensive global CO2 distributions and to perform accurate regional analyses. The GOSAT data cover more of the globe, but their temporal and spatial resolution is lower than that of OCO-3. To improve our understanding of CO2 dynamics and contribute to the success of climate change mitigation initiatives, it is crucial to continuously enhance satellite coverage and data collection.

Figure 4 illustrates the geographical distribution of the data utilized in this study, derived from the GOSAT Level 3 and OCO-3 datasets acquired in December 2022 and on December 14, 2022, respectively. The data visualization highlights significant gaps in the global coverage of atmospheric CO2, showing that the OCO-3 dataset mostly spans the region between 52°N and 52°S. This underlines the need for a complete dataset and for reconstructing the satellite images to close this gap. The CAMS dataset, on the other hand, offers global coverage based on simulation results, highlighting a notable discrepancy in data comprehensiveness and geographic scope and motivating the integration of multiple sources for accurate CO2 monitoring.

Fig. 4

Study data spatial location.

The OCO-3 satellite and its state-of-the-art equipment offer a plethora of information essential for tracking atmospheric greenhouse gases. By analyzing its SWIR data, we intend to gain insight into the global carbon cycle and assess the environmental effects of human activities83. Rigorous comparisons with TCCON, fusing the information of CAMS, GOSAT, and OCO-3, will verify the accuracy and reliability of OCO-3 for scientific research and environmental management.

Figure 5 presents an extensive plot against a world map showing the atmospheric XCO2 concentrations between 2018 and 2023. The scatter plot distinguishes data points according to the source satellite and associated geographic area. Notably, the alignment of patterns between CAMS and OCO-3 indicates a high degree of consistency in XCO2 measurements, reinforcing the reliability and calibration uniformity of the OCO-3 sensor and CAMS model.

Fig. 5

Regional and data XCO2 trend from 2018 to 2023.

The integrated map highlights the coverage of the OCO-3 satellite in yellow, indicating the spatial emphasis of the dataset and implying a geographically limited investigation. While the overall trends of CAMS and OCO-3 align closely, GOSAT measurements deviate from this trend, especially in southern ocean areas. The datasets are consistent over inland areas, but significant discrepancies arise over ocean regions. These deviations, particularly visible in the scatter plot, point to potential regional anomalies, calibration inconsistencies, or differences in sensor sensitivities. In the larger scheme of atmospheric monitoring, these discrepancies are important because they could indicate a need for further research into regional atmospheric dynamics and cross-calibration. To address this issue, we excluded the GOSAT grid cells with large deviations from CAMS and OCO-3 to improve the reliability of the reconstructed dataset.
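The exclusion of deviating GOSAT grid cells can be sketched as a simple masking step. The deviation threshold is not given in the text, so the 3 ppm default below is purely hypothetical, as are the function name `mask_deviating_cells` and the assumption that GOSAT and the reference field are already on the same grid.

```python
import numpy as np


def mask_deviating_cells(gosat, reference, max_abs_diff_ppm=3.0):
    """Set GOSAT cells to NaN where they differ strongly from a reference.

    max_abs_diff_ppm is a hypothetical threshold, not a value from the study.
    """
    gosat = np.asarray(gosat, dtype=float)
    reference = np.asarray(reference, dtype=float)
    diff = np.abs(gosat - reference)
    # Comparisons with NaN are False, so cells without a reference are kept.
    keep = ~(diff > max_abs_diff_ppm)
    return np.where(keep, gosat, np.nan)
```

Masked cells become additional gaps, which the reconstruction step then fills from the remaining sources.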

The scatterplot matrix displayed in Fig. 6 compares the XCO2 distributions, with histograms for CAMS, GOSAT, and OCO-3 in the panels along the diagonal. These histograms reveal the frequency of the XCO2 values (in ppm) captured by each system. The three datasets (CAMS, OCO-3, and GOSAT) overlap from 2019-10 to 2020-12, so the assessment and comparison of the input datasets are restricted to this period. The scatterplots in the off-diagonal panels compare the readings for every pair of systems, and the linear regression lines show the correlation between the systems for every location. The best correlation parameters appear in the comparison of GOSAT and CAMS data. In contrast, the results between OCO-3 and the other datasets are not consistent and, in some cases, show a negative regression slope, particularly in southern areas. These correlation factors are presented in Table 2, which offers quantitative insights into the degree of agreement between measurements from several satellite systems in a variety of places.

Fig. 6

A correlation matrix of regional XCO2 value distributions from CAMS, GOSAT, and OCO-3 Datasets.

Table 2 Correlation Parameter Comparisons Between GOSAT, OCO-3, and CAMS Datasets.

Table 2 presents the correlation parameters by region and dataset. In Europe, CAMS and GOSAT exhibit the highest R2 value (0.64) and Pearson correlation (0.80), demonstrating a strong linear relationship in this region. Oceania, however, shows a negative correlation (−0.09) in this particular comparison, indicating a weak negative association. Europe also shows a steep positive slope (1.39) and a large negative intercept (−161.49). The correlation levels between CAMS and OCO-3 and between GOSAT and OCO-3 vary by region. For example, Europe maintains a strong correlation in the CAMS-OCO-3 comparison (Pearson coefficient of 0.68), while Africa shows negative correlations in both GOSAT-OCO-3 (−0.19) and CAMS-OCO-3 (−0.18), pointing to differences in satellite data or localized fluctuations in the atmosphere. These region-specific variations highlight the complexity of interpreting satellite data and underscore the need for localized atmospheric studies.
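The parameters reported in Table 2 (slope, intercept, Pearson correlation, and R2) can be obtained for any pair of co-located series with an ordinary least-squares fit. The following is a minimal sketch under the assumption that the two series have already been matched in space and time; the function name `pairwise_stats` is illustrative.

```python
import numpy as np


def pairwise_stats(x, y):
    """Slope, intercept, Pearson r, and R^2 of a linear fit of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Use only pairs where both datasets have a valid value.
    valid = np.isfinite(x) & np.isfinite(y)
    x, y = x[valid], y[valid]
    slope, intercept = np.polyfit(x, y, 1)
    r = np.corrcoef(x, y)[0, 1]
    # For a simple linear regression, R^2 equals the squared Pearson r.
    return {"slope": slope, "intercept": intercept, "pearson_r": r, "r2": r ** 2}
```

The identity R2 = r² for a single-predictor fit is consistent with the European CAMS-GOSAT values quoted above, since 0.80² = 0.64.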

Dataset reconstruction method

According to previous studies84,85,86,87,88,89,90,91, the most often used technical approaches for modeling relationships are statistical models, machine learning, and deep learning techniques. Despite the widespread use of statistical models, including spatial autoregression models, geographically weighted regression (GWR), and linear regression models, to investigate spatiotemporal relationships, there are still certain shortcomings in estimating XCO292,93,94. Those works focus on the use of deep learning and EOF techniques to generate enhanced XCO2 maps. EOF analysis is known in geostatistics as space-time modal decomposition, and it plays a crucial part in the data analysis stage of the DINEOF technique95. Our research strategy involved thorough data collection, the use of reconstruction techniques, and a strict evaluation process to fully examine the spatiotemporal coverage of OCO-3 data combined with CAMS and GOSAT. The varied climatic conditions and variations in atmospheric CO2 levels across the study area provided a rich context for the research. The OCO-3 data, gathered using sensors and equipment deployed on satellite platforms, provide precise measurements of the atmospheric CO2 level and form the backbone of this study.

To assess the spatiotemporal coverage of OCO-3 data, a comprehensive methodology involving data collection, application of the DINCAE and DINEOF algorithms, and rigorous evaluation was employed. This approach facilitated a comparison of the algorithms' performance through statistical analysis and visual inspection, enhancing our understanding of data reconstruction techniques and their implications for spatiotemporal coverage. The study also acknowledges limitations, particularly the potential bias or ambiguity inherent in OCO-3 data and the influence on performance of the specific characteristics of the datasets used and of the parameter settings in the DINCAE and DINEOF algorithms. Recognizing these limitations and maintaining methodological rigor are crucial for accurate interpretation of the findings.

DINEOF reconstruction method

DINEOF (Data Interpolating Empirical Orthogonal Functions) is an analytical technique used to interpolate missing data and identify spatiotemporal modes. Its foundation is Empirical Orthogonal Function (EOF) analysis, originally introduced by Pearson96. The central component of DINEOF is the data matrix \(A_{m\times n}\), an m × n matrix in which m is larger than n. Here, n is the number of layers, usually time steps such as months, and m is the number of grid cells in one layer of the research area, with each grid cell arranged in a row97.

In DINEOF, decomposing the matrix \(A_{m\times n}\) into its spatial and temporal parts is essential. This decomposition is written concisely as \(A_{m\times n}=V_{m\times m}Z_{m\times n}\), which separates \(A_{m\times n}\) into a space function \(V_{m\times m}\) and a time function \(Z_{m\times n}\). The space function is made up of orthogonal space characteristic fields, whereas the time function is made up of the associated time coefficients. These space characteristic fields and their coefficients are combined linearly to represent the space field \(a_k\), as shown in the equation below.

$${a}_{k}=\mathop{\sum }\limits_{i=1}^{N}{v}_{i}\cdot {c}_{i}$$
(1)

where \({v}_{i}\) are the vectors of the space characteristic fields and \({c}_{i}\) are the coefficients. The EOF decomposition is essential for recreating missing values in the dataset. The first step is to use EOF analysis to decompose the spatiotemporal variable field \(A_{m\times n}\) and produce spatial typical fields. The first N of these spatial fields are then selected and used to reconstruct the complete space-time variable field. Thus, the reconstructed m × n matrix is the product of the first N spatial typical fields and their corresponding time coefficients. This method is a reliable way to handle missing data in complicated datasets, since it works especially well with large data volumes and is resilient to local changes. In geostatistics, it is sometimes referred to as space-time modal decomposition and is a crucial component of the DINEOF method's data analysis95.
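The truncated EOF reconstruction described above can be sketched as an iterative procedure: gaps start at the mean, the matrix is decomposed, the first N modes rebuild the field, and only the missing entries are updated until they stop changing. This is a simplified illustration rather than the implementation used in this study. The function name `dineof_fill`, the fixed number of modes, the removal of a single global mean, and the use of a full SVD from NumPy are all assumptions; operational DINEOF typically selects the number of modes by cross-validation.

```python
import numpy as np


def dineof_fill(A, n_modes=1, n_iter=500, tol=1e-10):
    """Fill NaN entries of an m x n (space x time) matrix with truncated EOFs.

    Simplified sketch: n_modes is fixed by the caller instead of being
    chosen by cross-validation, and one global mean is removed.
    """
    A = np.asarray(A, dtype=float)
    missing = np.isnan(A)
    mean = np.nanmean(A)
    # Work on anomalies; missing cells start at zero anomaly.
    X = np.where(missing, 0.0, A - mean)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        # Sum of the first n_modes spatial fields times their time coefficients.
        recon = (U[:, :n_modes] * s[:n_modes]) @ Vt[:n_modes]
        if missing.any():
            change = np.max(np.abs(recon[missing] - X[missing]))
        else:
            change = 0.0
        # Observed cells are never overwritten; only gaps are updated.
        X[missing] = recon[missing]
        if change < tol:
            break
    return X + mean
```

In this sketch the columns of `U` play the role of the space characteristic fields \({v}_{i}\), and the rows of `Vt` scaled by `s` play the role of the time coefficients \({c}_{i}\) in Eq. (1).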

DINCAE reconstruction method

The “DINCAE 2.0” methodology employs a convolutional neural network (CNN) designed specifically for the reconstruction of missing data in satellite observations. The core of this methodology is a U-Net type network, which is particularly effective because of its deep structure and capacity to capture both local and global properties in the input data. A distinguishing characteristic of the U-Net design is the use of skip connections, which help preserve the fine-scale details in the data and are essential for precise reconstruction of complicated geophysical fields36,98. Assuming a Gaussian distribution of errors, the network is trained to minimize the negative log-likelihood, which can be expressed mathematically as:

$$L(\theta )=-\log \,L(X|\theta )$$
(2)

where \(L(X|\theta )\) is the likelihood of the data X given the parameters θ, and \(L(\theta )\) on the left-hand side is the resulting loss to be minimized. In addition to its architecture, the algorithm is further enhanced by a refinement phase, which processes the inputs and outputs through an auto-encoder. This phase deepens the network and enhances the reconstruction quality, especially for complex patterns and interactions99,100. Moreover, the methodology extends to multivariate reconstruction, enabling it to handle diverse data types, such as wind fields, chlorophyll concentration, and sea surface temperature. A multivariate approach is essential for a thorough understanding and analysis of environmental events26,101,102. Additionally, a noteworthy development for this model is its ability to process non-gridded data, which is common in satellite datasets that vary in formats and resolutions. This flexibility is achieved by modifying the network's cost function and the input layer, allowing it to efficiently analyze unstructured data inputs and produce organized gridded field outputs.
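Under the Gaussian error assumption, the negative log-likelihood in Eq. (2) takes an explicit form once the network provides a predicted mean and error variance for each cell. The sketch below evaluates that loss over observed cells only. It is written with NumPy for clarity and is not the DINCAE code itself; the function name `gaussian_nll` and the averaging over valid cells are assumptions.

```python
import numpy as np


def gaussian_nll(obs, mean, sigma2):
    """Mean Gaussian negative log-likelihood over cells with observations.

    obs    : observed values, NaN where no measurement exists
    mean   : predicted mean per cell
    sigma2 : predicted error variance per cell (must be positive)
    """
    obs = np.asarray(obs, dtype=float)
    mean = np.asarray(mean, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    # Gaps contribute nothing to the loss.
    valid = np.isfinite(obs)
    resid2 = (obs[valid] - mean[valid]) ** 2
    nll = 0.5 * (np.log(2.0 * np.pi * sigma2[valid]) + resid2 / sigma2[valid])
    return float(np.mean(nll))
```

The variance term lets the loss penalize over-confident predictions: a small predicted variance is rewarded only where the residual is also small.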

Figure 7 presents a collection of global maps that display the monthly mean reconstructed atmospheric CO2 concentrations over six years, from January 2018 to November 2023. The XCO2 concentrations fluctuate throughout the year, exhibiting a seasonal pattern, with levels declining from May to September. This decrease can be attributed to increased photosynthetic activity in the Northern Hemisphere during the summer, which removes CO2 from the atmosphere. In contrast, XCO2 levels rise from October to April, driven by reduced photosynthetic activity during the Northern Hemisphere winter together with an increase in CO2 emissions from human activities such as industrial processes and heating.

Fig. 7

Spatiotemporal variation of monthly mean XCO2 from reconstructed data (1/2018 to 11/2023).

Global Spatiotemporal XCO2 evolution

Figure 8 provides a thorough analysis of variations in atmospheric CO2 in different parts of the world, highlighting the important contribution of oceans to Earth's carbon cycle through their seasonal CO2 absorption and emission. It emphasizes the distinct patterns of XCO2 exchange with the atmosphere in the Indian Ocean, the seasonal biological productivity and decay of the temperate zones in the North Atlantic and North Pacific, and the patterns over the continents of North America, Asia, and Europe. The waters of the South China Sea also show a steady increase in XCO2 with cyclical fluctuations, with a trend and seasonal variation similar to those of China.

Fig. 8

Spatiotemporal dynamics of XCO2, (a) using DINCAE reconstructed data to measure the variance in mean XCO2 values across regions, (b) Hovmöller diagram of the modeled XCO2 data by zonal latitude.

On the other hand, South America's XCO2 trajectory, significantly influenced by the Amazon's vast photosynthetic capacity, is marred by a concerning surge in emissions due to fossil fuel combustion and rampant deforestation. This trend is paralleled in Southern Hemisphere regions such as the South Atlantic, South Pacific, and Oceania, where XCO2 levels, though lower, follow the upward trend observed in their northern counterparts. The depiction across these diverse landscapes contributes to a comprehensive narrative of the Earth's carbon flux, illustrating a climate system that is both dynamic and acutely sensitive to changes, set against an ever-increasing concentration of atmospheric greenhouse gases. Notably, the graph identifies the Australian region as having the lowest column-average CO2 concentrations, marking a distinct region of interest in the global trend patterns.

The analysis of the XCO2 linear growth rate from 2018 to 2023, shown in Fig. 9, visualizes the escalating CO2 concentrations worldwide. The intensity of the red hues mapped across various global regions underscores the areas with significant increases in XCO2 levels and invites a deeper investigation into the underlying causes. These markers may indicate heightened industrial activities, urban sprawl, deforestation, and other carbon-emitting practices, highlighting the need to scrutinize local and regional environmental policies.

Fig. 9

XCO2 growth rate behavior (based on linear trend) in Tropical and subtropical regions.
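A per-cell linear growth rate of the kind mapped in Fig. 9 can be computed with a closed-form least-squares slope along the time axis. The following sketch assumes a gap-free array ordered as (time, latitude, longitude) and time stamps expressed in years; the function name `linear_growth_rate` and these conventions are illustrative assumptions.

```python
import numpy as np


def linear_growth_rate(cube, times_years):
    """Least-squares slope (ppm per year) for every grid cell.

    cube        : array of shape (time, lat, lon), assumed gap-free
    times_years : 1-D array of time stamps in years, same length as axis 0
    """
    cube = np.asarray(cube, dtype=float)
    t = np.asarray(times_years, dtype=float)
    t_anom = t - t.mean()
    x_anom = cube - cube.mean(axis=0)
    # slope = sum(t' * x') / sum(t'^2), evaluated for all cells at once.
    return np.tensordot(t_anom, x_anom, axes=(0, 0)) / np.sum(t_anom ** 2)
```

Because a strong seasonal cycle is superimposed on the trend, such a slope is most meaningful when the record spans whole years.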

The Amazon Basin in South America, along with regions such as Argentina, Bolivia, and Brazil, exhibits areas of elevated XCO2 levels, as indicated by the visual data. The southern region of Africa similarly shows an increase in XCO2 concentrations. These trends reflect changes in atmospheric carbon dioxide but do not provide conclusive evidence about the specific causes of the increases, which could be influenced by a range of factors including natural variability, short-term atmospheric mixing, and regional land-use change. In Asia, South and West Asia, including western China, show noticeable XCO2 increases, possibly linked to industrial activity. The western region of Australia shows a subtler rise in XCO2 levels, which may be due to both natural processes and localized environmental conditions. However, caution is advised in interpreting these spatial trends as direct indicators of specific sources or processes without accounting for the potential influence of atmospheric dynamics over short periods.

Data Records

The final output of our study is a daily raster dataset, systematically organized in a folder structure and named following the pattern “Reconstructed_YYYYMMDD.tif”, representing gap-free daily XCO2 concentrations. Each data record is saved as a “.tif” file, ensuring broad compatibility and ease of use for further scientific analysis. The dataset covers the period from January 2018 to November 2023 at a spatial resolution of 0.1 degrees, facilitating precise, grid-based analyses of XCO2 distribution patterns over time. The dataset is publicly available for free download from Zenodo103 at https://zenodo.org/records/13895409. We encourage scholars and researchers to access and utilize this dataset to advance the field.
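Working with the daily records mainly requires mapping between calendar dates and file names. The helpers below are a small sketch built only on the naming pattern stated above; the function names are illustrative, and the folder layout inside the repository is not assumed.

```python
from datetime import date, datetime, timedelta


def record_name(day: date) -> str:
    """File name of the daily record for a given date."""
    return f"Reconstructed_{day:%Y%m%d}.tif"


def record_date(name: str) -> date:
    """Date encoded in a record file name (any leading folders are ignored)."""
    stem = name.rsplit("/", 1)[-1]
    return datetime.strptime(stem, "Reconstructed_%Y%m%d.tif").date()


def record_names(start: date, end: date) -> list:
    """File names for every day from start to end, both inclusive."""
    n_days = (end - start).days
    return [record_name(start + timedelta(days=i)) for i in range(n_days + 1)]
```

The resulting names can then be passed to any GeoTIFF reader to load the corresponding 0.1-degree grids.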

Reconstructed products

A comparison between model and satellite XCO2 data across the Western Hemisphere is shown in Fig. 10. The original inputs are OCO-3 (Fig. 10a), GOSAT L3 (Fig. 10b), and CAMS (Fig. 10c). The reconstruction using only OCO-3 data (Fig. 10d) cannot produce a complete image: it still contains several gaps and some noise, so a single dataset is not sufficient for reconstruction. Fig. 10e shows the reconstruction after integrating GOSAT information; however, the resolution of GOSAT does not allow a satisfactory reconstruction. We therefore also integrate the CAMS data, and the result, presented in Fig. 10f, is the final reconstructed product. This integrated reconstruction of OCO-3, GOSAT, and CAMS shows a smoother gradient of XCO2 concentrations, likely because combining all available data yields a more accurate and higher-resolution reconstruction of the atmospheric CO2 distribution. Together, the panels show how the model output (CAMS) and the original satellite observations (OCO-3 and GOSAT) differ in data quality and resolution, and how the reconstructed and merged maps improve spatial resolution and data continuity. Panel (f), which integrates all datasets, offers the most accurate and thorough view of XCO2 levels, demonstrating the importance of merging data from several sources for environmental monitoring and analysis.

Fig. 10

Comparative Analysis of XCO2 Data (December 2020): (a) OCO-3, (b) GOSAT, (c) CAMS, (d) OCO-3 Reconstructed, (e) Combined OCO-3 and GOSAT Reconstructed, (f) Integrated Reconstruction of OCO-3, GOSAT, and CAMS.

Technical Validation

The key difference between GOSAT and CAMS lies in their spatial resolution and in the data assimilation techniques used by CAMS, which integrates multiple data sources. These differences, far from being problematic, enhance the robustness of the reconstructed dataset by providing broader spatial coverage and more frequent updates. The complementary nature of these datasets is critical for filling the gaps in the OCO-3 data, resulting in a more accurate and continuous global XCO2 dataset. This study reconstructs the dataset from multisource geodata containing XCO2 information, interpolated with the DINEOF and DINCAE methods. The validation process involves a comprehensive analysis of the differences between the interpolated OCO-3 data and reference datasets obtained from GOSAT and TCCON.

In evaluating the reliability of satellite-derived CO2 data, a crucial aspect involves comparing datasets from various instruments. Notably, studies104,105 have conducted comprehensive assessments of the consistency between different satellite missions, including BESD-SCIAMACHY, ACOS-GOSAT, and OCO-2. These assessments involve meticulous analyses of biases, correlation coefficients, and spatiotemporal variations, addressing challenges associated with differences in a priori profiles and averaging kernels across diverse satellite data products, as highlighted by Rodgers and Connor106. The results confirm a degree of agreement between various satellite datasets, facilitating the integration of multi-satellite observations for a more comprehensive understanding of atmospheric CO2 concentrations23.

Figure 11 shows boxplots comparing atmospheric XCO2 across a set of global TCCON monitoring stations, illustrating the spread and central tendency of the TCCON, DINCAE, and DINEOF reconstructed datasets. The central clustering around 412 ppm highlights the coherence in global CO2 levels, with notable consistency at the Burgos station, where cross-method variability is minimal. Peaks are closely matched at Caltech and Armstrong Center, with the TCCON readings reaching a maximum of 427.565 ppm, indicative of regional emission spikes. A notable lower bound is observed at East Trout Lake with DINEOF, suggesting lower regional CO2 concentrations. At Lauder, the OCO-3 data show good agreement with TCCON, reflecting reliable measurements. Sites such as Orleans, Nicosia, and Paris display a tightly packed range, signifying data robustness. The DINEOF results at Reunion Island represent the lower end of measurements, while Xianghe maintains a range consistent with the global dataset, illustrating the extensive span of CO2 levels captured.

Fig. 11

Boxplot analysis of XCO2 contrasting TCCON station measurements and reconstructed data methods.

Figure 12a shows a scatterplot comparison between XCO2 values from TCCON and those from the DINEOF technique. The high correlation coefficient (r = 0.94) indicates a strong linear relationship between the two datasets, which is visible in the scatter of points. The root mean square error (RMSE) measures the difference between observed values and the values predicted by an estimator or model. The RMSE of 1.610 ppm and the unbiased RMSE (ubRMSE) of 1.581 ppm provide quantitative estimates of the average magnitude of the error. The bias of 0.301 ppm, a measure of systematic offset, shows that the DINEOF values are, on average, only marginally higher than the TCCON values.
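For readers who wish to reproduce these statistics on their own collocated samples, the following minimal sketch computes the correlation coefficient, bias, RMSE, and ubRMSE from paired values. The function name, the sign convention (reconstruction minus TCCON), and the population (divide-by-n) normalization are assumptions of this example, and the sample values are synthetic.

```python
import math


def validation_metrics(reference, estimate):
    """Return r, bias, RMSE and ubRMSE for paired reference/estimate values (ppm)."""
    if len(reference) != len(estimate) or len(reference) < 2:
        raise ValueError("need two equally long series with at least two pairs")
    n = len(reference)

    # Differences use the assumed convention: estimate minus reference.
    diffs = [e - r for r, e in zip(reference, estimate)]
    bias = sum(diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    # ubRMSE is the RMSE after removing the mean difference (the bias).
    ubrmse = math.sqrt(sum((d - bias) ** 2 for d in diffs) / n)

    # Pearson correlation coefficient.
    ref_mean = sum(reference) / n
    est_mean = sum(estimate) / n
    cov = sum((r - ref_mean) * (e - est_mean) for r, e in zip(reference, estimate))
    var_ref = sum((r - ref_mean) ** 2 for r in reference)
    var_est = sum((e - est_mean) ** 2 for e in estimate)
    if var_ref == 0 or var_est == 0:
        raise ValueError("correlation is undefined for a constant series")
    r_value = cov / math.sqrt(var_ref * var_est)

    return {"r": r_value, "bias": bias, "rmse": rmse, "ubrmse": ubrmse}


# Synthetic values for illustration only, not taken from TCCON or the dataset.
tccon_sample = [410.0, 412.0, 414.0, 416.0]
recon_sample = [410.4, 412.1, 414.6, 416.1]
sample_metrics = validation_metrics(tccon_sample, recon_sample)
```

With this convention a positive bias means the reconstruction is, on average, higher than the reference, which matches the interpretation given for Fig. 12a.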

Fig. 12

Comparative scatterplots: (a) DINEOF reconstructed data vs TCCON, (b) DINCAE reconstructed data vs TCCON.

Figure 12b presents the corresponding comparison for the DINCAE method. It shows a slightly higher correlation than Fig. 12a, with a correlation coefficient of 0.95, and lower RMSE and ubRMSE values of 1.441 ppm and 1.400 ppm, respectively. The bias increases slightly to 0.343 ppm but remains within an acceptable margin. The lower RMSE and ubRMSE, together with the higher correlation coefficient, suggest that the DINCAE reconstruction in Fig. 12b agrees more closely and more consistently with the TCCON benchmark than the DINEOF reconstruction in Fig. 12a. The density of points along the 1:1 line visually confirms this tighter agreement with TCCON readings, hinting that the DINCAE method reduces discrepancies evident in the DINEOF approach depicted in Fig. 12a.
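The statistics reported for Fig. 12 can be cross-checked against one another. Assuming the ubRMSE is the RMSE of the bias-removed differences with the same normalization as the RMSE, the three quantities satisfy RMSE² = ubRMSE² + bias². The short sketch below applies this identity to the reported values for both methods; the variable and function names are illustrative.

```python
import math

# Values as reported in the text for Fig. 12a (DINEOF) and Fig. 12b (DINCAE), in ppm.
reported = {
    "DINEOF": {"rmse": 1.610, "ubrmse": 1.581, "bias": 0.301},
    "DINCAE": {"rmse": 1.441, "ubrmse": 1.400, "bias": 0.343},
}


def rmse_from_components(ubrmse, bias):
    """RMSE implied by ubRMSE and bias under the identity RMSE^2 = ubRMSE^2 + bias^2."""
    return math.sqrt(ubrmse ** 2 + bias ** 2)


# Absolute gap between the implied RMSE and the reported RMSE for each method.
residuals = {
    name: abs(rmse_from_components(v["ubrmse"], v["bias"]) - v["rmse"])
    for name, v in reported.items()
}
```

For both methods the implied and reported RMSE differ by less than 0.001 ppm, which is within the rounding of the reported figures. The identity also makes clear why the total error is lower for DINCAE despite its slightly larger bias: the bias contributes only a small share of the RMSE in either case.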

The data validation process, incorporating scatterplot analysis and spatiotemporal patterns, is a robust methodology to assess the quality and reliability of reconstructed data. Comparing the measurements with TCCON datasets contributes to the refinement of monitoring CO2 concentrations.