The High-resolution Urban Meteorology for Impacts Dataset (HUMID) daily for the Conterminous United States

Newman, Andrew J.; Kalb, Christina; Chakraborty, TC; Fitch, Amy; Darrow, Lyndsey A.; Warren, Joshua L.; Strickland, Matthew J.; Holmes, Heather A.; Monaghan, Andrew J.; Chang, Howard H.

doi:10.1038/s41597-024-04086-2

Download PDF

Data Descriptor
Open access
Published: 04 December 2024

The High-resolution Urban Meteorology for Impacts Dataset (HUMID) daily for the Conterminous United States

Andrew J. Newman ORCID: orcid.org/0000-0001-8796-0861¹,
Christina Kalb¹,
TC Chakraborty ORCID: orcid.org/0000-0003-1338-3525^2,3,
Amy Fitch⁴,
Lyndsey A. Darrow⁴,
Joshua L. Warren ORCID: orcid.org/0000-0002-6274-6970⁵,
Matthew J. Strickland⁴,
Heather A. Holmes⁶,
Andrew J. Monaghan⁷ &
…
Howard H. Chang⁸

Scientific Data volume 11, Article number: 1321 (2024) Cite this article

2555 Accesses
6 Altmetric
Metrics details

Subjects

Abstract

Many current gridded surface meteorological datasets are inadequate for quantifying near-surface spatiotemporal variability because they do not fully represent the impacts of land surface heterogeneity. Of note, explicit representation of the spatial structure and magnitude of local urban warming are usually lacking. Here we enhance the representation of spatial meteorological variability over urban areas in the conterminous United States (CONUS) by employing the High-Resolution Land Data Assimilation System (HRLDAS), which accounts for the fine-scale impacts of spatiotemporally varying land surfaces on weather. We also synthesize in situ meteorological data including local mesonets to create a 1 km grid spacing model-observation fusion product spanning 1981–2018 over the CONUS at daily temporal resolution. Daily maximum, minimum, and mean values for a variety of temperature estimates, humidity, and surface energy budget terms, among others, are included. This High-resolution Urban Meteorology for Impacts Dataset (HUMID) will be useful for studies examining spatial variability of near surface meteorology and the impacts of urban heat islands across many disciplines including epidemiology, ecology, and climatology.

Global 1-km present and future hourly anthropogenic heat flux

Article Open access 22 February 2021

High-resolution grids of daily air temperature for Peru - the new PISCOt v1.2 dataset

Article Open access 01 December 2023

Reconstruction of all-sky daily air temperature datasets with high accuracy in China from 2003 to 2022

Article Open access 15 October 2024

Background & Summary

Surface gridded meteorology datasets provide spatiotemporally continuous estimates of variables such as temperature, humidity, wind, and precipitation over some domain of interest^{1,2,3,4,5,6,7}. They are a foundational element to many applications where weather or climate data on a grid is beneficial or required, such as distributed hydrologic modelling, numerical weather prediction forecast evaluation, and climate and health studies⁸. The process of moving from sparse, irregularly located stations with discontinuous temporal records to continuous surfaces has many technical challenges and methodological decision points^9,10. Thus, many different datasets are often available for the same domain. In many cases, higher spatiotemporal resolutions are desired in an effort to better resolve gradients, for example temperature gradients along mountain ranges or within urban areas^11,12,13. There are many 1–12 km spatial, sub-daily to daily temporal resolution products available over long time periods^{4,6,9,14,15,16} across the conterminous United States.

However, many of these current gridded meteorological datasets may be inadequate for quantifying spatiotemporal variation around urban areas because they do not fully represent the impacts of the urban environment. The aforementioned products over CONUS use spatial interpolation techniques that may smooth high-frequency spatial variation in temperature and include only natural landscape predictors to account for unresolved spatial variability (e.g., elevation, slope, and aspect). Additionally, higher density non-traditional observation networks such as local mesonetworks are usually not included in these data assimilation systems.

Products that fuse observations and model simulations offer a path forward to address the deficiencies in existing observation based meteorological datasets in representing spatial variability due to urban areas. Land-surface and urban canopy model development has advanced substantially over the past few decades such that land-surface models (LSMs) now include multi-layer snow and soil columns, explicit canopy models with improved vegetation transpiration, and solve the full water and energy balance equations to provide realistic estimates of surface weather conditions when driven with reasonable quality data¹⁷. Additionally, urban models can realistically represent multiple aspects of the urban canopy at various levels of complexity^18,19. Finally, land-surface and urban-models are now coupled into complete natural and urban land cover modelling systems^20,21.

Here we use the urban high-resolution land data assimilation system (u-HRLDAS²⁰,) to develop a model-observation fusion 1 km gridded meteorology dataset of daily near surface meteorology spanning 1 January 1981 to 31 December 2018 covering CONUS. A workflow schematic is given in Fig. 1. The u-HRLDAS combines the Noah-MP¹⁷ land-surface model and several urban models, the single-layer urban canopy model (SLUCM¹⁸,), along with input output (IO) routines for gridded meteorological forcing data, soils data, land-cover data, and model output, into one software system. We also perform a daily varying, locally weighted bias correction using station observations from a variety of observation networks in a post-processing step of the initial u-HRLDAS temperature estimates. No other variables are bias corrected in this dataset due to their relative lack of observations (e.g., humidity and wind measurements are very sparse).

This dataset, the High-resolution Urban Meteorology for Impacts Dataset (HUMID), includes a combination of mean, maximum, and minimum values (or all three) for a variety of fields including temperature (both raw and bias corrected), moisture, wind speed, surface energy budget terms, urban model temperatures (e.g., urban canopy temperature, building wall temperature, radiative temperature), skin temperature, among others. The complete variable list is available in the dataset documentation, available at the dataset repository (dataset DOI citation). Many applications within the geohealth, ecology, urban meteorology and climatology, and urban hydrology domains could benefit from HUMID, given the explicit representation of the modification of urban areas on the local energy balance and sensible weather (excluding precipitation).

Methods

Figure 1 highlights the key steps and datasets in our workflow, which span preprocessing input datasets, model simulation, and postprocessing (bias correction) u-HRLDAS output, and the following method discussion is grouped into three corresponding sections. Table 1 gives a summary of the static geophysical attributes, meteorological forcing data, and urban datasets used in this study, and Fig. 2 highlights the dataset domain.

Table 1 Input datasets to u-HRLDAS.

Full size table

u-HRLDAS input datasets

Static geophysical attributes

Our geophysical attributes: terrain height, land/water mask, soil texture, and natural vegetation characteristics, and are taken from the Weather Research and Forecasting (WRF) version 3.6 release geophysical data and processed using the WRF Preprocessing System (WPS) Geogrid executable version 3.6 (https://www2.mmm.ucar.edu/wrf/users/download/get_source.html). The Geogrid preprocessor takes terrain, vegetation, and soils data, and aggregates and/or interpolates it to a user specified grid spacing, in this case 1 km, using a variety of interpolation and aggregation routines²². Vegetation data includes a repeatable annual cycle at monthly time steps of leaf area index and green vegetation fraction. Derivatives of the terrain including slope and aspect are also computed.

Meteorological forcing data

We used the National Land Data Assimilation System Phase 2 (NLDAS-2) 0.125 degree (or approximately 12 km over CONUS) gridded hourly meteorology data available from the National Aeronautics and Space Administration (NASA)²³ as our meteorological forcing data. NLDAS-2 data include precipitation rate, air temperature, humidity, surface pressure, wind speed, and downward longwave and shortwave radiation, which are the seven required variables to run u-HRLDAS, as it contains energy and water balance models (see Methods subsection b). Precipitation data are derived from a combination of daily Climate Prediction Center (CPC) gridded precipitation analysis data, PRISM long-term climatologies to perform orographic adjustments, and temporal disaggregation using radar, satellite, or reanalysis data based on availability. All other fields are derived from the North American Regional Reanalysis and have various terrain and bias adjustments made⁶.

These data were then downscaled to the 1 km grid using a terrain based downscaling algorithm initially developed for the National Water Model (NWM, https://github.com/NCAR/WrfHydroForcing). The NWM algorithm performs a simple lapse rate adjustment to temperature (6.5 K km⁻¹) using the difference between the coarse and fine scale grid terrain heights, as well as slope and aspect adjustments for downward shortwave radiation. Humidity is adjusted to not exceed 100%, and surface pressure is recomputed using the hydrostatic assumption. No adjustments to precipitation, wind, and downward longwave radiation are made outside of interpolation to the target grid. The downscaled NLDAS-2 data are the input meteorological forcing data to u-HRLDAS (Fig. 1). An example of the coarse and downscaled NLDAS-2 2 m temperature data is given in Fig. 3 for 00 UTC 1 January 1991. Note the enhanced terrain features in both the western and eastern United States in Fig. 3b.

Urban class and fraction

We developed yearly urban land cover class and urban fraction on our 1 km grid based on the National Land Cover Database (NLCD) datasets from 2001, 2006, 2011, and 2016²⁴ to capture the first order changes in urban class and fraction across the 1981–2018 time period. For each 1 km grid cell, we counted the number of 30 m NLCD grid cells for low, medium, and high intensity urban classes and estimated the urban fraction as the number of grid cells divided by the total possible number of grid cells (roughly 1100 30 m grid cells within each 1 km grid cell). This process was performed by converting the 2001, 2006, 2011, and 2016 NLCD 30 m data to WRF WPS binary format using QGIS4WRF^25,26 open-source QGIS plug-in. This program converts the 30 m NLCD raster data to WRF WPS binary format. WRF WPS geogrid.exe then aggregates the 30 m data to the user selected grid, here our 1 km grid described in Table 2. Those geogrid output netCDF files were used to create geogrid netCDF files for all other years. For years between 2001, 2006, 2011, and 2016, yearly urban fractions were estimated using simple linear temporal interpolation between NLCD datasets. For years outside of NLCD datasets (1979–2000 and 2017–2018) we estimated a constant annual change in urban fraction as the slope of the line between the two temporally nearest NLCD based urban fraction estimates, and those trends were extrapolated forward or backward in time, e.g, 1981 trend estimates were based on the trend between the 2001 and 2006 NLCD datasets. The urban fraction (range of 0 to 1) for a given year outside of 2001–2016 for an urban class at each grid point was then calculated as:

$${U}_{F,Y}=min\left(max\left(\left[{U}_{F,N}+\left({Y-Y}_{N}\right)\ast T\right],0\right),1\right)$$

(1)

where U_F,Y is the estimated urban fraction for a single urban class for year Y, U_F,N is the temporally nearest NLCD estimate, Y_N is the year of the NLCD estimate, Y is the year being estimated (e.g., 1990), and T is the trend for that urban class between the nearest two NLCD estimates. For example, if the urban fraction for low intensity urban class for a grid cell was 0.25 in 2001 and 0.3 in 2006, then the annual change in urban fraction for the low intensity urban class was 0.01 per year between 2001 and 2006 for that grid cell. Thus, the estimate for 1991 is then $0.25+0.01\ast (\,-\,10)=0.15$.

Table 2 Grid specifications.

Full size table

We did not use the NLCD 1992 dataset as it has substantially different characteristics and the resultant urban class trend estimates between 1992 and 2001 were unrealistic. Because of the substantial extrapolation, particularly in the 1980’s, the urban class and fraction in that decade are likely more uncertain than other years, yet the spatial patterns and urban fraction amounts seem realistic (see Technical Validation section). The urban class for each year was set to the urban class with the highest fractional coverage (the mode of the urban classes) while the Urban ‘Developed, open space’ class was set to grassland. 1980 and 2018 urban fractions as well as the 2018–1980 difference for the Houston, Galveston, and Beaumont-Port Arthur metropolitan areas is shown in Fig. 4. This highlights the reasonable representation of extensive urban expansion with little to no change in urban fraction within pre-existing highly urbanized areas.

u-HRLDAS Simulations

The u-HRLDAS is a software system that contains column land-surface and urban canopy models with all the necessary ancillary routines to read in appropriate static and time varying surface datasets such as soil texture data, land use and land cover, leaf area index, and green fraction. Additionally, u-HRLDAS incorporates meteorological observations to drive the land-surface model (LSM). These include precipitation, air temperature, short and longwave incoming radiation, surface pressure, humidity, and wind²⁰. Meteorological forcing data (NLDAS-2 here) may come from any user defined source and needs to be preprocessed onto the target u-HRLDAS grid. u-HRLDAS has a primary compatibility with the Weather Research and Forecasting (WRF) modeling system, thus many of the preprocessing routines and surface related datasets can be directly used from WRF as discussed above.

Noah-MP

The Noah-Multiparameterization options (Noah-MP) LSM¹⁷ is the current LSM within u-HRLDAS and is coupled to the urban models available within u-HRLDAS²⁷. Noah-MP is a state-of-the-science LSM that enables modeling the energy and mass balances of the land-atmosphere interface. Process representations within Noah-MP include separate vegetation canopy and bare ground components, a multi-layer snow model, and multi-layer soils with several options for surface and subsurface processes (e.g., surface water infiltration, subsurface drainage, frozen soils). The model has been shown to perform well in reproducing the energy and mass fluxes of the land-atmosphere interface as well as near-surface sensible weather (e.g. 2 m temperature and specific humidity) for a variety of applications^{28,29,30,31,32}.

Single layer urban canopy model

We use the Single Layer Urban Canopy Model (SLUCM¹⁸,) within u-HRLDAS. The SLUCM has been shown to replicate the temporal thermal dynamics of the urban canopy in previous studies^18,19 while being simpler than multilayer urban canopy models, through using only a single layer of simplified urban geometry to represent the urban canopy. Energy and moisture fluxes are computed within the SLUCM, as well as time varying building shadows using solar azimuth, multilayer heat transfer in buildings and roads, and reflection of both short and longwave radiation in the urban canopy^18,33. SLUCM parameters are specified through an input table that is held constant across the CONUS domain. Spatially constant urban parameters do not represent the true spatial variability in urban characteristics (e.g., Phoenix is developed differently than New York), but provides a baseline for future improvements. A temperature bias correction step (Methods Section c) should reintroduce at least some spatial variability across urban areas that could be expected from variable SLUCM parameters.

Simulation details

u-HRLDAS was run using an hourly timestep from 00 UTC 1 January 1979 to 23 UTC 31 December 2018 over CONUS at 1 km horizontal grid spacing. The first two years (1979 and 1980) were used as model spin-up and not included in the final dataset. Spin-up is required because NLDAS includes modeled land states (e.g., soil temperature) from a slightly different LSM, thus they are initially inconsistent with Noah-MP and require spin-up³⁴. The default SLUCM urban and Noah-MP soil property and vegetation parameters were used from the u-HRLDAS parameter tables provided in the u-HRLDAS code. The initial model grid is 4608 by 3840 km on a Mercator projection and shown in Fig. 2. Grid specifications are given in Table 2 and are available in the metadata of the release dataset. The base model grid is substantially larger than CONUS as this specific grid is used across multiple projects. An example of the u-HRLDAS Tmax for 4 July 2012 is shown in Fig. 5.

Temperature bias correction

After the u-HRLDAS simulation is complete, we perform a bias correction to maximum and minimum daily temperatures independently using all available station data that passes automated quality control checks in the Global Historical Climatology Network daily (GHCNd)^35,36 and the Meteorological Assimilation Data Ingest System (MADIS, https://madis.noaa.gov). A general workflow schematic can be found in Fig. 6. GHCNd data are provided as daily maximum (Tmax) and minimum (Tmin) temperature values. HUMID is based on UTC day, while GHCNd stations across CONUS typically report using a schedule using local time. If a GHCNd station reports in the local morning (e.g. 7 am, or roughly 12 UTC over CONUS), we assume the reported Tmax is for the previous day and the reported Tmin is for the current day following the literature⁵. This can result in some mismatch of estimated Tmax and Tmin timing due to incorrect station reporting times, or atypical diurnal temperature cycles, and is a source of uncertainty when using GHCHd observations. We use the surface mesonet station data within MADIS, which are automated surface weather stations that report at hourly or more frequent intervals, and compute daily Tmax and Tmin directly from those values. The number of stations available across 1981–2018 varies widely, with a notable increase starting in 2001 when the MADIS mesonetwork data became available (Fig. 7). The variations in available surface station data will impart temporal variability on the characteristics of the bias correction and the released dataset (see Technical Validation Section). Note that MADIS data require an account to obtain data, with more information being found at MADIS website (https://madis.noaa.gov).

For each month, we catalogue all stations that have at least one day of valid Tmax or Tmin observations and first develop a station-grid correspondence data structure that includes the grid point index (i,j), grid and station elevations, and land cover at the station grid cell. Then for each day and grid point with a station location, we find up to 20 valid stations within 500 km of that grid point and estimate the temperature bias for the corresponding general land cover type (non-urban or urban). Valid station observations are adjusted to the grid box elevation using the mean lapse rate of all station-grid combinations and the specific station-grid point elevation difference. Use of the mean lapse rate reduces noise in the estimated lapse rate due to measurement uncertainty. If that estimated lapse rate is less than −12 K km⁻¹, it is set to −12 K km⁻¹ which is a slightly superadiabatic lapse rate. If there is for some reason a missing or infinite lapse rate, a default of −6 K km⁻¹ is used. There is no check for maximum positive lapse rates (areas of temperature inversions).

The temperature bias is then estimated by using a 3 × 3 grid box (9 km²) average of the HRLDAS output and the distance weighted temperature average of all valid stations with the same land cover classification (non-urban or urban) as the target grid point. The distance weighting function is given as

$${w}_{i}={\left[1-{\left(\frac{{d}_{i}}{{MAXD}}\right)}^{3}\right]}^{3}$$

(2)

where w_i is the weight of the current station i, d_i is the distance to that station, and MAXD is a constant set to 501 km for this study, resulting in urban or non-urban HRLDAS bias estimates at all station locations. The MAXD value was set in an ad-hoc manner to enable inclusion of at least one other station for the same land cover type for the bias estimate. Then, the urban and non-urban bias estimates are interpolated from the station locations to all grid points using the scatteredInterpolant function in Matlab (Matlab 9.8.0.1380330 [R2020a] Update 2). Urban and non-urban bias estimates are weighted by urban fraction for each grid cell and added to Tmax and Tmin to produce the final bias corrected temperature estimates for each grid cell and day. We do not attempt to bias correct other fields in the dataset (humidity, radiation) given the sparse nature of available in situ observations for those fields. We also do not correct precipitation as the input NLDAS precipitation is generally considered a ‘ground-truth’ product, and u-HRLDAS does not modify input precipitation.

Data Records

The data are provided in netCDF4 format (https://www.unidata.ucar.edu/software/netcdf/) files using lossless compression available within netCDF4 using lossless compression within the netCDF Operators (NCO) version 4.7.9³⁷. Each file contains all archived u-HRLDAS variables for all grid points for one UTC Day as a three-dimensional array, coordinate variables (latitude, longitude, date), and geophysical attributes. Leap years include 29 February. File names are specified using a: conus_HUMID_YYYYMMDD.nc4 naming convention where YYYY is the 4-digit year, MM is the 2-digit month, and DD is the two-digit day. Supplement 1 gives the full variable list with corresponding file and variable metadata. The data are archived at https://doi.org/10.5065/JF2T-6F61, which includes all meteorology data files, variable names and metadata, and the u-HRLDAS namelist. The data are available at the NSF NCAR Research Data Archive at https://rda.ucar.edu/datasets/d314008³⁸.

Technical Validation

To assess the performance of the dataset we have performed a set of comparisons to existing datasets, focusing on the bias corrected 2 m air temperature (Tmax and Tmin) as we have the highest confidence in these two variables given the extensive set of observations used in the bias correction. As noted earlier, air temperature is the most widely measured variable and it is easier to model its spatial variations compared to other variables. Therefore, it is the most readily available and reliable variable to compare across other gridded datasets. Other variables present additional challenges. As an example for near-surface moisture, for the three datasets we use for validation (Table 3), 1 (PRISM) has interpolated daily mean dew point temperature (Td), Daymet has only daily average partial pressure of water vapor, and the Zhang et al. (2022) dataset³⁹ is only for air temperature. Furthermore, the other variables in our dataset come directly from NLDAS (e.g. downward radiation, precipitation, wind), are more difficult to verify (e.g. latent and sensible heat fluxes) or are entirely modeled quantities (e.g. urban canopy temperature). Thus, exploration of variables outside of air temperature and dew point temperature and their potential characteristics is left for future work.

Table 3 Characteristics of the reference datasets used in our temperature validation.

Full size table

We performed comparisons against three gridded observation-based temperature datasets, Daymet version 3⁴⁰, the freely available 4 km grid spacing version of the daily best-estimate PRISM product (AN81d)^4,41, and the dataset of Zhang et al.³⁹. We also compare PRISM Td to the HUMID dataset. Daymet and PRISM use in situ station observations and geophysical attribute (e.g. elevation) informed interpolation to produce estimates of daily Tmax and Tmin. Zhang et al.³⁹ uses in situ observations within an interpolation model informed by elevation and satellite estimated land surface temperature. We made full grid spatiotemporal comparisons between our dataset, Daymet, and PRISM, by regridding our dataset and Daymet to the PRISM grid for the comparisons. The same regirding procedure was used for Td. Urban-rural zone comparisons were made using the Zhang et al.³⁹ dataset. Table 3 summarizes basic characteristics of the reference datasets.

Air Temperature comparisons

First, we discuss CONUS-wide comparisons between HUMID, Daymet, and PRISM. The long-term (1981–2017) mean daily Tmax and Tmin for the three datasets and their differences are displayed in Figs. 8–10. All three datasets represent the predominant shaping features of climate across CONUS such as elevation, latitude, and proximity to large bodies of water for Tmax and Tmin. Further, large urban heat islands are visible in HUMID and these urban areas are more visible when examining the differences of HUMID-Daymet and HUMID-PRISM (Figs. 8, 9). Overall, HUMID is more similar to Daymet than PRISM and Daymet and PRISM are more similar to each other than to HUMID, which may be expected given the larger methodological differences between HUMID to Daymet and PRISM (Figs. 8d–f, 9d–f).

Figure 10 provides difference histograms and includes the difference from the uncorrected HUMID Tmax and Tmin values, to highlight the impact of our temperature bias correction to correct the known high temperature biases in u-HRLDAS using the SLUCM^19,42. The large positive differences between HUMID to Daymet and PRISM are significantly decreased after bias correction for Tmax and more notably for Tmin. Furthermore, there is a much larger percentage of grid cells with long-term average differences within ± 2.5 °C after bias correction (Fig. 10a–d). The bias corrected HUMID temperature difference distribution is more similar to the PRISM-Daymet difference distribution than the raw HUMID temperature differences.

Two other noteworthy features are visible from Figs. 8, 9 and Figures S1–S8 that may require further investigation of our type of methodological application for large-scale temperature modeling. First, there is enhanced nighttime agreement across the datasets in urban locations, as seen through the lack of notable positive differences between HUMID and the other two datasets in Fig. 9d-e as compared to Fig. 8d-e. This could be from several reasons that should have further inquiry. It could be a product of our modelling system, where the nighttime differences after bias correction are somehow more similar to the current products, or some emergent property of the reporting networks and the interpolation schemes. Likely, both of these options relate to the spatial variability in Tmin. Summer Tmin urban heat island differences are somewhat apparent in Figure S7, while not obvious in DJF. This could suggest the spatial variability in Tmin and Tmax have different inherent observability in the Daymet and PRISM observation networks. Second, there are significant differences between HUMID and both Daymet and PRISM for Tmin across the Intermountain Western US. This likely stems from the use of the simple downscaling of 12 km NLDAS meteorology, which is then input to the Noah-MP LSM. This methodology does not resolve nighttime inversions in the same manner as a direct interpolation with elevation corrections would, even with our bias correction as our bias correction does not have additional elevation adjustments outside of those described in the Temperature Bias Correction section and is thus more limited to the observation network density.

Seasonal summer June-August (JJA) and winter December-February (DJF) differences are shown in the supplemental material. Figures S1–S8 step through Tmax and Tmin figures similar to those in Figs. 8–10. The seasonal analysis highlights that the largest positive differences between the raw HUMID and the other gridded datasets happens in summer, when the urban model simulates the highest temperatures. Otherwise, the seasonal characteristics of HUMID are similar to the annual discussed here.

Finally, the comparisons with Zhang et al. (2022), hereafter Z-2022, were made for urban clusters and their corresponding rural or background reference areas, which are commonly used for urban heat island calculations⁴³, for 2009–2018 averages. This was done because the Z-2022 methodology allows for at least partial representation of urban heat islands and provides another reference comparison to the HUMID urban heat island characterizations. We defined these zones using our 2009 1 km urban land fraction dataset to identify conterminous areas of grid cells greater than 30% urban. Rural areas were defined using a buffer around urban zones with a variable width per urban zone, such that the urban and rural areas are equal for each urban-rural pair. From there, only urban clusters (and their rural reference) with areas larger than 10 km² were included (10 grid points in each dataset). Each urban-rural pair was then internally masked to include only urban pixels with urban fractions greater than 0.5, and only rural pixels with urban fractions less than 0.2 were included to compute the average urban and rural temperatures for each cluster from all datasets. This results in 1752 urban-rural pairs for comparison across CONUS. For completeness, we also perform the same urban-rural scale comparison using Daymet.

HUMID has very similar urban characteristics to Z-2022 for both Tmax and Tmin as shown in Fig. 11. HUMID shows a strong linear correlation with Z-2022, with an r² of 0.98 and 0.99 for urban Tmax and Tmin respectively, and very little conditional difference (regression slopes near 1 for both Tmax and Tmin). Further, root mean square error (RMSE) and the mean bias error (MBE) are less than 2 °C for both urban Tmax and Tmin. HUMID compares similarly to Daymet for correlation, slope and RMSE, but with slightly higher urban temperatures (higher positive bias) than Z-2022, which is expected given the input data in Daymet not properly resolving urban areas. The rural reference comparisons (Fig. 12) show higher agreement between HUMID and both Daymet and Z-2022 than those for urban clusters (Fig. 11 versus 12). For both figures, the mean percentage error (MPE) using degree C is also shown to provide a relative measure of bias that can be compared across cases.

Finally, Chakraborty et al.⁴² has already used the temperature and humidity fields from the last 5 years (2014–2018) of this dataset to characterize heat stress differences across and within urban areas. They performed verification of the modeled urban skin temperature using MODIS and found good correlation between the two datasets (their Figure S9), even with all the caveats of comparing MODIS skin temperatures to a modeled value. Their analysis highlights that HUMID has expected covariance of temperature and humidity across and within urban areas.

Dew point temperature comparisons

We also more rigorously compare water vapor in HUMID than in⁴² by comparing to PRISM. We convert HUMD specific humidity to Td on the PRISM grid. Figure 13 highlights the 1981–2017 daily average Td from the two datasets and the differences. HUMID has a systematically higher Td than PRISM over most of the domain. Larger differences are present in the eastern US as well as across the intermountain west. Figure 14 displays the HUMID – PRISM difference histogram where the systematic difference of 1–2 K is clear. This difference could be from a variety of reasons related to the input data and methodologies. PRISM directly interpolates sparse Td observations to their grid using the PRISM methodology⁴⁴. HUMID uses NLDAS 12 km moisture and surface pressure to force the land-surface model. The coarser grid could explain some of the spatial variability differences in the intermountain west. The overall positive offset in HUMID could stem from the input NLDAS data, or the diagnosis of 2 m specific humidity in the land model results in higher near surface water vapor than PRISM estimates. Seasonal Td differences (Figures S9–S12) are very similar to the annual differences shown here.

Summary

To summarize, comparisons of HUMID to other state-of-the-science observation-based gridded meteorology datasets shows that HUMID has physically and observationally consistent patterns of temperature and humidity. The differences in air temperature across datasets are similar and in the range of gridded dataset uncertainties (e.g.⁹) with expected differences in HUMID for urban areas given the design of HUMID. Dew point temperature appears to have a systematic difference from PRISM of 1–2 K, which could be from a variety of reasons, which will require further investigation and potential correction. An initial application of HUMID examining urban heat disparities highlights the potential usefulness of this dataset⁴². Note that the underlying NLDAS data was not developed with observed trend preservation in mind. Our use of observations in the bias correction have also not been used in a manner that explicitly preserves trends. Thus, using this dataset for trend evaluation should be done carefully with initial data exploration to evaluate validity for the user’s application.

It is expected that HUMID will be updated in the future. In those future updates we plan to examine several possible avenues for improvement or sensitivity and uncertainty analysis. This includes better accounting for observed trends. Further, the use of alternative land cover datasets, such as the local climate zones (LCZs)⁴⁵, or the historical settlement data across the US⁴⁶ open sensitivity testing for land cover uncertainty or further historical modeling. Additionally, we may explore alternative modelling systems, such as the other urban model options within u-HRLDAS, or even alternative modelling systems such as UrbClim⁴⁷, which approaches the problem from a different perspective than u-HRLDAS.

Usage Notes

The data can be read using any software that is able to link to netcdf4 libraries. This includes Python, R, Matlab, Fortran, C/C++, and others. See https://www.unidata.ucar.edu/software/netcdf/ for more information regarding netCDF. For quick file information, the netCDF utility, ncdump (NetCDF users Guide v1.1) generates text for the netCDF file. The netCDF Operators (NCO) provide many stand-alone command line utilities for netCDF file manipulation^48,49, and finally the ncview utility provides quick look visualization of netCDF files (https://cirrus.ucsd.edu/ncview/). Many examples of netCDF analysis are available via your web search engine of choice.

Code availability

The specific version of HRLDAS used, including our source code modifications, as well as the corresponding analysis and post-processing scripts, including for the figures in this paper are available at https://doi.org/10.5281/zenodo.10825221. Figures were generated using Matlab 9.8.0.1380330 [R2020a] Update 2, NCL version 6.6.2, and Python version 3.

References

Harris, I., Osborn, T. J., Jones, P. & Lister, D. Version 4 of the CRU TS monthly high-resolution gridded multivariate climate dataset. Sci Data 7, 109, https://doi.org/10.1038/s41597-020-0453-3 (2020).
Article PubMed PubMed Central Google Scholar
Matsuura, K., & Willmott, C. University of Delaware Terrestrial Precipitation. NOAA ESRL, https://www.esrl.noaa.gov/psd/data/gridded/data.UDel_AirT_Precip.html (2018).
Daly, C., Neilson, R. P. & Phillips, D. L. A statistical-topographical model for mapping climatological precipitation over mountainous Terrain. J. Appl. Meteor. 33, 140–158 (1994).
Article ADS Google Scholar
Daly, C. et al. Physiographically sensitive mapping of climatological temperature and precipitation across the conterminous United States. Intl. J. Climatol. 28, 2031–2064, https://doi.org/10.1002/joc.1688 (2008).
Article Google Scholar
Maurer, E. P., Wood, A. W., Adam, J. C., Lettenmaier, D. P. & Nijssen, B. A long-term hydrologically based dataset of land surface fluxes and states for the conterminous United States. J. Climate 15, 3237–3251, https://doi.org/10.1175/1520-0442(2002)015 (2002).
Article ADS Google Scholar
Xia, Y. et al. Continental-scale water and energy flux analysis and validation for the North American Land Data Assimilation System project phase 2 (NLDAS-2): 1. Intercomparison and application of model products. J. Geophys. Res. 117, D03109, https://doi.org/10.1029/2011JD016048 (2012).
Article ADS Google Scholar
Cornes, R., van der Schrier, G., van den Besselaar, E. J. M. & Jones, P. D. An ensemble version of the E-OBS temperature and precipitation data sets. J. Geophys. Res: Atmos. 123, 9391–9409 (2018).
Article ADS Google Scholar
Manware, M., Dubrow, R., Carrión, D., Ma, Y. & Cheng, K. Residential and race/ethnicity disparities in heat vulnerability in the United States. GeoHealth 6(12), e2022GH000695 (2022).
Article PubMed PubMed Central Google Scholar
Newman, A. J. et al. Gridded ensemble precipitation and temperature estimates for the conterminous United States. J. Hydrometeor. 16, 2481–2500, https://doi.org/10.1175/JHM-D-15-0026.1 (2015).
Article ADS Google Scholar
Newman, A. J., Clark, M. P., Longman, R. J. & Giambelluca, T. W. Methodological inter-comparisons of station-based gridded meteorological products: utility, limitations, and paths forward. J. Hydrometeor. 20, 531–547, https://doi.org/10.1175/JHM-D-18-0114.1 (2019).
Article ADS Google Scholar
Voelkel, J. & Shandas, V. Towards systematic prediction of urban heat islands: grounding measurements, assessing modelling techniques. Climate 5(2), 41 (2017).
Article Google Scholar
Li, X., Zhou, Y., Asrar, G. R. & Zhu, Z. Developing a 1 km resolution daily air temperature dataset for urban and surrounding areas in the conterminous United States. Remote Sensing of the Environment 215, 74–84 (2018).
Article ADS Google Scholar
Chakraborty, T., Venter, Z. S., Qian, Y. & Lee, X. Lower urban humidity moderates outdoor heat stress. AGU Advances 3(5), e2022AV000729 (2022).
Article ADS Google Scholar
Livneh, B. et al. A spatially comprehensive, hydrometeorological data set for Mexico, the US, and Southern Canada 1950-2013. Scientific Data 2(1), 1–12 (2015).
Article Google Scholar
Thornton, M.M., et al Daymet: Daily Surface Weather Data on a 1-km Grid for North America, Version 4 R1. ORNL DAAC, Oak Ridge, Tennessee, USA. https://doi.org/10.3334/ORNLDAAC/2129 (2022).
National Weather Service (NWS). Analysis of record for calibration: version 1.1 - Sources, methods, and verification. Office of Water Prediction, Silver Spring, MD, https://hydrology.nws.noaa.gov/aorc-historic/Documents/AORC-Version1.1-SourcesMethodsandVerifications.pdf (2021).
Niu, G. Y. et al. The community Noah land surface model with multiparameterization options (Noah-MP): 1. Model description and evaluation with local-scale measurements. J. Geophys. Res. Atmos, 116(D12), https://doi.org/10.1029/2010JD015139 (2011).
Kusaka, H., Kondo, H., Kikegawa, Y. & Kimura, F. A simple single-layer urban canopy model for atmospheric models: Comparison with multi-layer and slab models. Boundary Layer Meteorol. 101, 329–358 (2001).
Article ADS Google Scholar
Monaghan, A. J., Hu, L., Brunsell, N. A., Barlage, M. & Wilhelmi, O. V. Evaluating the impact of urban morphology configurations on the accuracy of urban canopy model temperature simulations with MODIS. J. Geophys. Res. Atmos. 119, 6376–6392, https://doi.org/10.1002/2013JD021227 (2014).
Article ADS Google Scholar
Chen, F. et al. Description and evaluation of the characteristics of the NCAR high-resolution land data assimilation system. J. Appl. Meteor. 46, 694–713 (2007).
Article Google Scholar
Oleson, K. W., Bonan, G. B., Feddema, J., Vertenstein, M. & Grimmond, C. S. B. An urban parameterization for a global climate model. 1. Formulation and evaluation for two cities. J. Appl. Meteor. Clim. 47, 1038–1060 (2008).
Article ADS Google Scholar
Wang, W. et al. Weather Research & Forecasting ARW version 3 modeling system user’s guide. NCAR https://www2.mmm.ucar.edu/wrf/users/docs/user_guide_V3/user_guide_V3.6/ARWUsersGuideV3.6.1.pdf (2015).
Xia, Y. et al. NLDAS primary forcing data L4 hourly 0.125 × 0.125 degree V002, NCEP/EMC, Edited by David Mocko, NASA/GSFC/HSL, Greenbelt Maryland, USA, Goddard Earth Sciences Data and Information Services Center (GES DISC), Accessed: 07 April 2020, https://doi.org/10.5067/6J5LHHOHZHN4 (2009).
Yang, L. et al. A new generation of the United States National Land Cover Database: Requirements, research priorities, design, and implementation strategies. ISPRS journal of photogrammetry and remote sensing 146, 108–123 (2018).
Article ADS Google Scholar
Meyer, D. & Riechert, M. Open source QGIS toolkit for the Advanced Research WRF modelling system. Environmental Modelling & Software 112, 166–178, https://doi.org/10.1016/j.envsoft.2018.10.018 (2019).
Article CAS Google Scholar
Meyer, D. & Riechert, M. The GIS4WRF Plugin. Zenodo. https://doi.org/10.5281/zenodo.1288569 (2018).
Salamanca, F. et al. Evaluation of the WRF-urban modelling system coupled to Noah and Noah-MP land surface models over a semiarid urban environment. J. Geophys. Res. Atmos. 123(5), 2387–2404 (2018).
Article ADS Google Scholar
Yang, Z.-L. et al. The community Noah land surface model with multiparameterization options (Noah-MP): 2. Evaluation over global river basins. J. Geophys. Res. Atmos. 116(D12). (2011).
Cai, X., Yang, Z.-L., David, C. H., Niu, G. Y. & Rodell, M. Hydrological evaluation of the Noah-MP land surface model for the Mississippi River basin. J. Geophys. Res. Atmos. 119(1), 23–38 (2014).
Article ADS Google Scholar
Barlage, M. et al. The effect of groundwater interaction in North America regional climate simulations with WRF/Noah-MP. Climatic Change 129, 485–498 (2015).
Article ADS Google Scholar
Ma, N. et al. A systematic evaluation of Noah-MP in simulating land-atmosphere energy, water, and carbon exchanges over the continental United States. J. Geophys. Res. Atmos. 122(22), 12–245 (2017).
Article Google Scholar
Liu, C. et al. Continental-scale convection-permitting modelling of the current and future climate of North America. Climate Dynamics 49, 71–95 (2017).
Article ADS Google Scholar
Kusaka, H. & Kimura, F. Coupling a single-layer urban canopy model with a simple atmospheric model: Impact on urban heat island simulation for an idealized case. J. Meteorol. Soc. Jpn. 82, 67–80 (2004).
Article ADS Google Scholar
Schlosser, C. A. et al. Simulations of a boreal grassland hydrology at Valdai, Russia: PILPS phase 2(d). Mon. Wea. Rev. 128, 301–321 (2000).
Article ADS Google Scholar
Menne, M. J., Durre, I., Vose, R. S., Gleason, B. E. & Houston, T. G. An overview of the global historical climatology network-daily database. J. Atmos. Oceanic Technol. 29, 897–910 (2012).
Article ADS Google Scholar
Menne, M. J. et al. Global historical climatology network – daily (GHCN-Daily), Version 3. Version 3.27. NOAA National Climatic Data Center. https://doi.org/10.7289/V5D21VHZ accessed on 22 August (2019).
Zender, C. S. netCDF Operator (NCO) User Guide, http://nco.sf.net/nco.pdf (2022).
Newman, A. et al. The High-resolution Urban Meteorology for Impacts Dataset - HUMID. Research Data Archive at the National Center for Atmospheric Research. Computational and Information Systems Laboratory. https://doi.org/10.5065/JF2T-6F61 (2024).
Article Google Scholar
Zhang, T. et al. A global dataset of daily maximum and minimum near-surface air temperature at 1 km resolution over land (2003-2020). Earth Syst. Sci. Data 14, 5637–5649, https://doi.org/10.5194/essd-14-5637-2022 (2022).
Article ADS Google Scholar
Zhang, T. & Zhou, Y. A global 1 km resolution daily near-surface air temperature dataset (2003–2020). Iowa State University. Collection. https://doi.org/10.25380/iastate.c.6005185.v1 (2022).
Thornton, P. E. et al. Daymet: daily surface weather data on a 1-km grid for North America, version 3. Time period: 1980-01-01 to 2017-12-31, Spatial Range: Full North America, ORNL DAAC, https://doi.org/10.3334/ORNLDAAC/1328 accessed 23 August (2018).
Chakraborty, T., Newman, A. J., Qian, Y., Hsu, A. & Sheriff, G. Residential segregation and outdoor urban moist heat stress disparities in the United States. One Earth 6(6), 738–750 (2023).
Article ADS Google Scholar
Yang, Q. et al. An adaptive synchronous extraction (ASE) method for estimating intensity and footprint of surface urban heat islands: A case study of 254 North American cities. Remote Sensing of Environment 297, 113777 (2023).
Article Google Scholar
PRISM group, Descriptions of PRISM spatial climate datasets for the Conterminous United States. PRISM Climate Group at Oregon State University, https://prism.oregonstate.edu/documents/PRISM_datasets.pdf (2022).
Demuzere, M. et al. Combining expert and crowd-sourced training data to map urban form and functions for the continental US. Sci Data 7, 264, https://doi.org/10.1038/s41597-020-00605-z (2020).
Article PubMed PubMed Central Google Scholar
Leyk, S. & Uhl, J. HISDAC-US, historical settlement data compilation for the conterminous United States over 200 years. Sci Data 5, 180175, https://doi.org/10.1038/sdata.2018.175 (2018).
Article PubMed PubMed Central Google Scholar
De Ridder, K., Lauwaet, D. & Maiheu, B. UrbClim–A fast urban boundary layer climate model. Urban Climate 12, 21–48 (2015).
Article Google Scholar
Zender, C. S. Analysis of self-desribing gridded geoscience data with netCDF Operators (NCO). Environ. Modell. Softw. 23(10), 1338–1342, https://doi.org/10.1016/j.envsoft.2008.03.004 (2008).
Article Google Scholar
Zender, C. S. netCDF Operator (NCO) User Guide, http://nco.sf.net/nco.pdf (2023).
Brand, A., Allen, L., Altman, M., Hlava, M. & Scott, J. Beyond authorship: attribution, contribution, collaboration and credit. Learned Publishing 28(2), 151–155 (2015).
Article Google Scholar
Miller, D. A. & White, R. A. A conterminous United States multilayer soil characteristics dataset for regional climate and hydrology modelling. Earth Interact., 2(2), 25.
Thornton, P. E., Running, S. W. & White, M. A. Generating surfaces of daily meteorological variables over large regions of complex terrain. J. Hydrology 190, 214–251, https://doi.org/10.1016/S0022-1694(96)03128-9 (1997).
Article ADS Google Scholar
PRISM Climate Group, Oregon State University, https://prism.oregonstate.edu, data created 13 June 2019, last accessed 17 March 2022 (2022).

Download references

Acknowledgements

This project was supported by the National Institute of Environmental Health Sciences (NIEHS) grant number R01ES028346. This material is also based upon work supported by the NSF National Center for Atmospheric Research, which is a major facility sponsored by the U.S. National Science Foundation under Cooperative Agreement No. 1852977. TC’s contribution was supported by the Coastal Observations, Mechanisms, and Predictions Across Systems and Scales-Great Lakes Modeling (COMPASS-GLM), Integrated Coastal Modeling (ICoM), and a U.S. Department of Energy (DOE) Office of Science Early Career project. Pacific Northwest National Laboratory is operated for DOE by Battelle Memorial Institute under contract DE-AC05-76RL01830. The authors would like to thank Yongxing Zhang of NSF NCAR for his help with processing NLDAS input data and Michael Barlage of NOAA for his help with the HRLDAS model. Finally, input from two anonymous reviewers and the Scientific Data editorial team have greatly improved this manuscript.

Author information

Authors and Affiliations

NSF National Center for Atmospheric Research, Boulder, CO, USA
Andrew J. Newman & Christina Kalb
Yale University, School of the Environment, New Haven, CT, USA
TC Chakraborty
Atmospheric, Climate, & Earth Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
TC Chakraborty
University of Nevada, Reno, School of Public Health, Reno, NV, USA
Amy Fitch, Lyndsey A. Darrow & Matthew J. Strickland
Yale School of Public Health, Department of Biostatistics, New Haven, CT, USA
Joshua L. Warren
University of Utah, Department of Chemical Engineering, Salt Lake City, UT, USA
Heather A. Holmes
University of Colorado, Boulder, CO, USA
Andrew J. Monaghan
Emory University, Rollins School of Public Health, Atlanta, GA, USA
Howard H. Chang

Authors

Andrew J. Newman
View author publications
You can also search for this author inPubMed Google Scholar
Christina Kalb
View author publications
You can also search for this author inPubMed Google Scholar
TC Chakraborty
View author publications
You can also search for this author inPubMed Google Scholar
Amy Fitch
View author publications
You can also search for this author inPubMed Google Scholar
Lyndsey A. Darrow
View author publications
You can also search for this author inPubMed Google Scholar
Joshua L. Warren
View author publications
You can also search for this author inPubMed Google Scholar
Matthew J. Strickland
View author publications
You can also search for this author inPubMed Google Scholar
Heather A. Holmes
View author publications
You can also search for this author inPubMed Google Scholar
Andrew J. Monaghan
View author publications
You can also search for this author inPubMed Google Scholar
Howard H. Chang
View author publications
You can also search for this author inPubMed Google Scholar

Contributions

We follow the CRediT contributor role taxonomy from⁵⁰ to define our author contributions. A.J.N.: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Resources, Data Curation, Writing - Original Draft, Writing - Review & Editing, Visualization, Supervision, Project administration, Funding acquisition. C.K.: Validation, Formal analysis, Data Curation, Writing - Review & Editing, Visualization. T.C.: Validation, Formal analysis, Writing - Review & Editing, Visualization. A.F.: Validation, Writing - Review & Editing. L.A.D.: Conceptualization, Writing - Review & Editing, Supervision, Project administration, Funding acquisition. J.L.W.: Conceptualization, Writing - Review & Editing, Funding acquisition. M.J.S.: Conceptualization, Writing - Review & Editing, Funding acquisition. H.H.C.: Conceptualization, Writing - Review & Editing, Supervision, Project administration, Funding acquisition. H.A.H.: Writing - Review & Editing, Funding acquisition. A.J.M.: Conceptualization, Methodology, Software, Writing - Review & Editing, Funding acquisition.

Corresponding author

Correspondence to Andrew J. Newman.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

41597_2024_4086_MOESM1_ESM.docx

Supplement to The High-resolution Urban Meteorology for Impacts Dataset (HUMID) daily for the Conterminous United States

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Newman, A.J., Kalb, C., Chakraborty, T. et al. The High-resolution Urban Meteorology for Impacts Dataset (HUMID) daily for the Conterminous United States. Sci Data 11, 1321 (2024). https://doi.org/10.1038/s41597-024-04086-2

Download citation

Received: 18 March 2024
Accepted: 05 November 2024
Published: 04 December 2024
DOI: https://doi.org/10.1038/s41597-024-04086-2

Subjects

Abstract

Similar content being viewed by others

Global 1-km present and future hourly anthropogenic heat flux

High-resolution grids of daily air temperature for Peru - the new PISCOt v1.2 dataset

Reconstruction of all-sky daily air temperature datasets with high accuracy in China from 2003 to 2022

Background & Summary

Methods

u-HRLDAS input datasets

Static geophysical attributes

Meteorological forcing data

Urban class and fraction

u-HRLDAS Simulations

Noah-MP

Single layer urban canopy model

Simulation details

Temperature bias correction

Data Records

Technical Validation

Air Temperature comparisons

Dew point temperature comparisons

Summary

Usage Notes

Code availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Supplementary information

41597_2024_4086_MOESM1_ESM.docx

Rights and permissions

About this article

Cite this article

Share this article

Search

Quick links