Introduction

Ultraviolet (UV) radiation, an ever-present environmental hazard, poses a substantial threat to genomic stability and cellular homeostasis. Upon UV exposure, a cascade of events unfolds, including the formation of the most prevalent DNA lesions like cyclobutane-pyrimidine dimers (CPD) and pyrimidine-pyrimidone adducts (6-4PP), capable of disrupting crucial DNA processes1,2. Concurrently, UV radiation activates the intricate DNA damage response (DDR) network, spanning cell cycle arrest, inflammation, and apoptosis3,4. If not well-orchestrated, it results in genetic mutations, and genome instability, elevating the risk of malignancies and diseases5,6. Nucleotide excision repair (NER) stands as the primary mechanism for rectifying UV-induced DNA damage7,8, and, the development of genome-wide techniques has provided invaluable tools for studying the repair events of UV-induced lesions and assessing the susceptibility of the genome to such bulky adducts9,10.

Understanding various architectural features of genome structure, including chromosome territories, compartments, and topologically associating domains (TADs), and their temporal dynamics, along with the influence of DNA damage repair mechanisms, is vital to deciphering the DDR machinery. Previous research in the field of radiation-induced DNA damage has primarily focused on the impacts of ionizing radiation on genome structure, particularly X-rays, which delivers a higher energy dose to the chromatin fibers compared to UV. Studies on genome folding have shown that ionizing radiation exposure can cause changes in the spatial organization of the genome upon the formation of chromosomal translocations and double-strand breaks (DSB). Because of this, most of the experiments evaluated how DSBs cause inflammation and how they are repaired in a 3D genome context11,12,13,14,15. It has been reported that increased TAD strength and boundary insulation occur after X-ray exposure across cell types, while TAD boundary strengthening is absent in ATM-deficient fibroblasts.16. This suggests that increased interaction density is linked to ATM kinase’s role in modulating the Cohesin complex and facilitating CTCF protein recruitment, consistent with the γ-H2AX foci localized within TADs17,18. Efficient repair of DSBs has also been reported to occur through chromatin compartmentalization, which mediates the clustering of damaged foci. As part of the early response, DSB containing TADs form clusters into damaged foci via liquid-liquid phase separation, while in the later stages, polymer–polymer phase separation further organizes these clusters into larger compartments19. While these studies have provided crucial insights into the structural changes within the genome following translocation-triggering DSBs, a notable gap persists in understanding how these insights extend to UV-induced DNA damage. Unlike DSBs after ionizing radiation, UV-induced damage primarily results in the formation of pervasive bulky adducts along the genome that needs NER machinery to remove short single-stranded segment that contains the lesion.

Accordingly, we sought to answer the key questions remaining elusive so far for UV radiation: (I) Is UV exposure alone sufficient to induce 3D changes, and does genome folding lead to a more compacted or decompacted structure under UV stress?; (II) Do active and inactive compartments of the genome adjust with DDR-mediated transcriptional regulation?; (III) Do the TAD strengthening and boundary insulation correlate with NER efficiency?; (IV) Do the formation or loss of chromatin loops contribute to the facilitation of transcriptional regulation and/or efficiency of NER machinery?; (V) Does the alteration in 3D genome topology play a role in mediating transcriptional regulation for immediate early genes involved in the UV response, such as well-known AP-1 members JUN and FOS20,21,22,23?

Our research endeavors to fill the gaps in understanding and complete this intriguing picture by using both high-resolution contact maps and RNA-Seq to find out how UV radiation affects both the transcriptional regulation and the three-dimensional structure of the genome (Fig. 1a). We exposed cells to UV radiation (254 nm, 20 J/m2) and collected samples at various recovery time points (12 min, 30 min, and 60 min) to comprehensively study the interplay between 3D genome reorganization, gene regulation, and DNA damage and repair. Additionally, we utilized Damage-Seq10 data collected right after UV exposure and XR-Seq9 data from 12 min later to get a full picture of the dynamic processes that control UV-induced DNA damage and repair. Using single-nucleotide resolution XR-Seq and Damage-Seq maps, we were able to track the temporal changes in DNA repair mechanisms and study the exact locations and types of UV-caused DNA damage. Furthermore, we follow a deductive analysis with respect to hierarchical organization of the genome at different scales. Employing a multi-omics and four-dimensional (4D) methodology enabled us to gain a better understanding of how UV radiation affects the 3D organization of the genome and how this affects gene regulation, expression, and DNA repair processes, as well as damage formation in terms of physical and biological relevance. In this study, we show an immediate and holistic response of genome maintenance to UV exposure and ongoing structural reorganization.

Fig. 1: The immediate effect of UV in chromatin conformation and ongoing alterations in the first hour.
figure 1

a Experimental datasets used in the study. For both lesion types (6-4PPs, CPDs), XR-Seq captures the previous repair events, while Damage-Seq captures lesions’ locations on the genome. b Decay plot on 10 kb resolution and its slope computed with central differences. Decay parameters (α exponents) of fitted powerlaw distrubutions for each sample. c Decay fold-changes normalized with non-UV control. d Saddle-plot of the non-UV sample (100 kb resolution) shows interaction intensity with respect to each compartment percentiles. e Saddle strength of the samples, where the x-axis shows the increasing extent of the compartment percentiles. y-axis show normalized contact strength of (AA + BB) / (AB + BA). (Methods) f Saddle-plot with normalized contact intensities of post-UV samples, relative to the non-UV sample. g Top 20% compartments normalized interaction intensity comparison. Red connectors point out the means. Data points (n = 100) represents the interaction intensities derived from the related 10 × 10 corner matrices of the corresponding sample’s saddle plots. Box plots displays the median (central line), the first and third quartiles (bounds of the box), and whiskers representing data within 1.5 times the interquartile range.

Results

UV exposure favors short-to-mid range interactions

We performed Hi-C sequencing on HeLa cells before and at 12, 30, and 60 min post-UV exposure (254 nm, 20 J/m²), ensuring ample sequencing depth for subsequent comparative analysis (Figs. 1a, Supplementary Data 1). As calculated (see “Methods”), all Hi-C samples exhibited sufficient coverage for analysis at a 10 kb resolution. Also, we carried out RNA-Sequencing under identical treatment conditions as Hi-C sequencing, as well as our previously generated XR-Seq and Damage-Seq datasets24.

One notable observation from the contact maps is the clear decrease in contact frequency as genomic separation increases, a phenomenon often termed distance-dependent decay25,26. Such distinctions in characteristics have previously been associated with variances in biological processes, such as differentiation in cell types and cell cycle stages27, in which increased interaction frequencies at short range distances are related to loop extrusion activity28. To understand the underlying changes in the chromosome conformation dynamics, we calculated the contact frequencies with respect to genomic separation (Fig. 1b). In the distance decay analysis, the post-UV samples exhibited distinct characteristics compared to the non-UV sample, which showed a probability reflecting a loss of distal interactions (1 Mb ~ 10 Mb). Relatively, a trend emerged for short- to mid-range interactions (< 1 Mb), favoring higher frequencies in UV-exposed samples. We computed the slope by employing central differences within each distribution, revealing that the rate of change varies between long, and short-to-mid range interactions. We also fit power-law distributions to obtain α exponents for each sample to show how quickly the decay occurs, and the non-UV sample shows the largest value, differentiating from the UV-exposed samples, and showing the rapid loss of the tail of the distribution for the non-UV sample. α values indicate how quickly interactions decrease in frequency, with higher α showing a steeper decay and a lower probability for short-to-mid range interactions. The magnitudes of α values exhibit a consistent pattern post-UV, reflecting a progressive shift toward more frequent short-to-mid-range interactions after UV exposure.

To better identify the specific ranges of genomic separation in frequency change occurs, we have calculated the fold changes of the distance-dependent decay for each UV-exposed sample compared to the non-UV control (Fig. 1c). Compared to the non-UV sample, UV radiation promptly favors short-to-mid range interactions in a timely manner. This shows, short-range interactions (< 100Kb) are primarily favored in 12 min, and the highest frequency of mid range interactions (100Kb ~ 1 Mb) are seen in 30, and 60 min. These results may be related to the initial local compaction observed in the first 12 min following UV exposure, which appears to be gradually relieved at the later time points of 30 and 60 min. However, the chromosome conformation remains more compact compared to non-UV condition.

UV stress strengthens intra-compartment interactions

The genome is partitioned into distinct spatial compartments, such that stretches of active chromatin tend to lie in one compartment, called the A compartment, and stretches of inactive chromatin tend to lie in the other, called the B compartment25,26,29. Chromosome regions with similar compartment profiles have a higher tendency to interact with each other. Specifically, active regions have a greater frequency of contact with other active regions, whereas inactive regions are likely to have more frequent interactions with other inactive regions. Accordingly, we have identified compartments with the principal eigenvector of the contact matrix of non-UV sample, by stratifying genomic regions into 50 percentiles with similar values of the eigenvector (Fig. 1d, Supplementary Fig. 1a, see “Methods”).

Consequently, we assessed the compartmentalization strength by calculating the ratio of intra- to inter- compartment interaction intensities, and determined that the most powerful interactions within the compartments take place 12 min after the UV exposure, followed by similar strength between 30, and 60 min, in which all of the UV-exposed samples showed high compartment strengths (Fig. 1e, see “Methods”). To accompany this finding, we wanted to elucidate which compartments are majorly differentiated with respect to contact strength after UV exposure. We found that interactions within compartments showed an immediate increase, predominantly within active A compartments, while interactions between A and B compartments showed an immediate decrease at 12 min, when compared to non-UV sample (Fig. 1f). However, this trend is partially followed at 30, and 60 min, due to the strength of interactions within B compartments returned to similar levels of non-UV sample. We also evaluated the interaction strength changes at the top 20% compartmental groups, for A and B (Fig. 1g). Intra A-A interactions post-UV showed the most increase with elevated levels. In contrast, Intra B-B interactions showed an increase at 12 min, but followed with reduced levels of interactions at 30, and 60 min. Also, a trend can be seen with inter A-B interactions, where levels of interactions decreased after UV exposure, in contrast to immediate increase in intra-compartment interactions.

Highest dynamicity of compartmentalization is present within transcription regulatory genes

Chromatin compartment profiles and interaction topology within active and inactive compartments are influenced by shifts in gene expression during cell differentiation30, senescence31,32,33 or in stress conditions34,35. To understand the relationship between compartment profiles and DDR-mediated transcriptional regulation, we sought to perform an exploratory analysis between RNA-Seq and Hi-C datasets.

Accordingly, we conducted a correlation analysis of compartment profiles across the time points to determine whether genes exhibiting post-UV expression trends (clusters) are also correlated with their corresponding EV profile (Fig. 2a). To pinpoint sets of genes exhibiting varying expression levels over time, we first conducted a time series analysis (referred to as TS analysis afterward) to obtain differentially expressed genes (DEGs) that show significant differences across the time points (Supplementary. Data 3). We employed expression profile clustering to organize these genes into clusters based on their time-course expression patterns (Supplementary. Data 4). Each DEG is assigned to a cluster by maximizing the conditional probability, using the maximum a posteriori rule36 (See Methods). This approach yielded clusters representing different patterns and levels of gene expression changes over time (Fig. 2b). Concurrently, we applied eigenvector decomposition to each contact matrix of time points to derive principal eigenvectors, enabling the identification of A and B compartments. We factorized the genomic bins (100 kb bins) into quartiles based on similar eigenvector values for each sample, denoted as B1, B0, A0, or A1, corresponding to strong B, weak B, weak A, strong A compartment profiles, respectively. Each genomic bin, linked to its corresponding factor, was then intersected with genes from each cluster. For each pair of time points, we conducted polychoric factor correlation on the factors associated with the intersected regions. In general, factor analysis showed higher correlations of compartment profiles (~0.90–0.95) within the UV-exposed samples across the three time points, while the correlations remained consistently lower (~0.80–0.85) for UV-exposed samples compared to the non-UV control sample (Fig. 2c). This observation suggests that, compartmentalization changes between pre/post-UV are associated with the expression level differences, implying structural reorganizational effect on post-UV gene regulation. To assess the significance of these correlations, we utilized bootstrapping, which involved randomly selecting an equal number of genomic bins multiple times from the dataset. We then performed polychoric factor correlation on these selected regions between the time points. By comparing the observed correlation from the actual data with the distribution of correlations obtained from bootstrapping, we generated empirical p-values. These p-values indicate whether the observed correlation between conditions significantly deviates from the correlation expected based on the bootstrapped distribution.

Fig. 2: Correlation analysis of compartment profiles under UV stress.
figure 2

a Schematic illustrates the process applying factor correlations on the compartment profiles overlapped with differential gene expression patterns. b Raincloud plots show expression profiles of each DEG cluster with respect to time points. Red connectors point out the means. Data points represent individual DEGs in each cluster. Box plots displays the median (central line), the first and third quartiles (bounds of the box), and whiskers representing data within 1.5 times the interquartile range. Points outside the whiskers indicate outliers. c Heatmaps show compartment profiles correlation between timepoints per cluster. Asterisk signs show the 2-tailed p-values (* <= 0.1, ** <= 0.01). Lower and upper bounds of confidence interval (95%) of bootstrapped (expected) distribution are presented with L and U. d. Barplots show significantly enriched terms (Adjusted p-value < 0.05) for each DEG cluster with gene set over-representation analysis in GO-BP and MSigDB Hallmarks databases. x-axis shows significance (-log10adjusted p-values, hypergeometric test, FDR-corrected (BH)), y-axis shows the enriched term.

Furthermore, we have performed gene set over-representation analysis on the DEG clusters to obtain the enriched terms for biological pathways and molecular signatures. Accordingly, we have manually selected 5 clusters primarily based on the presence of enrichment terms for Gene Ontology Biological Process (GO-BP)37 and The Molecular Signatures DB Hallmarks38 (MSigDB Hallmarks) (Fig. 2b–d, Supplementary. Data 5). Overall, enrichment of these clusters revealed that under UV stress, genes related to positive transcription regulation are down-regulated, and genes on translation and repair/cell death-related pathways are up-regulated (Fig. 2b, d). For clusters 1,2,5 and 8, we have observed that factor correlation is significantly lower for comparisons of all of the UV-exposed samples against non-UV while showing a consistent pattern in parallel with the changes in the expression profiles. Remarkably, cluster 2 exhibited notable variability in factor correlations across each time point comparison, implying a dynamic behavior. The gene set involved in positive transcriptional regulation experiences repression immediately after UV exposure, which suggests overall transcriptional repression during DNA damage response. We have further questioned whether there is an inverse trend, showing different expression profiles in-between post-UV samples (Supplementary Fig. 1b). Among these, clusters 4 and 9 displayed low correlation between post-UV samples, which is significant given the expected correlation. Both clusters are linked to an inverse trend in gene expression among the post-UV samples, particularly evident in cluster 4.

We further focused on the pair-wise comparison between the Control and 12-minute post-UV samples to investigate the level of directional change in compartmentalization with respect to transcription, even within the first 12 min (Supplementary Figs. 1c, d). We assessed the direction of compartment profile (EV) changes among the significant differentially expressed genes (DEGs) and found that up-regulated genes are associated with a shift toward the A compartment profile, while down-regulated genes also exhibit a change toward the B compartment profile. To further validate these findings, we identified significant non-DEGs (see “Methods”) and determined that there was no overall change in compartment profiles among them.

Immediate TAD boundary strengthening after UV exposure

A topologically associated domain (TAD) is a genomic segment where DNA sequences have a higher propensity to interact with each other compared to sequences outside the domain, where these domains are demarcated by boundaries39. As per the existing theory regarding TAD formation, genome regions are dynamically gathered within TADs via loop extrusion facilitated by the cohesin complex until an insulator protein like CTCF obstructs the extrusion process40,41,42,43. However, the potential consequences of UV damage on TAD boundaries remain unknown. To evaluate the impact of UV on TADs, we identified the boundaries of TADs and measured the strength of each boundary, indicating the degree of separation between TADs, thus a quantitative measure of insulation capabilities of a domain boundary.

We found that boundary strengthening takes place at each time point after UV exposure compared to the non-UV sample (Fig. 3a). This shows that the ratio of outgoing interactions from TADs to within TAD interactions is significantly decreasing, proposing a model in which TAD boundaries with increased insulation capabilities in 3D setting. While there is a significant increase (p < 1e-4) in boundary strengths between pre/post-UV samples, there is no significant difference in mean boundary strengths among UV-exposed samples (Fig. 3a). Subsequently, we identified the number of domain boundaries preserved across each time-point transition (Fig. 3b), observing an increasing number at each stage. We also found that a smaller number of boundaries are non-preserved, with a trend that after each transition, the number of non-preserved boundaries decreases, with the highest occurring at the initial transition to 12 min.

Fig. 3: TADs & boundaries show changes post-UV in accordance with repair and damage profiles.
figure 3

a Mean boundary strength is increased post-UV, with 12 min being the highest. (Mann-Whitney U-test (two-sided), p-value: **** ≤ 1e-04) Data points represent individual domain boundaries identifed in each sample. Red line connectors point out the means. Box plots displays the median (central line), the first and third quartiles (bounds of the box), and whiskers representing data within 1.5 times the interquartile range. Points outside the whiskers indicate outliers. b Number of boundary types per transition across time points. c Ridgeline plots show a comparison of boundary strength changes per preserved boundary set at each transition. d TADs with preserved boundaries were grouped by their strengths into quartiles. e Normalized repair levels are plotted with respect to TADs grouped by their strengths, CPDs (Left), and 6-4PPs (Right). f Normalized repair levels are plotted in relation to TAD sizes, CPDs (Left), and 6-4PPs (Right). The plots are color-coded according to DNase-Seq quantiles. The solid line shows the OLS best-fit regression, with the shaded region indicating the 95% confidence intervals. g The top row presents a pile-up analysis (APA), where aggregated and averaged on-diagonal snippets are centered at boundaries to visualize interaction intensity changes. The bottom rows show normalized repair levels (XR-Seq) and damage levels (Damage-Seq) of both lesion types. The repair signal is further normalized against the DNase signal for both lesion types. DNase-Seq levels, CTCF ChIP-Seq (nontreated HeLa-S3) fold over input data, and insulation scores (The solid line represents the mean of the data, and the shaded region indicates the 95% confidence intervals). Columns are organized according to boundaries with increased and decreased insulation properties during the first 12 min following UV exposure.

To better characterize these transitions, we calculated the change in the boundary strength of the preserved boundaries. Among the transitions between the time points, the largest positive change was observed right after the UV exposure, in the first 12 min (Fig. 3c). The transition from 12 to 30 min showed a slight decrease in boundary strengths, followed by a consistency at the 30 to 60-minute transition. This immediate effect proposes that, similar to boundary preservation, the highest boundary strengthening response is seen in early post-UV. Between 30 and 60 min post-UV, boundary strengths remained relatively stable.

Domains exhibiting higher regulatory activity became increasingly insulated following UV exposure

We aimed to investigate repair within TADs to determine whether boundary strengthening after UV exposure aligns with the repair levels of CPDs and 6-4PPs. We scored TADs as the ratio of two numbers, within-TAD intensity to between-TAD intensity, as previously described44. We grouped TADs into quartiles with respect to these scores (TAD strength) (Fig. 3d), and also compared the overall interaction intensity change in the quartile groups by comparing the 12-minute post-UV sample to the Control sample (Supplementary Fig. 2a). We observed that the most insulated TADs in Q4 became more insulated after UV radiation, showing a decrease in interactions between TADs compared to those in the Q1 group (Supplementary Fig. 2a). Additionally, TADs in Q4 were the smallest in terms of length and were more frequently associated with the active A compartment, in contrast to Q1, which contained the longest TADs and was linked to the lower eigenvector (EV) spectrum (Supplementary Fig. 2b). By comparing pre-UV DNase-Seq and FAIRE-Seq signals, we found that TADs in Q4 had the highest levels of DNA accessibility and regulatory activity, as opposed to Q1 group (Supplementary Fig. 2b).

We further projected normalized repair levels at these domains, normalizing the repair levels by nucleotide distribution frequency simulations45. Overall, boundaries showed enriched repair levels (Fig. 3e). For both CPDs and 6-4PPs, the highest repair levels are found in TADs within the top quartile (Q4), which also show enriched intra-TAD repair levels compared to flanking regions. Interestingly, as the TAD strength decreases, intra-TAD repair levels decrease compared to the flanking sites (Q2 and Q1). Relative to 6-4PP, CPD repair is more affected by the TAD insulation, which is in line with the fact that CPD repair is relatively more prone to chromatin factors10,46. We have further questioned the effect whether higher damage is causing this pattern, and further calculated repair efficiency, in which repair signal (normalized repair levels) is normalized with damage signal (normalized damage levels) and respective nucleotide distribution frequency simulations (see “Methods”). We observed the same pattern, with Q4 TADs showing higher repair efficiency compared to Q1 (Supplementary Fig 2c).

In a previous work on aflatoxin-induced DNA damage, increasing TAD length was associated with lower levels of NER activity47. We further investigated whether repair levels change with increasing TAD size after UV radiation. At first glance, we observed a negative correlation between TAD size and repair levels (Fig. 3f, black dashed-lines). However, upon further evaluation, taking into account DNase-Seq levels and compartment profiles, the negative correlation between TAD size and repair diminishes (Fig. 3f, DNAse Quantiles, Supplementary. Fig 2d). This is because shorter TADs are linked to higher DNA accessibility and active A compartments, which are further associated with increased repair levels.

Enriched repair at TAD boundaries

To better understand the relationship between boundary strength changes and NER activity, we compared the repair and damage response at boundary sites with increased and decreased insulation capabilities post-UV. To understand the interaction change surrounding boundaries, we averaged the the signal intensities centering the boundaries, comparing contact maps of 12 min to non-UV (Fig. 3g). We have observed that TADs spanning boundaries with increased strength, have elevated within-TAD intensities, and highly reduced between TADs intensities. Boundaries with increased insulation exhibit a greater magnitude of insulation level changes compared to those with decreased insulation boundaries (Fig. 3g, Insulation Score, Supplementary Fig. 2e).

Normalized repair levels were further projected at boundary sites for both lesion types, CPDs and 6-4PPs, revealing that repair activity is enriched at boundaries where CPD repair is particularly prominent (Fig. 3g, Normalized repair levels). Between the two groups with increased and decreased boundary strength, increased insulation and repair activity correlated with higher DNA accessibility levels, while decreased insulation was associated with lower DNA accessibility and repair (Fig. 3g, DNase levels, Supplementary Fig. 2f). To compare the repair of CPDs and 6-4PPs, we normalized the repair signals against the local change of DNase signals at boundaries, revealing a notable enrichment of CPD repair at boundary sites (Fig. 3g, Normalized Repair / DNase levels). No significant differences in damage levels were observed between these two boundary groups for both lesion types (Fig. 3g, Normalized damage levels, Supplementary Fig. 2f, CPD, 6-4PP damage levels). Additionally, publicly available fold-over-input data for nontreated HeLa-S3 CTCF ChIP-Seq (ENCSR000AOA) was obtained to compare occupancy at boundaries. CTCF levels at the boundary sites showed no significant differences between the two groups (Fig. 3g, pre-UV CTCF levels, Supplementary Fig. 2f). Thus, higher insulation efficiency of strong boundaries post-UV correlates with greater repair efficiency, in contrast to their weaker counterparts.

CTCF-dependent loops are enriched post-UV

Enriched contact frequency peaks that appear as distinct points in mammalian contact maps are a common characteristic, and they propose significant interactions26. These points represent ‘loops’, and may manifest individually or as components of grids, often found at domain corners. Accordingly, chromatin loops are considered the primary mechanism of enhancer-promoter interactions facilitated upon cohesin-mediated loop extrusion accompanied with barrier elements48,49,50. A chromatin loop consists of a pair of proximal regions, referred as loop anchors. For all time points, we have identified chromatin loops and created the intersection sets for loops and their anchors (Fig. 4a). We have observed that the number of anchors common in all samples (51.0%) exceeds the proportion of loops common in all samples (31.6%), indicating that certain shared anchors are involved in forming different loops over time after UV-exposure.

Fig. 4: Loops & anchors show differential characteristics post-UV.
figure 4

a Venn diagrams show the percentages of common and specific anchors and loops. b Stacked bar plots show the number of common and specific loops/anchors for each transition. c For the non-UV and 12-minute common and specific anchors, pie charts show CTCF presence and orientation, and (d) loop size distributions across size ranges. e For the non-UV and 12-minute common and specific loops, off-diagonal pile-up analysis shows intensity change centered at the loop anchors. Corner numbers correspond to center-to-corner ratios. f Normalized repair levels (y-axis) are plotted along flanking regions and loop regions, for all loop types. g Volcano plots displaying the properties of DEGs for the 12-minute (left) and 60-minute (right) time points compared to the control (as the baseline). Red dots indicate genes that overlap with a 12 min specific anchor. DEGs are identified as post-UV timepoint vs control. h Unibind differential enrichment analysis with respect to specific anchors of non-UV Control sample, and 12 min, separately. Significance shows -log10 p-values, Fisher exact tests. i Grouped Hi-C and RNA-Seq genome tracks, with CTCF, and histone markers for JUN locus. Blue and red colors for histone markers signify activation and repression function, respectively. Only JUN is written on genes track for simplicity. Green vertical line shows JUN transcription start site, and red vertical line shows the corresponding looping region.

For each timepoint transition, we separated loops and anchors into two categories: common and specific loops and anchors. The separation allows for a ± 20 kb range, meaning neighboring regions are considered when determining overlap, and requires a signal intensity change of at least twofold to be included. Specific loops to a time point indicate a gain or loss of contact strength relative to the transition, as identified by their significant enrichment in only one of the contact matrices (see “Methods”). Accordingly, we wanted to understand the number of loop/anchor differences between transitions (Fig. 4b). After UV exposure, we observed a decrease in the number of loops at 12 min, followed by the highest number of specific loops at 30 min. Additionally, the number of common loops increased with each transition post-UV.

Hereinafter, we have concentrated on the comparison of non-UV sample and the first time point of 12 min to see the primary effects of UV exposure on loops. CTCF is the predominant protein concentrated at loop anchors when arranged in a convergent orientation51. Consequently, we aimed to understand the paired orientation of CTCF (Fig. 4c), while also considering the presence of single and non-CTCF motifs on the corresponding loop anchors. The initial observation was that common loops had anchors with a higher percentage of convergent orientation, which is the most favored orientation for paired motifs. Also, we have seen that loop anchors that are specific to the non-UV Control sample do not possess the CTCF binding motif, with the highest percentage of 51.3% among the three groups, which declined to 41.7% after UV exposure. This implies that certain transitory loops are lost under stress conditions, and loops are formed through CTCF, among other barrier elements, when exposed to UV. Furthermore, we did not detect significant differences in loop widths among the ~300 kb, ~330 kb, and ~360 kb categories for common, control-specific, and 12-minute-specific loops, with median widths of 220 kb, 240 kb, and 260 kb, respectively. However, common loops tended to be smaller in length. Additionally, no substantial differences in composition were observed (Fig. 4d).

Using pile-up analysis (also known as aggregate peak analysis, APA), we compared normalized enrichment by averaging off-diagonal contact matrix snippets centered at loop anchors, allowing for a detailed comparison of interaction intensities (Fig. 4e). We observed clear gains and losses of signal intensity in specific loops following UV exposure. Enrichment scores of pile-ups were calculated with the corner (center-to-corner ratios), and central scores.

We further focused on understanding repair levels at chromatin loops. In all loop types, we consistently observed that loop anchors exhibited higher repair activity compared to the flanks and the loop body (Fig. 4f). Additionally, specific loops in the non-UV Control sample, as well as common loops, exhibited higher repair levels for both lesion types compared to 12-minute-specific loops. We further investigated whether higher UV damage was driving this pattern by measuring repair efficiency. Similarly, our analysis revealed that common and control-specific loop anchors were repaired more efficiently than 12-minute-specific loop anchors, particularly evident for CPD repair (Supplementary Fig. 3a). In a pre-UV setting, we found that common and control-specific loop anchors had similar levels of DNA accessibility and regulatory activity, both of which were higher than those of the 12-minute-specific loop anchors (Supplementary Fig. 3a). This observation is consistent with the repair levels.

To further investigate the relationship between loop strength and repair, we grouped common loops into quartiles based on their loop strength scores. This allowed us to assess how varying loop strengths might correlate with repair activity (Supplementary Fig. 3b). We observed that loops with the lowest strength exhibited higher normalized repair levels (Supplementary Fig. 3c, Q1). Additionally, further analysis of repair efficiency revealed that Q1 common loops, were associated with the highest repair efficiency, and also in association with highest pre-UV DNA accessibility and regulatory activity. Between Q1 and Q4, we observed increasing loop strength but decreasing repair levels, DNA accessibility, and regulatory activity (Supplementary Figs. 3c, d).

We further aimed to understand whether the UV-induced DNA damage response is associated with loop formation following UV exposure. Accordingly, we have overlapped the 12-minutes specific loop anchors with DEGs at 12- and 60- minutes post-UV. We have observed that, such differential looping event starting at 12-minutes post-UV, is further in association with up-regulated UV-DDR genes encoding immediate early response genes52, interleukin-653, CCN254, DUSP155, and also AP-1 factors FOS, FOSB, JUN and JUNB (Fig. 4g).

To identify potential factors of transcriptional regulation with respect to specific loop anchors for non-UV and 12 min post-UV, we performed differential enrichment at these regions using human transcription factor (TF) ChIP-Seq data available on the Unibind database56 (Fig. 4h). For anchors specific to non-UV Control sample, indicating loops that lost formation after UV exposure, we see a significant enrichment for UV-DDR TFs, such as JUN, a well-known protein for response to UV radiation, increasing cells’ survival capabilities. Besides JUN, the enriched TFs of the non-UV control-specific anchor set were enriched with FOS, JUNB, and JUND, members of the AP-1 transcription factor. This suggests that the potential binding of these TFs to non-UV control-specific sites is indicative of regulation. On the other hand, specific anchors to 12 min (post-UV anchors) showed significant enrichment only for potential CTCF binding, suggesting that the loops are formed through CTCF-mediated loop extrusion, which is in line with the Fig. 4c. Additionally, the TFs such as JUN, JUNB and FOS overlap with the loops formed within the first 12 min post-UV and have potential binding sites at non-UV control-specific anchors, suggesting a role in regulatory activity. (Fig. 4g).

Furthermore, we aimed to investigate whether the transcriptional regulation of genes exhibiting significant changes in expression over time was associated with differential looping. Accordingly, we revisited the time series analysis (TS) and identified the top 25 genes based on significance from the TS analysis, which overlap with high-intensity change differential looping patterns to better characterize expression changes (Supplementary Fig. 3e). We observed that complementary looping patterns associated with changes in expression were present for the selected genes. For instance, in the case of the UV-induced immediate early gene JUN, it shows significantly high expression levels at 30 and 60 min post-UV. Correspondingly, as pointed on the Hi-C matrices, there is a progressively increasing interaction intensity, particularly notable at 30 and 60 min post-UV (Fig. 4i). In this interaction, one anchor involves JUN, while the other anchor corresponds to an enhancer region with an activation function. The facilitative role of the 3D genome in transcriptional regulation is also relevant in this context following UV exposure.

Graph neural network segments genome with respect to 3D dynamics

Up to this point, our analysis has primarily been hierarchical and exploratory, emphasizing genome architectures identified through predefined set of rules. However, recognizing the highly dynamic nature of genome folding, especially under genotoxic conditions, we acknowledge the necessity of studying regions that exhibit differentiation in higher-order dimensions, rather than relying solely on monotonic metrics like insulation scores to identify boundaries57 or a set of kernels to capture significant chromatin loops26. As a result, we have chosen to leverage Graph Neural Network (GNN) methodology to search into topological alterations between pre-UV and post-UV states. Thus, we aim to capture various genome folding alterations using an approach that efficiently compares the two Hi-C libraries and segments the genome into groups based on similar genome folding (topology) alterations. To do so, we centered our investigation on the transition from the non-UV control to 12 min, using it as a pivotal reference point. This approach enabled us to gain deeper insights into the impact of immediate early genes post-UV exposure and to better align our analysis with repair and damage maps.

GNNs have been started to utilized in the context of Hi-C, such as for multi-omics integration58,59,60, 3D genome reconstruction61, contact matrix imputation62, identifying differential point-wise interactions63, and for DSB prediction64. Accordingly, we’re utilizing graph representation learning to exploit the inherent relational characteristics of Hi-C interactions, which are effectively captured in a graph structure.

In essence, our task involves determining whether a graph derived from one contact map (referred to as the “query graph”) is preserved within another graph (referred to as the “target graph”) generated from a different contact map. We aim to identify regions (genomic coordinates) centering the genome folding (topology) change by quantifying deviations in subgraph isomorphism using a GNN (Fig. 5a). This involves measuring the magnitude of violation scores (topology change) of query graphs (smaller) compared to paired target graphs (larger)65. Accordingly, we aim to segment genomic bins into distinct clusters based on topology change characteristics (Fig. 5c) and gain insights into the intersection of 3D genome alterations and DNA damage/repair, thus DDR processes.

Fig. 5: Analysis of genome folding with GNN shows different topologies post-UV.
figure 5

a Schematic illustrating the workflow for the preprocessing steps of contact matrices, dataset generation/graph encoding, and generating different 3D genome topologies to create a violation landscape for GNN training. b Schematic illustrating the inference workflow for the violation scores (topology change) from two contact matrices to be compared using the sliding window operation. c With violation scores (Low or High) and the direction of the comparison, the genome is segmented into four groups. Central coordinates in the on-diagonal pile-ups are scored regions. Pile-ups show ± 500 kb flanking regions, and Observed/Expected interactions. Black square patches show target (± 250 kb) and query (± 200 kb) regions for the generated graphs, respectively. d Pre-UV DNAse-Seq and FAIRE-Seq levels for the centering sites (10 kb resolution) in each comparison groups (n:18666, 20530, 15356, 25782 for Q1, Q2, Q3, Q4 respectively). Red line connectors point out the means. Box plots display the median (central line), the first and third quartiles (bounds of the box), and whiskers representing data within 1.5 times the interquartile range. Comparison groups were ordered by sorted median values. (Mann-Whitney U-test (two-sided), p-value: **** ≤ 1e-04, *** ≤ 1e-03).

To create a violation landscape for training the GNN model, we simulated varying levels of 3D genome topology changes. For this, we preprocessed data from the 4D Nucleome megamap (Fig. 5a, Preprocessing), using merged replicates of non-treated HeLa-S3 cells, and curated a graph dataset. Within the graph dataset, the Hi-C contacts are represented as edge features, while the central node and linear distances are encoded as node features (Fig. 5a, Graph dataset generation). To facilitate the training of the subgraph prediction function aimed at discerning differentiation in genome topology, we created both positive and negative pairs consisting of target and query graphs. Positive pairs consist of target and query graphs centered on the same genomic bin, while negative pairs are sampled from two distinct graphs centered on different genomic bins (Fig. 5a, Generating of mini-batches). This sampling approach ensures lower violation scores for similar topologies and higher violation scores for different topologies. This process involves training a 3-layered Graph Neural Network (GNN) to derive graph-level embeddings for both the target and corresponding query graphs (Fig. 5a, Genome topology evaluation). These embeddings are subsequently utilized to compute violation scores (magnitude of topology change) for each pair of target and query graphs. Following that, these scores serve as input to train a comparator (a tresholding function), which learns to discern between low and high violation score labels by determining an appropriate threshold.

To infer changes in genome topology between non-UV and 12 min post-UV, we preprocess the contact matrices and employ a sliding window operation akin to insulation score calculation (Fig. 5b). At each window (genomic bin), we generated larger target and smaller query graphs central to the same genomic bin from contact matrices to compare. These graphs are constructed using the adjacencies and interaction intensities within the sliding window and the same encoding methodology as in the training phase. We measured the violation of queries to corresponding targets with the embeddings obtained from GNN and aggregated violation scores genome-wide. By employing this approach, we capture various topological changes, including both the gain or loss of contacts, as well as differences in the strength levels of preserved contacts.

Next, we categorize genomic bins (at a 10 kb resolution) into four groups, referred to as ‘comparisons,’ based on the level of violation observed between non-UV and 12 min post-UV, which can be classified as either low or high, and in terms of direction. (Fig. 5c). With this methodology, we evaluated 88.28% of the genome, excluding blacklisted66, balance-masked bins and sparsely interacting regions. The proportions of bins included in comparisons 1, 2, 3, and 4 were 19.79%, 15.15%, 12.35%, and 40.99%, respectively. Accordingly, we merged consecutive genomic bins with the same comparison group label and used central coordinates afterward. For each comparison, we followed the pile-up analysis (APA), and computed the average of on-diagonal snippets from the corresponding contact matrix, thus assessing the normalized contact intensity around the central genomic bin (Fig. 5c). Initially, we observed that Comparison 4, the largest group with low violation scores in both directions between non-UV and 12 min post-UV, exhibited the least dynamicity observed over expected contact intensities. Comparison 1 highlights regions with highly intense short-range interactions, where a significant level of violation is observed in both directions of the comparison. Between Comparisons 2 and 3, we have seen an interesting contrast between aggregated snippets. In the central regions of Comparison 2, we observed below-expected interactions for closer linear distances to the center, while the surrounding areas exhibited above-expected intensities. On the contrary, Comparison 3 displayed an inverse trend, characterized by above-expected intensities for closer linear distances to the center and below-expected intensities in the surrounding regions. This observation suggests that the direction of comparison, coupled with the differentiation between Low and High levels of violation scores, contributed to a sense of relative symmetry in the analysis while also delineating distinct architectural features. By comparing pre-UV DNase-Seq and FAIRE-Seq signals, we found that Comparison 4, which exhibited the lowest topology change, was most associated with the least DNA accessibility and regulatory activity (Fig. 5d). This was followed by Comparison 2, which also showed lower DNA accessibility and regulatory activity than Comparisons 1 and 3. Overall, we observe higher levels of DNA accessibility and regulatory activity in the central regions that are associated with above-expected interactions.

Genome undergoes dynamic and relative alterations following UV-exposure

We aimed to understand the difference in contact intensities within the comparisons in accordance with repair efficiency and damage profiles. We divided the 12-minute pile-ups into non-UV segments, creating subsets to compare time points at each transition and assess the changes in signal intensity during these transitions (referred to as ‘divide-ups’). Again, divide-ups of 4 comparisons yielded different characteristics, in line with violation scores and direction of comparison (Fig. 6a). In Comparison 1, at the first transition post-UV, central regions exhibited an increase in contact strength with surrounding regions while simultaneously experiencing a decrease in strength for contacts traversing these regions. This characterization was accompanied by high repair efficiency (normalized repair levels to damage levels) for CPDs being relatively enriched in the evaluated region and were also high on the flanking sites. On the other hand, Comparison 4, which had the lowest violation scores indicating minimal change in genome topology, showed the lowest repair efficiency. These findings were consistent with DNA accessibility levels, where Comparison 1 exhibited the highest accessibility and Comparison 4 showed the lowest (Fig. 5d).

Fig. 6: Analysis of genome folding shows differential topologies post-UV.
figure 6

a Top row shows divide-ups of 12 min to the non-UV sample. The bottom rows show repair and damage profiles, followed by divide-ups of 30 to 12 and 60 to 30-minute transitions. b Density plots show the absolute intensity changes on the pixel distributions where the pixels’ are located in the target graph region of the corresponding divide-up. Common normalization ensures each area under corresponding density sums to 1. c Unibind differential enrichment analysis for the central sites of Comp. 3 and Comp. 2 for the TFs that their expression is significantly changed. Significance shows -log10 p-values, Fisher exact tests. TFs are ordered with respect to the max significance of the corresponding experiment sets. -log10 Adj. p-values show significance of TS analysis, DESeq’s LRT (BH corrected). d Unibind differential enrichment analysis for the central sites of Comp. 3 for the TFs that their expression is stable across time points. Significance shows -log10 p-values, Fisher exact tests. e Gene set over-representation analysis conducted on Comp. 3 associated DEGS with respect to time points reveals the significance of pathways in a timely manner and (f) unique and enriched terms for High genome topology change (Left), and Low genome topology change (Right). (-log10 adjusted p-values, hypergeometric test, FDR-corrected (BH)), Dashed lines show significance cutoff, Adj. p < 0.05.

Additionally, we have noted symmetry in the division patterns for Comparisons 2 and 3 and in the repair profiles. We observed that in Comparison 2, central regions with intensities below the expected level demonstrated an increase in strength for both pairwise and surrounding interactions. Conversely, in Comparison 3, central regions with intensities above the expected level demonstrated decreased strength for pairwise and surrounding interactions. Furthermore, profiles of repair efficiency for flanking regions exhibited differences between Comparisons 2 and 3, albeit with a notable degree of symmetry within the evaluated regions. Notably, the repair of both lesions exhibited symmetry, which is particularly evident in CPD repair, and associated DNA accessibility levels (Fig. 5d). Additionally, in Comparison 4, we noted the lowest levels of damage, a finding markedly distinct from the other three comparisons. This observation suggests that lower levels of damage are associated with reduced differentiation in genome topology.

In any of the comparisons, we observed a small amount of relative recovery in pixel intensity at the transition from 12 min to 30 min post-UV exposure. However, we did not observe any discernible difference in the transition from 30 min to 60 min post-UV.

We also sought to assess the absolute changes in pixel intensity on the divide-ups. We have collected pixel intensities within the GNN evaluated regions on divide-ups, obtained distributions of fold changes on log2(FC) scale without direction, and performed kernel density estimations (Fig. 6b). Remarkably, we observed that Comparison 4, characterized by the lowest violation scores (lowest topology change), exhibited the lowest pixel intensity changes, favoring the GNN algorithm’s expressive capabilities. We have also observed Comparison 3 is associated with the highest intensity change.

We were especially intrigued by the symmetry observed between Comparisons 2 and 3. Since, in these regions, contrasting 3D genome topologies were associated with increase and decrease in signal intensity, respectively for lower and higher repair efficiency, DNA accessibility, and regulatory activity. Consequently, we investigated transcription factor (TF) binding and leveraged the Unibind experimental data for differential enrichment analysis. We retained only those TFs whose expression changes were significant in the TS analysis and kept TFs supported by at least 10 individual experiments. Our analysis revealed that regions with lowered signal intensity in Comparison 3 exhibited remarkably high levels of enrichment for JUN-family members, with JUN, JUNB, JUND, also, FOS, and ATF3, indicative of AP-1 member enrichment67,68,69 (Fig. 6c). These TFs displayed very high significance levels in the TS analysis, with a notable increase in expression fold-change within the first-hour post-UV. Conversely, regions with increasing signal intensity evaluated within Comparison 2 displayed low but significant enrichment levels for only CTCF and REST. We have not seen any other TF enrichment at Comparison 2 sites, and we also observed significant enrichment of CTCF and REST in Comparison 3. We have also evaluated the TFs that are insignificant in the TS analysis (Fig. 6d). The most significant mean enrichment level was associated with FOSL2, another member of the AP-1 family. Overall, we have seen lower significance levels than the TFs that are significant in TS analysis (Fig. 6c, d).

Furthermore, we sought to explore how the enrichment of pathways associated with the differential gene expression correlates with 3D characteristics of comparisons (Supplementary. Data 7). To accomplish this, we individually identified differentially expressed genes (DEGs) in the analysis between 12 min to non-UV, 30 min to non-UV, and 60 min to non-UV. We overlapped the bins containing transcription start site (TSS) regions of the DEGs in comparison groups separately and conducted gene set over-representation analysis. In each DEG group, TSS for a specific gene was chosen based on the curated matching of canonical transcripts to genes in the MANE project70.

Accordingly, we wanted to understand signaling pathways that are enriched in Comparison 3, so we included the NCI-Nature Pathway Interaction Database in the analysis71. We have observed that “AP-1 transcription network” was significantly enriched, especially for 60 min mark post-UV, and included DEGs such as down-regulated NR3C1, and up-regulated FOSL1, FOSB, ATF3, and JUNB (Fig. 6e). Interestingly, we have identified “EGF receptor (ErbB1) signaling pathway”, including down-regulated EGFR, and STAT3 (Fig. 6e). We have also seen that the “UV Response Down” term, which implies genes that are down-regulated post-UV, was only enriched with the DEGs in Comparison 3 and “UV Response Up” only in Comparison 1 (Supplementary. Data 7).

Additionally, we wanted to understand the enriched pathways in the comparisons with the high genome topology change (Comparison 1,2,3 combined) in contrast to the low genome topology change (Comparison 4), so we kept only the unique enrichment terms (Figs. 6f, Supplementary. Data 8). For the genes belonging to the high genome topology change, we have observed enriched terms such as “TNF-alpha signaling via NF-κB” being the most significant, “TGF-beta Signaling” and “Apoptosis” at the 60-minute mark, and “DNA Repair” with the most enrichment at the initial 12 min post-UV. However, we have not seen any related hallmarks within the enriched terms unique to low genome topology change.

Discussion

Recent efforts demonstrated the maintenance of genomic stability within the framework of higher-order chromatin organization11,12,13,14,16,19. Indeed, ionizing radiation and other genotoxic agents that primarily form DBSs and trigger translocations cause the 3D genome to play a role in DNA repair mechanisms. Understanding the facilitative role of the 3D genome in efficiently excising bulky adducts is crucial, particularly within the context of our cellular resilience against the most prevalent form of damaging radiation in the electromagnetic spectrum: UV radiation. However, post-UV organizational changes in the 3D genome and its interplay with nucleotide excision repair and transcriptional regulation have not been studied.

In this study, our focus was particularly on the earliest time points following UV radiation exposure to better characterize the interplay between DNA damage response and the 3D genome. Accordingly, we have generated the contact maps upon UV exposure and performed hierarchical 3D genome analysis. We have demonstrated that UV radiation can induce alterations across all levels of higher-order chromatin organization, initiating an ongoing structural reorganization with certain characteristics that are reminiscent of those seen post-IR. We also observed decreased distal interactions following UV exposure, consistent with findings observed after X-ray irradiation across various cell types, including fibroblasts16. Concurrently, we observed an elevation in interaction intensity within compartments and a decrease in inter-compartment interactions following UV exposure, similar to the response seen in DSB-related scenarios across different cell types14,16. The loss of distal interactions and increased compartmentalization strength may be potential indicators of a generic response to radiation-induced genotoxic stress.

In our investigation of TADs, we noted a strengthening of boundaries following UV exposure, similar to increased domain insulation seen in X-ray irradiation on different cell types16 (Fig. 7a). Although most of the boundaries are associated with an increase in insulation capabilities, there is a minor subset of boundaries that exhibit decreased insulation. We found that the most insulated TADs pre-UV, which exhibit even greater insulation post-UV, are associated with enriched nucleotide excision repair activity at their boundary sites, along with higher DNA accessibility and regulatory activity within their domains, in contrast to those with lower repair activity and reduced insulation. It is known that CTCF binding on consecutive CTCF-binding sites enhances the insulation capabilities of boundaries72. Given the lack of significant differences in pre-UV levels of CTCF signal between boundary strength increase and decrease groups post-UV, we propose that there is further accumulation of CTCF on strengthening boundaries. It has been shown that the NER factors XPF-ERCC1 and XPG establish CTCF-dependent chromatin loops73, and the XPF-ERCC1 dimer has been reported to interact with CTCF and cohesin complexes74. This suggests a potential overlapping mechanism between NER and loop extrusion, which may enhance insulation capabilities at sites where repair activity is highly enriched, and accordingly domains with higher regulatory activity are further insulated. However, this hypothesis requires further testing using NER-deficient lines to determine whether the same post-UV effects observed in wild-type (WT) cells are also occurring.

Fig. 7: Post-UV 3D genome facilitates UV-induced DNA damage response.
figure 7

a TADs with higher regulatory activity and repair levels are further insulated following UV-exposure. b Weakly interacting regions pre-UV show increased interaction intensity post-UV. These areas have lower repair activity post-UV and were less accessible with lower regulatory activity pre-UV. In contrast, strongly interacting regions have decreased interaction intensity post-UV but are linked to higher repair activity post-UV and were more accessible with greater regulatory activity pre-UV. c Post-UV loops are associated with CTCF-mediated loop extrusion and overlap with UV-response transcription factor genes. These regions also show lower DNA accessibility and regulatory activity pre-UV, resulting in lower repair levels post-UV. In contrast, loops that lose formation post-UV are potential binding sites for UV-response transcription factors and were characterized by higher DNA accessibility and regulatory activity pre-UV, and higher repair levels post-UV.

In all of the 3D analyzes, we observed a similar pattern, albeit with varying levels of CPD and 6-4PP repair. Notably, as known within the NER machinery, CPDs and 6-4PPs exhibit different recognition and repair rates. While CPDs are poorly recognized by XPC alone, 6-4PPs are efficiently recognized by XPC-RAD23B, a crucial initiator of global-genome nucleotide excision repair75. Our observations at the 12-minute mark post-UV exposure highlight the divergence in damage recognition between CPDs and 6-4PPs, with the 6-4PPs displaying a more distorted orientation conducive to efficient recognition, and early repair by GG-NER. Given these insights, focusing on CPD repair offers a more convenient framework for elucidating the alterations in 3D genome and NER interplay.

After UV-induced DNA damage, the genome must not only facilitate efficient DNA repair but also reorganize in response to transcriptional regulation under genotoxic stress to effectively execute the DDR. Accordingly, UV radiation can impact cellular responses through the activation of various signaling pathways involving immediate early genes encoding AP-1 (Activator Protein-1) subunits, such as JUN and FOS family. For instance, AP-1 is important for inflammation induced by tumor necrosis factor alpha (TNF-α) signaling, which we found as the most significant hallmark in the time-course expression analysis. JUN and FOS protect cells from UV-induced cell death and cooperate with NF-κB to prevent apoptosis induced by TNF-α76. Both JUN and FOS are involved in regulating cell cycle progression and apoptosis following UV exposure.

In the GNN analysis, Comparison 2 shows that the post-UV gain of contact intensity in these regions may be regulated by CTCF-mediated loop extrusion, promoting the formation of new loops at sites previously associated with low regulatory activity (Fig. 7b). In contrast, the central regions of Comparison 3 with lowered interaction intensity, exhibited highly significant enrichment for FOS, JUN-family proteins, as well as FOSL2 and ATF3, suggesting potential AP-1 activity that is well-known to occur following UV radiation20,77. On the other hand, we see NR3C1 TF enrichment at Comparison 3 sites, while its expression is downregulated. This potentially shows lower binding at Comparison 3 sites, and limited effect on AP-1 transrepression78,79. Regions in Comparison Group 3, which are potentially bound and regulated by UV-response transcription factors, were further associated with expression data showing an enrichment of pathways related to AP-1 targets, and cellular survival.

This phenomenon was further evidenced in the chromatin loop analysis (Fig. 7c). The effect of the 3D genome on the DDR is highlighted by the formation of new chromatin loops post-UV exposure, which overlap with the genes encoding JUNB and FOS, in association with their up-regulation. The anchor sites of these loops are possibly further enriched with CTCF binding sites. Moreover, we observed that pre-UV control-specific anchor sites within chromatin loops, which lose their formation after UV exposure, serve as potential binding sites for UV-DDR transcription factors, including JUN family members and FOS. This suggests that these factors regulate the response following UV exposure, as evidenced by the reorganization of the 3D genome. Moving forward, our research will focus on generating additional genomic datasets, including JUN/FOS and CTCF ChIP-seq post-UV, to further validate and refine our proposed model. By integrating these datasets with our existing data, we aim to gain deeper insights into the intricate mechanisms underlying higher-order chromatin reorganization and the DNA damage response.

Overall, our study sheds light on the intricate interplay between UV-induced DNA damage, higher-order chromatin organization, and transcriptional regulation, representing the first comprehensive analysis of a unique multi-omics dataset. We demonstrate that UV radiation induces significant alterations across various levels of chromatin organization, reminiscent of responses seen after ionizing radiation. Notably, we observe changes in distal interactions, compartmentalization dynamics, and TAD boundaries, which may serve as indicators of the cellular response to genotoxic stress. Specifically, we focused on the earliest time points within the first hour following UV exposure to capture immediate effects by employing hierarchical and exploratory methods to investigate genome architectures. Using a multi-omics integrative approach, we investigated deeply sequenced Hi-C and RNA-seq datasets, aligning them with XR-Seq and Damage-Seq datasets from our previous study24. Additionally, we utilized a deep learning approach to uncover the mechanisms underlying UV-exposed genome folding. Our findings suggest that the 3D genome architecture plays a crucial role in facilitating the UV-induced DNA damage response, opening up new avenues for research. Furthermore, our investigation into the role of AP-1 subunits, such as JUN and FOS, highlights their involvement in orchestrating cellular responses to UV radiation. This comprehensive approach sheds light on the pivotal involvement of genome folding in the immediate response to UV damage.

Methods

Cell culture and exposure of human cells to UV

HeLa-S3 cells (ATCC, CCL-2.2) were maintained in Dulbeco’s modified Eagle’s medium (PAN-P04-03500) supplemented with 10% (v/v) heat-inactivated fetal bovine serum (PAN-P30-3304), 100 U/ml penicillin/streptomycin (PAN-P06-07100), 2 mM L-glutamine (PAN-P04-80100) and 1x MEM non-essential amino acid (Gibco 11140-35) solution. Cells were maintained at 37 °C with 5% CO2. Cells were grown to 80% confluence in 150-mm tissue culture dishes. UV treatment has been applied as described9. Briefly, culture medium was first discarded, and 15 ml of 1× PBS (PAN-P04-036503) per dish were used to wash the cells and discarded. With the cover off, cells were placed under a germicidal lamp emitting 254-nm UV (1 J/m2/s) for 20 s, then added 20 ml of DMEM to the tissue culture dish, and incubated the cells at 37 °C for the recovery time points of 12, 30 and 60 min. Digital UVX Radiometer (UVP 97001601) was used to ensure that the cells were exposed to the desired UV dosage, and for reproducibility. Cells were immediately followed by Hi-C or RNA-Seq library preperation.

Hi-C Library Preparation and Sequencing

Hi-C library was generated using the Proximo Human Kit version 4.0 (Phase Genomics, Seattle, WA, USA). Cells were cross-linked for 15 min at room temperature with end-over-end mixing. Crosslinking reaction was terminated with quenching solution for 20 min at room temperature again with end-over-end mixing. Quenched cells were rinsed once with 1X Chromatin Rinse Buffer (CRB). A low-speed spin was used to remove the supernatant and the pellet containing the nuclear fraction of the lysate was washed with 1X CRB. After removing 1X CRB wash, the pellet was resuspended in 100 µL Proximo Lysis Buffer 2 and incubated at 65 °C for 15 min. Chromatin was bound to Recovery Beads for 10 min at room temperature, placed on a magnetic stand, and washed with 200 µL of 1X CRB. Chromatin bound on beads was resuspended in 150 µL of Proximo fragmentation mix and incubated for 1 hour at 37 °C. Reaction was cooled to 12 °C and incubated with 2.5 µL of finishing enzyme for 30 min. Following the addition of 6 µL of Stop Solution, the beads were washed with 1X CRB and resuspended in 95 µL of Proximo Ligation Buffer supplemented with 5 µL of Proximity ligation enzyme. Proximity ligation reaction was incubated at room temperature for 4 h with end-over-end mixing. To this volume, 5 µL of Reverse Crosslinks enzyme was added and the reaction incubated at 65 °C for 1 hour. After reversing crosslinks, the free DNA was purified with Recovery Beads and Hi-C junctions were bound to streptavidin beads and washed to remove unbound DNA. Washed beads were used to prepare paired-end deep sequencing libraries using the Proximo Library preparation reagents. The libraries were sequenced on Illumina NovaSeq 6000 platform with PE150 reads.

Hi-C Data Preprocessing

After preprocessing and quality control of FASTQ files with fastp80, raw sequencing reads were aligned to human genome (GRCh38, Gencode v35). The data was processed using the Juicer v.1.6 pipeline (Supplementary. Data 1)81. Only chr1:22 and chrX were processed, and proceeded to downstream analysis. We used Juicer’s calculate_map_resolution.sh script to determine the maximum data resolution for each sample, finding that 10 kb is the highest resolution available across all samples. This resolution was determined based on the criterion that the number of bins with more than 1000 contacts should constitute at least 80% of the total number of bins. Accordingly, .hic files that were generated with Juicer have been converted .mcool format with hicConvertFormat subprogram of HiCExplorer to be utilized for downstream analysis82. To make samples comparable and remove systematic bias, contact matrices were iteratively corrected with ICE83 algorithm implementation with balance subprogram within cooler version 0.9.284, while keeping only cis-interactions (intra-chromosomal), filtering bins that are in blacklisted regions66, ignoring first 2 diagonals, and using variance convergence threshold of 1e-05. We have obtained distance-dependent decay of interaction counts at 10 kb resolution for each Hi-C sample, using “expected_cis” function of cooltools version 0.5.485, and focused on interactions up to a linear distance of 10 mb. We have utilized powerlaw version 1.5.086 to obtain decay parameters for the hyper-tailed distrubutions.

Compartment Analysis

To obtain eigenvectors for compartment profiles using eigen decomposition, we have utilized “eigs_cis” function of cooltools version 0.5.485, at 100 kb matrix resolution. For phasing between A and B profiles, we have tested POLR2A TF ChIP-seq (ENCSR000BGO), RNA-Seq, and GC content to assign bins to correct compartments, in which higher correlation associates to correct eigenvector for compartments. We have generated merged RNA-Seq tracks for each sample using “bamCoverage” function in deepTools version 3.5.487 using BPM normalization, in order to test correlation for eigenvectors. With all of the phasing tracks, principal eigenvector were the one with highest correlation among first three eigenvectors (Supplementary. Data 2). Accordingly, 100 kb bins were grouped into 50 percentiles based on 1st eigenvector. Saddle plots have been calculated with “saddle” function of cooltools considering the observed interactions normalized by expected, in which top left corners corresponds to B compartments, while bottom right corners correspond to A compartments. We have quantified the compartment strength using the “saddle_strength” function of cooltools, which computes the ratio between (AA + BB) and (AB + BA), by comparing the observed/expected values. With respect to saddle plots, compartment strength corresponds to comparing the ratio of values in the upper left and lower right corners to those in the lower left and upper right corners in the plot with an increasing extent. Thus, compartmentalization strength is calculated by assessing the relative enrichment of interactions within the same compartment (AA + BB) compared to interactions between different compartments (AB + BA). Accordingly, we have used this metric to compare degree of segregation between active and inactive chromatin regions between Hi-C samples.

In order to analyze compartment correlations between timepoints with respect to RNA-Seq data, we have obtained 1st eigenvectors for each Hi-C sample and assigned 100 kb bins to factors of A1, A0, B0 and B1. Accordingly, for each gene in each cluster that we have obtained from clustering analysis, we have overlapped the compartment factors (100 kb bins) to associated genic regions using “overlap” function in bioframe version 0.4.188. For each pair of sample correlation given the cluster, we have performed polychoric correlation with “polychoric_correlation” function in RyStats version 0.5.0 with respect to compartment factors. To perform bootstrapping, we have randomly picked 100 kb bins depending on the number of bins on the original overlap to genes for each cluster and sample-pair comparison, 1000 times. Accordingly, we have compared original correlation to the correlation values in the bootstrapped distrubution (expected correlation) to generate two-tailed p-values.

Boundary and TADs analysis

In order to identify boundaries to see how genome is segmented into distinct domains, we have utilized “insulation” function in cooltools version 0.5.485. Briefly, insulation profiles have been calculated for each bin at 10 kb resolution using the sliding window with the size of 500 kb. Insulation profiles indicate specific sites with reduced scores, indicating decreased contact frequencies between adjacent genomic regions, and often these sites represent boundaries89. Accordingly, strongly insulating regions have been identified as boundaries, and boundary strength is calculated using peak prominence method on the minima of the insulation profiles.

To obtain preserved and non-preserved boundaries between each timepoint transition, we have identified boundaries for each Hi-C sample. Boundaries were considered to be preserved with a tolerated range of ± 10 kb. Subsequently, by merging consecutive boundaries and excluding excessively long domains (>1.5 mb), we have compiled a set of coordinates delineating topologically associating domains (TADs). Contact matrix snippets for pile-up analysis (aka. Aggregate Peak Analysis) plots were piled-up with coolpuppy version 1.1.090 for boundaries on-diagonal.

We have calculated domain scores (TAD strength) as described44. Briefly, a domain score is the ratio of within-TAD intensity to between-TAD intensity on the rescaled contact matrix snippets spanning each TAD and flanking regions. Rescaled contact matrix snippets for TADs have been obtained with coolpuppy version 1.1.090.

Chromatin Loop Analysis

Chromatin loops were identified using the “dots” function within cooltools version 0.5.485, operating at a resolution of 10 kb. In summary, default kernels were applied as per cooltools recommendations for the resolution, which included the donut kernel as described in Rao et al.26, along with vertical and horizontal kernels to mitigate the identification of stripes as loops, and a lower-left kernel to prevent the misidentification of pixels at domain corners as loops. Loops with a false discovery rate (FDR) value greater than 0.1, adjusted for multiple hypothesis testing using the Benjamini-Hochberg method, were filtered out, retaining only significantly enriched pixels for each sample.

During each transition between time points, we identified FDR-tested significant chromatin loops and their associated anchors that are shared within a permissible range of ± 20 kb, categorizing them as common loops or anchors. Accordingly, loops falling outside this range are deemed specific to a particular time point in the transition, suggesting potential enrichment or loss of interaction strength relative to transition. The specific loops are further refined to include only those with a signal intensity change of at least twofold.

We have obtained CTCF binding sites using the Bioconducter package CTCF91, and utilized “ctcf_orientation.py” accessible under CTCF_orientation to obtain CTCF orientation on the loop anchors. Also, we have utilized “LoopWidth_piechart.py” accessible under LoopWidth to obtain categorized loop lengths.

Off-diagonal snippets of the loops were piled-up with coolpuppy version 1.1.090, with a flank of 100 kb, generating a 21 × 21 matrix, in which central pixel corresponds to the interaction of the loop anchors. Corners scores of off-diagonal pile-ups are calculated with the ratio of the central pixel intensity to the mean pixel intensity of the 6 × 6 matrix on the associated corner (aka. Peak to Corner scores). To calculate loop strength scores for an individual loop, we have used the ratio of central pixel intensity to the mean intensity of remaining pixels, and assigned a loop score to each individual loop (aka. Peak to Mean score).

Genome tracks of Hi-C, RNA-Seq, CTCF levels and related histone markers are visualized with Figeno version 1.0.592. Faire-Seq data (ENCSR000DCG) that is deposited in ENCODE database93, was lifted over prior to usage with UCSC liftOver tool94 with the related hg19 to hg38 chain file.

GNN Workflows for Detecting Regions Centering Topology Change from Contact Maps

For training phase of the graph neural network (GNN) model in order to learn a subgraph-isomorphism violation landscape for Hi-C contact differentiations, we have preprocessed megamap at 10 kb resolution of Hi-C experiments done on HeLa-S3 with merged replicates (4DN data portal: 4DNESCMX7L58)95,96. Briefly, we have iteratively corrected the contact matrix with ICE83 algorithm implementation with balance subprogram within cooler version 0.9.284, as we described in Hi-C data preprocessing. Accordingly, we have obtained observed over expected matrices per chromosome arm using “_make_cis_obsexp_fetcher” function in cooltools version 0.5.485. Followingly, we have preprocessed Hi-C contacts per matrix using quantiles information with “QuantileTransformer” function in Scikit-learn version 1.3.0 into 1000 quantiles with Uniform distrubution, with the aim of making different datasets comparable, thus transforming contacts into 0 to 1 range.

To generate a graph dataset for training, we randomly selected a bin and constructed a graph from the 51x51 matrix where the selected bin is the central bin, corresponding to 250 kb flanking regions at a resolution of 10 kb, totaling to 510 kb regions. We have decided on such linear distance to understand local topology change in 3D, in accordance to distance-dependent decay comparisons between non-UV to post-UV 12 min sample in which short-to-mid range interactions are particularly favored post-UV. We iterated through this process until the total number of generated graphs reached 10% of the total available bins per matrix. Furthermore, we have encoded node features in a graph into two categories; central node or non-central node, and relative linear distances scaled on the distance to central node to ensure non-central nodes have unique identities. Accordingly, transformed Hi-C contacts were assigned as the edge features.

For the training process, we constructed positive and negative target/query pairs as mini-batches, while ensuring central nodes are preserved. Positive pairs were generated by randomly selecting a graph and sampling a target graph from the available bins, followed by sampling a query graph from the selected target graph. Conversely, negative pairs were generated by randomly selecting two graphs, ensuring that they are central to different genomic bins. A target graph was then sampled from one of the selected graphs, while a query graph was sampled from the other ensuring number of nodes are smaller. By employing a target graph sampling strategy that encompasses a varying number of nodes within the range of (31, 51], we aim to train our model to generalize more effectively for contact maps with varying/nominal interaction coverage. This methodology enables the model to adeptly adapt to various graph sizes, enhancing its capability to navigate scenarios where contact maps might lack extensive coverage, while disregarding areas with sparse interactions.

We opted for a model design featuring 3 GNN layers, a choice informed by the observation that the diameter of query graphs approximates 3 in our Hi-C samples. This decision ensures that the model’s architecture aligns well with the inherent structure and complexity of the graphs, facilitating effective information propagation and feature extraction across the network. For the choice of GNN operator, we adopted Graph Isomorphism Network (GIN)97 with a slight modification described by Hu et al.98, in order to incorporate edge features into message-passing between nodes, referred as “GINE”. We chose to adopt the subgraph prediction function for training from the Neuromatch65 algorithm because of its demonstrated capability to capture subgraph relations in the embedding space by enforcing the order embedding constraint. Briefly, the function evaluates whether a target graph node’s embedding has a k-hop neighborhood that is subgraph isomorphic to the k-hop neighborhood of a center node in the query graph, based on the embeddings of these nodes. It achieves this by minimizing the max margin loss, ensuring that positive pairs exhibit embeddings where the query node’s elements are less than or equal to the corresponding elements in the target node’s embedding. Conversely, for negative pairs, the violation of the subgraph constraint must exceed a certain margin in order to have zero loss. Furthermore, we also propagate amount of violation of target and query pairs to a classifier, referred to as the comparator, with the aim of learning a threshold for distinguishing between “Low” and “High” scores.

Therefore, we aimed to compare non-UV and 12 min Hi-C samples (10 kb bin resolution), considering this as the initial transition after UV-exposure and a reference point aligned with the timepoint of XR-Seq, to learn about altering topologies and immediate early response. To infer, we applied identical preprocessing steps to both of these contact maps as those used for the 4DN megamap, also disregarding blacklisted, balance-masked and sparsely interacting regions. Accordingly, diverging from the graph dataset generation process during the training phase, we have generated target and query pairs using a sliding window approach, with the strict number of node constraints, tolerating maximum node number of 41, and 51 for query and target, respectively. Therefore, we have acquired violation scores for each 10 kb bin genome-wide, and learned the labels for distinguishing between “High” and “Low” scores.

GNN model has been implemented with Pytorch version 2.0.199 and Pytorch Geometric version 2.3.1100. Training and inference workflows were implemented with Pytorch Lightning version 2.0.6 in order to optimize distributed data parallel strategy for multi-processing. The model has been trained on Nvidia A100 in Sabanci University’s ToSUN HPC with Slurm Workload Manager.

Transcription-factor differential enrichment analysis

We have performed differential enrichment analysis for TF binding using Unibind workflow56. Unibind is a database containing experimentally derived transcription factor binding sites (TFBSs) specific to various cell types or tissues, obtained from ChIP-seq experiments. Briefly, we have obtained Unibind’s LOLA database hg38_robust_UniBind_LOLA.RDS v3, from the corresponding Zenodo repository. In order to perform differential enrichment, we have preprocessed genomic coordinates in bed format, and ran LOLA version 1.32.0101 for enrichment calculation and statistical significance with Fisher exact tests. We have only kept the experiment sets that their enrichments are significant (p-value < 0.05). And also, we have only kept the transcription factors that are supported by at least 10 individual experiment sets.

RNA-Seq library preparation and sequencing

RNA isolation was carried out using Genezol RNA isolation reagent (Geneaid GZR100) following the manufacturer’s instructions. A total amount of 0.4 μg RNA per sample was used as input material for the RNA sample preparations. Sequencing libraries were generated using NEBNext® UltraTM RNA Library Prep Kit for Illumina® (NEB, USA) following manufacturer’s recommendations and index codes were added to attribute sequences to each sample. Briefly, mRNA was purified from total RNA using poly-T oligo-attached magnetic beads. Fragmentation was carried out using divalent cations under elevated temperature in NEBNext First Strand Synthesis Reaction Buffer (5X). First strand cDNA was synthesized using random hexamer primer and M-MuLV Reverse Transcriptase (RNase H-). Second strand cDNA synthesis was subsequently performed using DNA Polymerase I and RNase H. The remaining overhangs were converted into blunt ends via exonuclease/polymerase activities. After adenylation of 3’ ends of DNA fragments, NEBNext Adapter with hairpin loop structure were ligated to prepare for hybridization. In order to select cDNA fragments of preferentially 250 ~ 300 bp in length, the library fragments were purified with AMPure XP system (Beckman Colter, Beverly, USA). Then PCR was performed with Phusion High-Fidelity DNA polymerase, Universal PCR primers and Index (X) Primer. At last, PCR products were purified (AMPure XP system) and library quality was assessed on the Agilent Bioanalyzer 2100 system.

The clustering of the index-coded samples was performed on a cBot Cluster Generation System using TruSeq PE Cluster Kit v3-cBot-HS (Illumia) according to the manufacturer’s instructions. After cluster generation, the libraries were sequenced on an Illumina Novaseq 6000 platform and 150 bp paired-end reads were generated.

RNA-Seq Data Preprocessing

After preprocessing and quality control of FASTQ files with fastp80, raw sequencing reads have been aligned to human genome (GRCh38, Gencode v35) and its corresponding annotation file, with STAR aligner version 2.7.6a102 with arguments –genomeLoad NoSharedMemory –quantMode TranscriptomeSAM –twopassMode Basic –outSAMtype BAM SortedByCoordinate. Followingly, transcripts (transcripts per million) were quantified using Salmon version 1.6.0103.

Gene Expression and Time-Series Analysis

Transcript-level abundance, estimated counts and transcript lengths, were imported with Bioconducter package tximport version 1.28.0104 to be utilized for downstream analysis. Differential gene expression analysis was conducted with Bioconducter package DESeq2 version 1.40.2105. Pre-filtering was applied to the data sets to filter low read counts in which genes with low number of reads were removed from the analysis. The read counts across the samples were normalized using the DESeq method, based on median ratio of gene counts. In order to perform time series analysis, likelihood ratio test was utilized with the full model on the time point factor, and the reduced model on the intercept. Resulting genes were filtered statistically (adjusted p-value  ≤  0.05) to obtain differentially expressed genes over time. We have further utilized “altHypothesis” option in DESeq’s “results” function to obtain significant non-DEGs (adjusted p-value  ≤  0.05) that their absolute log2 fold change is lower than 0.5. To match with EV profiles, gene level TPM values have been obtained with tximport, and assigned to Hi-C bins.

Clustering and Gene Set Over-Representation Analysis

Clustering of the genes with respect to time-series analysis was performed with Bioconducter package coseq version 1.24.036. DESeq normalized counts were used as input to coseq, without further normalization within coseq internally. Arcsine transformation was applied to expression matrix, and co-expression analysis of the differentially expressed genes were performed with Normal mixture model. Genes that were uniquely assigned to clusters with maximum conditional probability (min: 0.9) were retained for cluster visualization and over-representation analysis per each cluster.

Statistically enriched (adjusted p-value  ≤  0.05) terms of Gene Ontology Biological Process37 and The Molecular Signatures DB Hallmarks38, on genes assigned to each co-expression cluster were identified using Bioconducter package clusterProfiler version 4.8.1.106 Over-representation analysis for each cluster was based on the background of the total differentially expressed genes with respect to time factor, obtained from the likelihood ratio test. For gene sets created in GNN comparison analysis, we have used “enrichr” function in GSEApy Python package version 1.0.6107 to perform over-representation analysis while also using NCI-Nature PID database71.

XR-Seq and damage-seq data preprocessing and simulation

Raw XR-Seq and Damage-Seq reads were accessed under SRA PRJNA608124. We have processed the raw reads with XR-Seq and Damage-Seq Snakemake workflow version 0.7, that is accessible under xr-ds-seq-snakemake24, with Snakemake version 7.32.4108. Briefly, adapter sequences (TGGAATTCTCGGGTGCCAAGGAACTCCAGTNNNNNNACGATCTCGTATGCCGTCTTCTGCTTG) were trimmed from 3′ ends of raw XR-seq reads, and adapter sequences (GACTGGTTCCAATTGAAAGTGCTCTTCCGATCT) were trimmed from 5′ ends of raw Damage-seq reads using cutadapt version 4.1109. Then, trimmed reads were aligned to human genome GRCh38 using Bowtie2 version 2.4.1110. Resulting bam files were converted to bed using bedtools version 2.29.0111. Also, aligned reads were sorted and duplicated regions were removed. Accordingly, we have processed the XR-Seq and Damage-Seq data for the exact repair and damage site coordinates, mapped to 1 kb intervals on human genome, and performed Counts Per Million (CPM) normalization.

We have also generated synthetic NGS reads from the input DNA sequencing of HeLa-S3 (SRA:PRJNA608124) using NGS read simulator Boquila version 0.6.045. Briefly, the simulation tool Boquila randomly picks genomic regions to ensure that the chosen pseudo-reads match the nucleotide frequency of the provided NGS dataset. Accordingly, we have generated synthetic XR-Seq and Damage-Seq datasets, and utilized in order to have an expected background for repair and damage maps genome-wide with respect to dimer frequency. We have also performed CPM normalization for synthetic reads on 1 kb intervals. “Normalized repair levels” and “normalized damage levels” are calculated with the normalization of original XR-Seq and Damage-Seq datasets to their correponding synthetic datasets. Furthermore, “repair efficiency” is calculated by normalizing the “normalized repair levels” with the “normalized damage levels,” providing a normalized repair signal in relation to the damage signal.

Software

We conducted our analyzes using the Python language version 3.11.0 and R language version 4.3.1 with Bioconductor version 3.17. The manuscript was written and edited collaboratively on GitHub using Manubot.112 Plots were generated with seaborn version 0.13.0113, and figures were arranged on the open-source vector graphics software Inkscape.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.