Abstract
CRISPR-Cas9 genome editing has been extensively applied in both academia and clinical settings, but its genotoxic risks, including large insertions (LgIns), remain poorly studied due to methodological limitations. This study presents the first detailed report of unintended LgIns consistently induced by different Cas9 editing regimes using various types of donors across multiple gene loci. Among these insertions, retrotransposable elements (REs) and host genomic coding and regulatory sequences are prevalent. Analyses of RE frequencies and 3D genome organization suggest that LgIns originate from genomic fragments randomly acquired by DNA repair mechanisms. Additionally, significant unintended integration of full-length and concatemeric double-stranded DNA (dsDNA) donors occurs when donor DNA is present. We further demonstrate that phosphorylated dsDNA donors consistently reduce large insertions and deletions by almost two-fold without compromising homology-directed repair (HDR) efficiency. Taken together, our study addresses a ubiquitous and overlooked risk of unintended LgIns in Cas9 editing, contributing valuable insights for the safe use of Cas9 editing tools.
Introduction
The CRISPR-Cas9 system is widely used to generate DNA double-strand breaks (DSBs) at specific sequences in the genome; these breaks trigger cellular DNA damage repair pathways and result in various types of alterations. Typical on-target repair of Cas9-mediated DSBs produces short insertions and deletions (indels) of less than 20 bp1, which often cause loss of function of the targeted gene. However, this type of repair makes it difficult to achieve a specific editing outcome. To achieve precise genome editing outcomes following a Cas9-induced DSB, foreign donor sequences are generally required; these are integrated through HDR or through microhomology-mediated end-joining (MMEJ)- or non-homologous end-joining (NHEJ)-dependent mechanisms2,3,4.
As the CRISPR-Cas9 system enters clinical applications5,6, there is an increasing need to understand its potential genotoxic side effects, which pose long-term risks. Previous studies have documented that CRISPR-Cas9 editing can cause off-target mutagenesis at genomic sites that are partially mismatched to, yet still recognized by, the gRNA7. Recent work on mouse and human stem and germ cells has shown that Cas9-based genome editing can also generate undesired on-target byproducts, including nonrandom large and complex structural variants (SVs)8,9,10, increased single-nucleotide variants (SNVs)10, and frequent loss of heterozygosity11. However, most published work on Cas9-induced mutagenesis has been performed without donor templates. Although exogenous vector integrations have been reported in AAV- and plasmid-driven Cas9 editing9,12,13, the effects of various types of donor sequences on Cas9-mediated genome editing outcomes, as well as the landscape of Cas9-induced unintended LgIns, remain unclear.
We previously developed a long-read sequencing technology, IDMseq, to enable sensitive, quantitative, and haplotype-resolved analysis of Cas9-mediated on-target mutagenesis without donor templates10,14,15,16. Using unique molecular identifiers (UMIs) added to the original target DNA (Fig. S1a), IDMseq can analyze tens of thousands of individual alleles from Cas9-edited cells and report diverse types of variants at frequencies as low as 0.004%10,17. A previous study using PacBio SMRT-seq with dual UMIs demonstrated on-target integration of homologous sequences and repetitive genomic sequences in Cas9-edited regions18. However, because of the low number of captured molecules (a few hundred alleles per sample) and the inability of dual-UMI amplicon sequencing to report accurate allele frequencies, that study could not provide an unbiased, quantitative analysis of Cas9-induced LgIns. Such analysis is crucial for evaluating the safety implications of LgIns in Cas9 editing. To obtain an accurate quantitative understanding of Cas9-induced unintended LgIns and of the effects of donor type on template-directed Cas9 genome editing, we performed IDMseq with Oxford Nanopore Technologies (ONT) sequencing on human embryonic stem cells (hESCs) after Cas9 editing, both with and without different types of donor DNA.
Results
Cas9 editing with different types of donor templates results in diverse mutation spectra
We designed three types of HDR donor templates (a single-stranded oligodeoxynucleotide [ssODN, 130 nt], a linear double-stranded DNA [dsDNA, 180 bp], and a circular dsDNA plasmid [3197 bp]) to install a chr19:11,378,195 C > T point mutation in the last exon of the EPOR gene using an efficient sgRNA described previously10 (Fig. 1a and S1b). The donor templates and wild-type (WT) S. pyogenes (S.p.) Cas9 ribonucleoprotein (RNP) were electroporated into H1 ESCs. Electroporation of the circular dsDNA (CircularDsDNA) together with S.p. Cas9 RNP caused significant cell death (Fig. S2), so Alt-R® S.p. HiFi Cas9, which showed less cytotoxicity, was used with the CircularDsDNA template instead to ensure sufficient cells for analysis. After the same Cas9 gene-editing procedures (see Methods), we harvested cells and performed IDMseq analysis as previously described10 on wild-type unedited hESCs (WT) and on cells edited with no template (OnlyCut), an ssODN template (ssODN), a linear dsDNA template (LinearDsDNA), or a CircularDsDNA template (Fig. 1a and S1, and Methods).
a Schematic of the experimental design: H1 ESCs undergo Cas9 RNP editing using various donor types. The edited cells are harvested for IDMseq with ONT sequencing to detect SNVs, large deletions, and insertions in the target region. Created with BioRender.com. b Unintended genome integration following Cas9 editing of the EPOR (chr19) locus in the whole genome (left) and the EPOR cut ±2 kb region (right). In the circle plot, putative off-target sites, lengths of insertions, involved repetitive elements, FANTOM5 TSS peaks, ENCODE cCREs, H3K27Ac marks, genes, and insertion links are presented from outer to inner. The colors of the links indicate different samples: red for OnlyCut, blue for ssODN, green for LinearDsDNA, and purple for CircularDsDNA. Gene-related integrations are specified in the innermost circle, and coding sequence insertions are labeled in red. c Analysis of the origins of Cas9-induced LgIns in the absence of donor templates, categorized by donor types. The results are based on six independent Cas9 editing analyses targeting the EPOR, PIGA, PANX1, KCNQ1, PEG10, and SNRPN loci, presenting the average frequencies of insertion origins with standard error of the mean (SEM) indicated, n = 6 biologically independent experiments. d Same as (c), but for repetitive element-related integrations, categorized into different classes of DNA repeat elements (DNA), long interspersed nuclear elements (LINE), long terminal repeat elements (LTR, including retrotransposons), simple repeats (microsatellites), and short interspersed nuclear elements (SINE, including Alus). e Same as (c), but for cis-regulatory element-related integrations, categorized by donor types, broken down into promoter-like signature (PLS), proximal enhancer-like signature (pELS), distal enhancer-like signature (dELS), DNase-H3K4me3, CTCF-only, and FANTOM5 TSS peaks (TSS).
An average of 11,816 alleles was identified per sample using VAULT10, and all samples were subjected to the same variant-calling pipeline to detect both SNVs and large SVs (Table 1). Variant analysis identified the desired point mutation (chr19:11,378,195 C > T) in the ssODN and LinearDsDNA samples at frequencies of 10.06 and 0.72%, respectively (Fig. S3a). The desired mutation was absent in the CircularDsDNA sample. A significant increase in de novo SNVs was observed in all edited samples, and these SNVs were enriched in C > T (G > A) transitions. These results are consistent with previous findings10,19 and support the quality of the sequencing data (Figs. S3a, S4).
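Reporting C > T and G > A together relies on the fact that a substitution on one strand is its complement on the other. A minimal sketch of this strand-collapsing convention, folding every substitution onto a pyrimidine (C or T) reference base, is shown below; the helper names are hypothetical and this is not part of VAULT.

```python
from collections import Counter

# Watson-Crick complements used to fold purine-reference substitutions
# onto the pyrimidine (C or T) reference strand.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def collapse_substitution(ref: str, alt: str) -> str:
    """Return the strand-collapsed class of a single-base substitution.

    A G>A change read on one strand is a C>T change on the other, so a
    purine reference base is complemented (together with the alt base),
    yielding one of six classes: C>A, C>G, C>T, T>A, T>C, T>G.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in ("A", "G"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def substitution_spectrum(snvs):
    """Count strand-collapsed classes for an iterable of (ref, alt) pairs."""
    return Counter(collapse_substitution(ref, alt) for ref, alt in snvs)


# Hypothetical SNV calls, not data from this study.
print(substitution_spectrum([("C", "T"), ("G", "A"), ("A", "G")]))
```

Here the G > A call is counted as C > T and the A > G call as T > C, which is why an enrichment is reported as a single "C > T (G > A)" class.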
SV analysis revealed that 10.94% of alleles in the OnlyCut sample harbored large SVs (>30 bp) in the EPOR target region. The presence of ssODN and LinearDsDNA donors resulted in large-SV allele frequencies of 8.66 and 9.86%, respectively. The frequency of SV alleles was lower in the CircularDsDNA sample (1.37%), which might be attributed to the use of HiFi Cas9 or to reduced gene-editing efficiency. The majority of SVs in all samples were deletions regardless of donor type (Table 1 and Fig. S3b). Consistent with the previous report that microhomologies are prevalent in Cas9-induced large deletions20, our data from template-directed Cas9 editing suggest that microhomologies around the Cas9 cutting region are the main driver of recurring large deletions (Fig. S3c). The presence of a template during Cas9 editing significantly decreased the frequency of large deletions, as previously reported21, while unexpectedly increasing the frequency of LgIns (Fig. S3b).
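Microhomology at a deletion is usually measured as the stretch of sequence shared by both breakpoint junctions, which makes the exact breakpoint ambiguous. A minimal sketch, assuming the deletion is given as 0-based half-open coordinates `ref[start:end]` on a reference string; the function name is hypothetical and the authors' exact definition is not stated in this section.

```python
def microhomology_length(ref: str, start: int, end: int) -> int:
    """Microhomology length flanking a deletion of ref[start:end].

    Counts how far the deletion can slide without changing the edited
    sequence: bases shared between the start of the deleted segment and the
    sequence just after it (rightward slide), plus bases shared between the
    end of the deleted segment and the sequence just before it (leftward
    slide). The total is capped at the deletion length.
    """
    if not 0 <= start < end <= len(ref):
        raise ValueError("deletion coordinates out of range")
    size = end - start

    right = 0
    while end + right < len(ref) and right < size and ref[start + right] == ref[end + right]:
        right += 1

    left = 0
    while start - left - 1 >= 0 and left < size and ref[start - left - 1] == ref[end - left - 1]:
        left += 1

    return min(left + right, size)


# Toy reference: deleting "CATGAAAA" leaves a junction flanked by a shared "CATG".
print(microhomology_length("GGCATGAAAACATGTT", 2, 10))
```

A deletion would then be called microhomology-mediated when this length reaches some minimum (2 bp is a common choice); that threshold is an assumption here, not a value taken from this study.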
Quantitative single-molecule analysis of Cas9-edited alleles reveals the landscape of large insertions
IDMseq enables quantitative analysis of Cas9-induced LgIns longer than 30 bp10 and detected a number of insertions in edited cells (Table S1 and Fig. 1b). All insertions were located within ±200 bp of the Cas9 cut, most of them immediately flanking the cut (Fig. S5a). Whereas large deletions usually result in loss of function, LgIns can lead to more complex genetic and functional changes because of the unpredictable nature of the inserted DNA. Using the ultra-sensitive IDMseq, we were able to investigate LgIns quantitatively for the first time. We detected 52 LgIns (0.43% of alleles), ranging from 32 to 629 bp, in the OnlyCut sample (Fig. S5b). When donor templates were present, the frequency of LgIns increased to 0.79% with ssODN and 1.61% with LinearDsDNA (Fig. S3b).
We examined the origin of the inserted DNA in OnlyCut (Table S1) and found that most insertions were unique and that 96.15% of them aligned to the human hg38 reference genome (Fig. 1b, c). Consistent with previous findings18, none of them overlapped with putative off-target sites (checked within ±1 kb, Fig. 1b). Instead, 25% of inserted DNA came from the region around the Cas9 cut site (±2 kb), while the rest came from either distal regions of the target chromosome (23.08%) or other chromosomes (48.08%) (Fig. 1c). Interestingly, we detected two inserted sequences (3.85%), 31 and 36 bp long, that could not be aligned to any known nucleotide sequence by NCBI BLAST and bore no sequence similarity to each other, suggesting that de novo sequences can be generated during Cas9 editing, likely by error-prone DNA repair mechanisms.
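The four origin categories above (cut site ±2 kb, distal on the target chromosome, other chromosomes, unaligned) amount to a simple rule applied to each insertion's alignment. A minimal sketch, assuming each insertion has already been reduced to one best alignment `(chrom, start, end)` or `None`; the aligner, filtering, and function names are hypothetical, and the cut coordinate is taken from the Fig. 2a legend.

```python
from collections import Counter
from typing import Optional, Tuple

# One inserted sequence's best alignment: (chrom, start, end), or None if unaligned.
Alignment = Optional[Tuple[str, int, int]]


def classify_origin(aln: Alignment, cut_chrom: str, cut_pos: int, window: int = 2_000) -> str:
    """Assign an inserted sequence to one of the origin categories used in the text."""
    if aln is None:
        return "unaligned"
    chrom, start, end = aln
    if chrom != cut_chrom:
        return "other_chromosome"
    # Any overlap with the cut +/- window counts as a local origin.
    if start <= cut_pos + window and end >= cut_pos - window:
        return "cut_site_window"
    return "same_chromosome_distal"


def origin_percentages(alignments, cut_chrom: str, cut_pos: int, window: int = 2_000):
    """Percentage of insertions falling into each origin category."""
    counts = Counter(classify_origin(a, cut_chrom, cut_pos, window) for a in alignments)
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}


# EPOR cut (chr19:11,378,199-11,378,200); alignments below are hypothetical.
cut = ("chr19", 11_378_200)
example = [None, ("chr17", 1, 50), ("chr17", 5, 60), ("chr19", 11_378_300, 11_378_400)]
print(origin_percentages(example, *cut))
```

Treating any overlap with the window as "local" is a design choice; a stricter rule (for example, requiring the whole alignment to lie inside the window) would move borderline insertions into the distal category.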
Functional analysis of Cas9-induced large insertions reveals genomic hotspot origins of inserted DNA
Multiple sequence alignments of the inserted DNA and the edited region identified no significant similarity. Further annotation using the RepeatMasker database showed that a large proportion of inserted DNA in OnlyCut (46.15%, excluding insertions from the cut ±2 kb region) came from repetitive genomic regions consisting of interspersed repeats and low-complexity DNA sequences (Fig. 1c). Specifically, retrotransposable elements (REs), namely long terminal repeats (LTRs), long interspersed elements (LINEs), and short interspersed elements (SINEs), accounted for 86.21% of the repeats (Fig. 1d). All inserted LINEs were truncated, and the majority were LINE1 elements (87.5%). Because retrotransposons are abundant and constitute over 40% of the human genome22, we investigated whether REs were further enriched in Cas9-induced unintended LgIns. Chi-square testing revealed that the observed occurrences of the various RE insertions in EPOR editing closely matched the expected values (Table S2). Thus, the prevalence of RE insertions can likely be attributed to chance, suggesting that DNA repair mechanisms randomly acquired genomic fragments during DSB-based Cas9 editing.
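The enrichment question above is a chi-square goodness-of-fit test: if repair randomly captures genomic fragments, each RE class should appear roughly in proportion to its share of the genome. A minimal sketch, assuming expected counts are proportional to genome coverage of each class; the input numbers in the usage line are hypothetical placeholders, not the values behind Table S2. The p-value uses the closed-form chi-square survival function for integer degrees of freedom, so only the standard library is needed.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: the df = 1 tail plus half-integer gamma terms.
    total = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def chi_square_gof(observed, genome_fraction):
    """Chi-square goodness-of-fit of class counts to genome-proportional expectations.

    observed: {class: count}; genome_fraction: {class: fraction of genome}.
    The total observed count is split among the classes in proportion to
    their genome fractions to obtain the expected counts.
    """
    classes = list(observed)
    n = sum(observed[c] for c in classes)
    frac_total = sum(genome_fraction[c] for c in classes)
    stat = 0.0
    for c in classes:
        expected = n * genome_fraction[c] / frac_total
        stat += (observed[c] - expected) ** 2 / expected
    return stat, chi2_sf(stat, len(classes) - 1)


# Hypothetical counts and approximate genome fractions, for illustration only.
print(chi_square_gof({"LINE": 9, "SINE": 6, "LTR": 4}, {"LINE": 0.20, "SINE": 0.13, "LTR": 0.08}))
```

A large p-value means the observed mix of RE classes is compatible with random capture. With expected counts below about five per class, as is likely for a few dozen insertions, an exact multinomial test would be the more defensible choice.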
To estimate repetitive genomic integration in Cas9 editing more accurately, we aligned inserted DNA to the human telomere-to-telomere genome assembly (T2T-CHM13/hs1) and investigated insertions related to centromeric and pericentromeric satellites. Only a small fraction of insertions (1.92% in OnlyCut, 9.3% in ssODN, 0% in LinearDsDNA, and 16.67% in CircularDsDNA) originated from centromeric regions.
The 3D chromatin architecture determines the spatial contact frequency of genomic sequences. We examined whether unintended LgIns in Cas9 editing are related to 3D genome organization using published Hi-C and Micro-C data from H1 ESCs (4DN accession numbers: 4DNFI6HDY7WZ and 4DNFI9GMP2J8). We found no interaction between the source loci of LgIns and the EPOR edited region in OnlyCut; in the other samples, only a weak interaction was detected (chr19:11404450-11404767, 26 kb away from the editing site; three contacts in Hi-C and five contacts in Micro-C).
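Counting contacts between an insertion's source locus and the edited region can be done directly from read-pair records. A minimal sketch, assuming contacts are available as a 4DN-style `.pairs` text file (readID, chrom1, pos1, chrom2, pos2, ...) rather than, for example, a binned contact matrix; the text does not state which file type or binning the authors used, and the function names are hypothetical.

```python
from typing import Iterable, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end), inclusive coordinates


def in_region(chrom: str, pos: int, region: Region) -> bool:
    r_chrom, r_start, r_end = region
    return chrom == r_chrom and r_start <= pos <= r_end


def count_contacts(pairs_lines: Iterable[str], region_a: Region, region_b: Region) -> int:
    """Count read pairs with one end in region_a and the other in region_b.

    Assumes tab-separated .pairs records: readID, chrom1, pos1, chrom2, pos2, ...
    Header lines start with '#'. Each pair is counted once in either orientation.
    """
    n = 0
    for line in pairs_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        c1, p1, c2, p2 = fields[1], int(fields[2]), fields[3], int(fields[4])
        forward = in_region(c1, p1, region_a) and in_region(c2, p2, region_b)
        reverse = in_region(c1, p1, region_b) and in_region(c2, p2, region_a)
        if forward or reverse:
            n += 1
    return n


# Hypothetical records: two pairs link the EPOR cut window to the distal chr19 locus.
lines = [
    "## pairs format v1.0",
    "r1\tchr19\t11378210\tchr19\t11404500\t+\t-",
    "r2\tchr19\t11404600\tchr19\t11378150\t-\t+",
    "r3\tchr17\t100\tchr19\t11378200\t+\t+",
]
print(count_contacts(lines, ("chr19", 11_376_200, 11_380_200), ("chr19", 11_404_450, 11_404_767)))
```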
We next examined whether host genomic integrations were related to cis-regulatory elements in the genome. In OnlyCut, 38.46% of inserted DNA (excluding insertions from the on-target ±2 kb region) originated within ±1 kb of FANTOM5 transcription start site (TSS) peaks, ENCODE candidate cis-regulatory elements (cCREs), and H3K27Ac peaks (Fig. 1b, e). Notably, most of the regulatory elements adjacent to inserted DNA (66.67%) were also annotated by RepeatMasker. When on-target and repetitive genomic integrations were excluded, five of the remaining 13 genomic insertions in OnlyCut were further classified as regulatory-element-adjacent integrations.
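The "±1 kb of a regulatory element" annotation is an interval-proximity query. A minimal sketch, assuming peaks and insertion source intervals use the same coordinate convention (for example, both BED-style); function names are hypothetical. Peaks are sorted by start and a running maximum of peak ends is stored, so a single binary search decides whether any peak overlaps the padded window.

```python
import bisect
from collections import defaultdict


def build_index(peaks):
    """Index (chrom, start, end) peaks per chromosome for proximity queries.

    For each chromosome, stores peak starts in sorted order and a running
    maximum of peak ends over that order.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [s for s, _ in intervals]
        prefix_max_end, running = [], float("-inf")
        for _, e in intervals:
            running = max(running, e)
            prefix_max_end.append(running)
        index[chrom] = (starts, prefix_max_end)
    return index


def near_peak(index, chrom, start, end, pad=1000):
    """True if any peak overlaps [start - pad, end + pad] on chrom."""
    if chrom not in index:
        return False
    starts, prefix_max_end = index[chrom]
    # Peaks starting at or before end + pad are the only candidates; among
    # them, one overlaps iff the largest end reaches start - pad.
    i = bisect.bisect_right(starts, end + pad)
    return i > 0 and prefix_max_end[i - 1] >= start - pad


# Hypothetical peaks and insertion source intervals.
idx = build_index([("chr1", 5000, 5100), ("chr2", 100, 200)])
print(near_peak(idx, "chr1", 6000, 6050), near_peak(idx, "chr1", 6200, 6300))
```

Running the same query against each annotation track (TSS peaks, cCREs, H3K27Ac) and taking the union gives the "adjacent to any regulatory element" category.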
We checked whether any LgIns in the edited samples originated from unrelated genes. In OnlyCut, 76.92% of inserted DNA mapped to genic loci (annotated with the UCSC hg38 ncbiRefSeq database). Furthermore, we identified five insertions (9.62%) originating from protein-coding regions, three of which were located in the EPOR coding region near the Cas9 cut. Most importantly, we detected two insertions (3.85% of LgIns) originating from exon 3 of the oncogene Tensin-4 (TNS4) on chromosome 17, which regulates cell survival, proliferation, and migration23 (Fig. 2a). Both mutant alleles also carried the same chr19:11,381,670 G > A de novo SNV in EPOR (Fig. 2a), suggesting that the EPOR-TNS4 chimeric mutation was potentially amplified by cell proliferation and could pose a risk to the Cas9-edited sample.
a The TNS4 exonic integration was detected in two alleles in OnlyCut. The Cas9 cutting site (chr19:11,378,199–11,378,200) is indicated by the red dotted line. Both alleles share the same LgIn (chr19:11,378,199Ins) and the chr19:11,381,670 G > A de novo SNV (red arrowhead). The UMI is indicated by the green arrowhead. The target locus genomic map shows the amplicon for IDMseq. The red boxes in (a–d) indicate inserted DNA. b Transcription start site (TSS) integration detected in ssODN. The Cas9 cutting site is indicated by the red dotted line. The 245-bp inserted sequence originates from chr21 and contains two FANTOM5 TSS peaks, and is also annotated by RepeatMasker. c The ASB11 exonic integration detected in the Cas9-edited PIGA locus. d The TSS integration detected in the Cas9-edited PANX1 locus. A 207-bp sequence from the human mitochondrial genome, containing multiple FANTOM5 TSS peaks, is inserted into the Cas9 cutting site.
Alternative TSS usage plays a crucial role in organismal development, contributes significantly to transcript isoform diversity in humans, and is frequently implicated in human diseases, including cancer24. Several FANTOM5 TSS sequences from distal loci were inserted into the Cas9 cut in OnlyCut and ssODN (Figs. 1e, 2a, b). Such unexpected TSS integrations could lead to the production of spurious transcripts and disrupt normal cellular function.
To understand the prevalence of exogenous exon and TSS integrations, we further examined our IDMseq data from Cas9 editing of the PANX110 and PIGA17 loci (Fig. S6) and performed ONT Cas9 enrichment sequencing on Cas9-edited KCNQ1, PEG10, and SNRPN loci (see Methods and Table S3). Consistent with the EPOR editing results, we detected exonic integrations of BOC (1 allele), SPEF2 (11 alleles), and mitochondrial MT-ND5 (4 alleles) in PANX1-edited cells, and of ASB11 (1 allele) in PIGA-edited cells (Fig. 2c). We identified four FANTOM5 TSS integrations originating from the mitochondrial genome in PANX1 editing (Fig. 2d), two TSS integrations in PIGA editing, four in KCNQ1 editing, five in PEG10 editing, and one in SNRPN editing. The consistent integration of protein-coding and TSS sequences across six independent Cas9 editing experiments reveals a ubiquitous and overlooked risk of DSB-based Cas9 editing.
Template-directed Cas9 editing leads to the integration of full-length or tandem concatemeric dsDNA donors
Previous reports showed that ssODN donors lead to efficient HDR25,26. More recently, ssODN donors were shown to reduce Cas9-induced large deletions in T cells and hematopoietic stem and progenitor cells (HSPCs) but not in pluripotent stem cells21. With the more sensitive IDMseq, our data showed that ssODN also reduced large deletions in H1 ESCs (Table 1). These findings support ssODN as a preferred HDR donor type. However, how ssODN affects LgIns has not been studied. Co-delivery of ssODN with Cas9 RNP increased the frequency of unintended LgIns by 1.8-fold compared with OnlyCut (Table 1 and Fig. S3b). Interestingly, most insertions came from the host genome, and only 6 (6.98%) could be traced to the donor template because they contained the chr19:11,378,195 C > T point mutation (Fig. S3d). As in OnlyCut, functional analysis showed that the vast majority of inserted DNA (75.58%) in ssODN originated from genic loci, with three instances arising specifically from exons of RNA28S, RNA45S, and an uncharacterized gene, LOC124900812 (a potential ncRNA gene overlapping an LTR element, Fig. 1b and S3e, f).
In contrast to the infrequent unintended donor integration observed with ssODN, 69.78% of the DNA insertions in LinearDsDNA originated from the donor itself (Fig. S3d), suggesting that dsDNA is much more likely than ssODN to integrate into DSBs. Because the HDR efficiency of LinearDsDNA is relatively low (62 alleles, 0.72%)21,27, possibly due to cell death, the frequency of donor template integration (97 alleles, 1.12%) may appear disproportionately high (or even dominant). The LinearDsDNA donor template carried an artificial sequence tag at each end (Fig. S1b), allowing us to verify integration of the full-length donor sequence. Alignment of inserted DNA to the LinearDsDNA template showed that 12.95% of insertions were near full-length (≥95% coverage) and contained the artificial tags at both termini (Fig. 3a). Furthermore, 6.47% of inserted DNA contained head-to-tail tandem concatemeric donor sequences (Fig. 3b). Integration of tandem-repeat dsDNA donors (concatemers) is commonly observed when transgenic organisms are generated by pronuclear microinjection, and in most cases the transgenic concatemer is located at a single locus28. Our LinearDsDNA data are consistent with this finding and indicate that electroporation of linear dsDNA templates with Cas9 RNP can result in frequent targeted integration of full-length donors in a homology-independent manner, similar to homology-independent targeted integration (HITI)2 donors but without the need for a complex design.
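The two donor-integration categories above can be expressed as rules on how an insertion's sequence aligns back to the donor. A minimal sketch, assuming each insertion has been split into donor-aligned segments listed in their order along the insertion (for example, from a long-read aligner's primary and supplementary alignments); the tag length and end tolerance are hypothetical parameters, not values from this study.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    donor_start: int  # 0-based, inclusive position on the donor
    donor_end: int    # 0-based, exclusive position on the donor
    strand: str       # "+" or "-" relative to the donor


def is_near_full_length(seg: Segment, donor_len: int, tag_len: int, min_cov: float = 0.95) -> bool:
    """One segment covering >= min_cov of the donor and reaching into both end tags."""
    covers_tags = seg.donor_start < tag_len and seg.donor_end > donor_len - tag_len
    return (seg.donor_end - seg.donor_start) / donor_len >= min_cov and covers_tags


def is_head_to_tail(segments: List[Segment], donor_len: int, tol: int = 10) -> bool:
    """True if two consecutive same-strand segments join donor end to donor start.

    On the "+" strand a copy ending near the donor's 3' end is followed by one
    starting near its 5' end; on the "-" strand the coordinates run backwards.
    """
    for prev, nxt in zip(segments, segments[1:]):
        if prev.strand != nxt.strand:
            continue  # head-to-head or tail-to-tail junction
        if prev.strand == "+" and prev.donor_end >= donor_len - tol and nxt.donor_start <= tol:
            return True
        if prev.strand == "-" and prev.donor_start <= tol and nxt.donor_end >= donor_len - tol:
            return True
    return False


# 180-bp donor as in the text; 20-bp tags are a hypothetical choice.
print(is_near_full_length(Segment(2, 179, "+"), donor_len=180, tag_len=20))
print(is_head_to_tail([Segment(0, 180, "+"), Segment(0, 175, "+")], donor_len=180))
```

Requiring both tags, not just 95% coverage, guards against calling a truncated copy full-length when the truncation happens to fall within the coverage tolerance.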
a Full-length donor integration in Cas9 editing with LinearDsDNA. Two artificial sequences are shown at both ends of the template. b Tandem concatemeric donor integration in Cas9 editing with LinearDsDNA. c The frequencies of large deletion, insertion, and HDR in WT and HiFi Cas9 editing with regular and phosphorylated dsDNA donors. DNA phosphorylation reduces unintended large SVs without compromising HDR efficiency in both WT and HiFi Cas9 editing. d Full-length donor integration in HiFi Cas9 editing with LinearDsDNA. e Full-length donor integration in WT and HiFi Cas9 editing with orientation-controlled LinearDsDNAphos donor. Reads labeled in red indicate the correct orientation.
To determine whether the exposed DNA ends of LinearDsDNA were necessary for donor integration, we examined LgIns using CircularDsDNA as a donor. With the large CircularDsDNA donor (3197 bp), the LgIn frequency decreased from 1.61% in LinearDsDNA to 0.14% (Table 1). Note that differences in size, copy number, and Cas9 enzyme between LinearDsDNA and CircularDsDNA could affect electroporation and template availability, contributing to the difference in LgIn frequency. Nevertheless, unlike in LinearDsDNA, most (88.89%) inserted DNA in CircularDsDNA originated from the host genome. Two (11.11%) insertions (~1337 bp) came from the plasmid backbone rather than the homology arms (Fig. S7a). Plasmid insertions have also been observed in other studies29,30. This suggests that donor DNA integration can be independent of the HDR pathway. Because the two insertions aligned to the same position in the donor, our data indicate that unintended integration of long foreign DNA could be amplified by cell proliferation and/or gene conversion.
HITI is used for the targeted integration of transgenes through the NHEJ pathway2. To test if the unintended non-HR donor integration had an impact on HITI gene editing, we constructed a CircularHITI donor to insert a GFP reporter gene in the EPOR locus (Fig. S1b). CircularHITI showed a similar frequency of SVs as CircularDsDNA (Table 1). The design of HITI should facilitate targeted integration of the GFP gene. However, we detected a 1322-bp insertion of the plasmid backbone sequence in CircularHITI and three truncated insertions of the HITI target (i.e., the GFP reporter, Fig. S7b).
Donor DNA phosphorylation reduces unintended SVs without compromising HDR efficiency
The LinearDsDNA donor used in this study was a PCR amplicon without 5′ phosphorylation. To further confirm the prevalent integration of linear dsDNA donors and to test whether the insertion frequency is affected by DNA end modifications or by different Cas9 proteins, we used 5′-phosphorylated LinearDsDNA (referred to as LinearDsDNAphos) with WT S.p. Cas9 editing and repeated LinearDsDNA and LinearDsDNAphos editing with HiFi Cas9.
LinearDsDNAphos and LinearDsDNA showed similar efficiency in installing the desired point mutation with either WT or HiFi Cas9 (Fig. 3c), suggesting that HDR was not affected by DNA phosphorylation. Compared with WT Cas9, HiFi Cas9 resulted in fewer unintended large SVs but also lower HDR efficiency. Although 5′ phosphorylation is necessary for ligation of two DNA ends, we found that DNA phosphorylation (LinearDsDNAphos) decreased the frequency of LgIns in both WT and HiFi Cas9 editing (Table 1 and Fig. 3c). Examining the composition of LgIns, we observed that despite the overall decrease, LinearDsDNAphos increased the proportion of donor integration within LgIns (from 69.78 to 81.48% with WT Cas9 and from 82.98 to 84.44% with HiFi Cas9, Table S1). Consistent with LinearDsDNA and WT Cas9 editing, foreign donor integrations were prevalent regardless of donor phosphorylation and Cas9 type: 40.91, 46.15, and 26.32% of donor insertions were near full-length in WT LinearDsDNAphos, HiFi LinearDsDNA, and HiFi LinearDsDNAphos, respectively (Fig. 3d, e).
The design of LinearDsDNAphos donor contained two Cas9-target-sequence segments (Fig. S1b, S8) that mimicked the HITI strategy to promote donor insertion in a specific orientation but without the need for a Cas9 cleavage. The inserted DNA alignment showed that such orientation control was not very efficient in LinearDsDNAphos editing (Fig. 3e). Since most integrated donor sequences were truncated, it is possible that the electroporated donor DNA had been processed by nucleases before being ligated to the DSB, thus making the orientation control strategy ineffective.
Beyond LgIns, our data showed that the frequencies of all types of unintended large SVs decreased with phosphorylated donors (Fig. 3c). For instance, the frequency of large deletions decreased from 8.14 to 5.67% with WT Cas9 and from 2.53 to 1.76% with HiFi Cas9. Previous studies showed that MMEJ plays a major role in Cas9-induced large deletions17. Microhomology analysis of the large-deletion breakpoints showed that, with phosphorylated donors, the frequency of microhomology-mediated deletions decreased from 55.32 to 50.89% with WT Cas9 and from 53.98 to 46.56% with HiFi Cas9. To confirm that the SV reduction was due to donor phosphorylation rather than to homology between the host genome and the donor sequences, we further performed WT Cas9 editing with a phosphorylated GFP donor (LinearGFPphos) of similar size but without homology to the target locus (Fig. S1b). LinearGFPphos showed a reduction in SVs similar to that of LinearDsDNAphos (Table 1). Given that donor phosphorylation reduced unintended SVs without affecting HDR efficiency with both WT and HiFi Cas9, our data suggest donor phosphorylation as good practice in template-directed Cas9 editing.
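Whether a drop such as 8.14 to 5.67% exceeds sampling noise depends on how many alleles each sample contains (Table 1). A minimal sketch of a two-sided two-proportion z-test is shown below; the counts in the usage line are hypothetical, and the test treats alleles as independent draws, which clonal amplification of edited alleles (noted earlier for the TNS4 and plasmid insertions) would violate, so it likely understates the uncertainty.

```python
import math


def two_proportion_ztest(x1: int, n1: int, x2: int, n2: int):
    """Two-sided z-test for a difference between proportions x1/n1 and x2/n2.

    Uses the pooled proportion for the standard error; returns (z, p_value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical allele counts (not the study's numbers): deletions / alleles per sample.
z, p = two_proportion_ztest(814, 10_000, 567, 10_000)
print(f"z = {z:.2f}, p = {p:.2g}")
```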
Discussion
As a cutting-edge gene-editing tool, CRISPR-Cas9 has been extensively applied in academic, biotechnological, and clinical settings6. Intensive efforts have been made to investigate the potential side effects of Cas9 editing to ensure its safe use. Initial attention focused on on-target and off-target small mutations31. Subsequent studies reported unexpected large deletions and chromosome rearrangements8,9,10,11,32. However, the exploration of LgIns has been limited, primarily due to technical constraints.
This study presents the first detailed analysis of unintended LgIns induced by Cas9 editing, which occur at an average allele frequency of 0.7% (range 0.1–1.61%). Although the reported values are derived from single datasets without biological replicates because of the high cost of IDMseq with ONT sequencing, our findings from 13 independent Cas9-editing experiments across six gene loci show that LgIns occur at low frequencies, making them difficult to detect with conventional sequencing methods. Nevertheless, the finding that approximately 7 out of every 1000 edited alleles carry hitherto overlooked LgIns of foreign sequences (some with coding and regulatory capacity) raises potentially serious concerns, particularly for clinical applications. Among the insertions, REs such as LINEs, LTRs, and SINEs are frequent. Analyses of RE frequencies and 3D genome organization suggest that LgIns originate from genomic fragments randomly acquired by DNA repair mechanisms. Unintended insertions of such elements may entail risks, including those well studied in cancer, aging, and aging-associated diseases33,34,35. Besides retrotransposons, regulatory elements (including TSSs, enhancers, and promoters) intriguingly represent a large portion of insertions. The integration of these regulatory elements may recruit transcription factors and initiate novel transcription, with unpredictable and potentially deleterious outcomes. Additionally, the discovery of exonic insertions, including sequences from the oncogene TNS4 and the mitochondrial gene MT-ND5, albeit in only a small fraction of alleles, underscores potential risks for cancer development. Beyond altering genetic information, insertion of exogenous DNA has been shown to influence epigenetic information, for example by inducing de novo DNA methylation36, a risk that warrants further investigation.
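IDMseq frequencies are counted per allele, not per cell. At a diploid locus, translating an allele frequency into the fraction of cells carrying at least one affected allele requires an assumption about how affected alleles are distributed across cells. A minimal sketch, assuming each allele is affected independently; clonal expansion or biallelic events would violate this.

```python
def fraction_cells_with_event(allele_freq: float, ploidy: int = 2) -> float:
    """Expected fraction of cells carrying >= 1 affected allele.

    Assumes each of the `ploidy` alleles in a cell is affected independently
    with probability `allele_freq`.
    """
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele_freq must be within [0, 1]")
    return 1.0 - (1.0 - allele_freq) ** ploidy


print(f"{fraction_cells_with_event(0.007):.4f}")  # average LgIn allele frequency of 0.7%
```

Without the independence assumption, the cell fraction is bounded by simple counting: at a diploid locus, a 0.7% allele frequency corresponds to between 0.7% of cells (if both alleles are always affected together) and 1.4% of cells (if no cell carries two affected alleles).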
Cas9 editing coupled with donor templates is an indispensable and extensively used method for generating specific genome editing outcomes and correcting genetic disorders in clinical applications2,37,38,39,40,41, but it carries risks such as unintended mutations10. iPSCs offer a unique advantage by allowing expansion of a single edited clone, ensuring that only clones without detectable mutations are selected for differentiation into functional cells. Similarly, primary cells edited for therapy can be analyzed for mutations to identify potential risks early and minimize complications. IDMseq provides an effective tool for studying the risks associated with Cas9 editing, offering a comprehensive analysis of unintended on-target mutations10. This study compared the editing efficiencies and byproducts of ssODN, LinearDsDNA, and CircularDsDNA donors in the installation of a point mutation and showed that ssODN combines high efficiency with few byproducts. The high efficiency of ssODN in installing the desired mutation agrees with other studies10,25. In addition, the editing efficiency of different templates may be affected by the copy number of donors used in experiments. Because of the limit on DNA quantity in electroporation (high DNA mass leads to severe cell death, Fig. S2), the copy number of CircularDsDNA (0.97 pmol) was substantially lower than that of LinearDsDNA (5 pmol) and ssODN (10.8 pmol), although in keeping with typical amounts used in the literature42. The low copy number of CircularDsDNA could limit the donor templates available for HDR of DSBs, resulting in the absence of the desired point mutation. Furthermore, our data showed a substantial proportion of unintended full-length and even concatemeric donor integration in LinearDsDNA-templated Cas9 editing, demonstrating the need for extra caution when using linear dsDNA templates.
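The donor amounts above are given in pmol (copy number), whereas the electroporation limit is on DNA mass. A minimal sketch of the conversion, assuming the standard average-nucleotide approximations for unmodified DNA (dsDNA ≈ 617.96 g/mol per bp + 36.04; ssDNA ≈ 308.97 g/mol per nt + 18.02); end modifications such as phosphorylation are ignored, and the function names are hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mol


def dsdna_mw(bp: int) -> float:
    """Approximate molecular weight (g/mol) of unmodified dsDNA of length bp."""
    return bp * 617.96 + 36.04


def ssdna_mw(nt: int) -> float:
    """Approximate molecular weight (g/mol) of an unmodified ssDNA oligo."""
    return nt * 308.97 + 18.02


def pmol_to_ng(pmol: float, mw: float) -> float:
    # pmol * 1e-12 mol * mw g/mol = grams; * 1e9 converts grams to ng.
    return pmol * mw * 1e-3


def ng_to_pmol(ng: float, mw: float) -> float:
    return ng * 1e3 / mw


def pmol_to_copies(pmol: float) -> float:
    return pmol * 1e-12 * AVOGADRO


# Donor sizes and amounts as stated in the text.
for name, pmol, mw in [
    ("CircularDsDNA", 0.97, dsdna_mw(3197)),
    ("LinearDsDNA", 5.0, dsdna_mw(180)),
    ("ssODN", 10.8, ssdna_mw(130)),
]:
    print(f"{name}: {pmol_to_ng(pmol, mw):.0f} ng, {pmol_to_copies(pmol):.2e} copies")
```

Under these approximations, 0.97 pmol of the 3197-bp plasmid is roughly 1.9 µg, whereas 5 pmol of the 180-bp linear donor is roughly 0.56 µg. This illustrates why a mass limit translates into far fewer copies for a long circular donor. The masses actually electroporated are not stated in this section.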
While donor DNA end modifications, such as phosphorylation, have been studied for improving precise CRISPR genome editing43,44, their influence on large SVs has not been explored. This study demonstrates that phosphorylated dsDNA donor templates reduce large insertions and deletions without compromising HDR efficiency in both WT and HiFi Cas9 editing. A plausible explanation is that phosphorylated DNA ends facilitate prompt resolution of DSBs through fast-acting NHEJ21, reducing the engagement of other error-prone DNA repair pathways and thereby minimizing large SVs. Our data support the use of donor DNA phosphorylation in template-directed Cas9 editing for enhanced safety. Taken together, our study addresses a ubiquitous and overlooked risk of unintended LgIns in DSB-based Cas9 editing, contributing valuable insights into the safety of Cas9 editing tools.
Methods
Cell lines
The H1 hESC line was purchased from WiCell Institute and cultured in Essential 8 (E8) medium (Thermo Fisher, Cat# A1517001) in rhLaminin-521 (Thermo Fisher, Cat# A29249)-coated wells with daily medium changes. Cells were maintained at 37 °C in a humidified incubator with sea-level air enriched with 5% CO2.
Donor templates
The ssODN was purchased from IDT, Inc. The LinearDsDNA and LinearDsDNAphos donors were amplified by PCR and purified using AMPure XP magnetic beads (Beckman Coulter, Cat# A63881). The CircularDsDNA was constructed by inserting LinearDsDNA into a pGEM®-T Easy vector (Promega, Cat# A1360). The sequences of all donors are listed in Table S3.
CRISPR-Cas9 editing
Alt-R S.p. Cas9 (wild-type) (Cat# 1081059) and Alt-R S.p. HiFi Cas9 (Cat# 1081061) were purchased from IDT, Inc. The sgRNA (Table S3) was synthesized using a MEGAshortscript™ T7 Transcription Kit (Thermo Fisher, Cat# AM154). The Cas9/sgRNA ribonucleoprotein (RNP) was premixed and aliquoted at 50 pmol per electroporation. H1 hESCs were harvested at ~70% confluency and distributed into 1.5 ml tubes at 200,000 cells per tube for each electroporation. The Neon™ Transfection System (Thermo Fisher, MPK5000) was used to electroporate Cas9/sgRNA and donor DNA with settings of 1600 V, 10 ms pulse width, and 3 pulses. Immediately after electroporation, cells were seeded in 24-well plates in 500 µl E8 medium containing 10 µM ROCK inhibitor Y27632 (Abcam, Cat# ab120129). At 48 h post-electroporation, cells were harvested for genomic DNA extraction.
IDMseq
Cas9-edited loci were sequenced with IDMseq10. In brief, IDMseq used a UMI oligo to label each targeted molecule (from the gene of interest) with a UMI. The UMI oligo (CATCTTACGATTACGCCAACCACTGCNNNNNTGNNNNNCCCATTTCCAGGCCAGATCCCTC) contained three parts: a 3′ gene-specific sequence, a UMI sequence, and a 5′ universal primer sequence. The 3′ gene-specific sequence annealed specifically to the edited region. The middle UMI sequence consisted of ten random bases (denoted by Ns) separated by a fixed TG dinucleotide, allowing each original molecule to be uniquely labeled. The 5′ universal primer sequence was used together with a gene-specific reverse primer to uniformly amplify all UMI-labeled DNA molecules for ONT sequencing.
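The three-part layout of the UMI oligo can be made concrete with a short Python sketch. It splits the oligo at its N positions and pulls the UMI out of a read that carries the universal primer. The helper names (`split_umi_oligo`, `umi_regex`, `extract_umi`) are hypothetical, only exact matches on the forward strand are handled, and this is not how VAULT itself extracts UMIs; real nanopore reads would need error-tolerant matching and reverse-complement handling.

```python
from __future__ import annotations

import re

UMI_OLIGO = "CATCTTACGATTACGCCAACCACTGCNNNNNTGNNNNNCCCATTTCCAGGCCAGATCCCTC"


def split_umi_oligo(oligo: str) -> tuple[str, str, str]:
    """Split into (5' universal primer, UMI block, 3' gene-specific sequence).

    The UMI block runs from the first N to the last N, so fixed bases
    inside it (here the TG spacer) stay part of the block.
    """
    first_n, last_n = oligo.index("N"), oligo.rindex("N")
    return oligo[:first_n], oligo[first_n:last_n + 1], oligo[last_n + 1:]


def umi_regex(umi_block: str) -> re.Pattern:
    """Each N matches any base; fixed bases must match literally."""
    return re.compile("".join("[ACGT]" if b == "N" else b for b in umi_block))


def extract_umi(read: str, oligo: str = UMI_OLIGO) -> str | None:
    """Return the UMI that directly follows the universal primer in a
    forward-strand read, or None if there is no exact match."""
    universal, umi_block, _ = split_umi_oligo(oligo)
    start = read.find(universal)
    if start < 0:
        return None
    match = umi_regex(umi_block).match(read, start + len(universal))
    return match.group(0) if match else None


if __name__ == "__main__":
    universal, umi_block, gene_specific = split_umi_oligo(UMI_OLIGO)
    print(umi_block, 4 ** umi_block.count("N"))  # NNNNNTGNNNNN 1048576
    read = "AAGG" + universal + "ACGTATGCCGTA" + gene_specific + "TTT"
    print(extract_umi(read))  # ACGTATGCCGTA
```

With ten random positions, the oligo can encode 4^10 = 1,048,576 distinct UMIs; the fixed TG spacer does not add to that count but gives a simple check that the UMI was read in frame.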
Genomic DNA of edited cells was extracted using a DNeasy Blood & Tissue Kit (Qiagen, Cat# 69506). UMI labeling was performed following the published protocol10. Briefly, a unique UMI was added to each EPOR target molecule by one round of priming with the UMI oligo (CATCTTACGATTACGCCAACCACTGCNNNNNTGNNNNNCCCATTTCCAGGCCAGATCCCTC) in a 50 µl PCR containing 100 ng genomic DNA, 1 µM UMI primer, and 25 µl 2X Platinum SuperFi PCR Master Mix (Thermo Fisher, Cat# 12358010), using the following program: initial denaturation at 98 °C for 70 s, gradient annealing from 70 to 65 °C at a ramp rate of 1 °C/5 s, extension at 72 °C for 7 min, and hold at 4 °C. The UMI-labeled DNA was purified with 0.8x AMPure XP beads to remove UMI oligos, then mixed with a universal primer (CATCTTACGATTACGCCAACCACTGC) and an EPOR gene-specific reverse primer (TAACCTCCCGGACCCCAAGTTCG) and amplified using PrimeSTAR GXL DNA polymerase (Takara, Cat# R050A) with the following program: initial denaturation at 95 °C for 2 min; 30 cycles of 98 °C for 10 s and 68 °C for 7 min; final extension at 68 °C for 5 min; and hold at 4 °C. The amplicon was further purified and used for ONT library preparation.
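A back-of-envelope estimate helps relate the 100 ng genomic DNA input to the UMI space of the oligo. The sketch assumes ~3.3 pg per haploid human genome, a two-copy locus, and that every template copy is labeled (an upper bound); none of these figures come from the original protocol. Under uniform random UMI assignment, the expected fraction of molecules whose UMI is shared with no other molecule is (1 − 1/U)^(n − 1), where U is the number of possible UMIs and n the number of labeled molecules.

```python
HAPLOID_GENOME_PG = 3.3  # assumption: approximate mass of one haploid human genome (pg)


def target_copies(gdna_ng: float, copies_per_genome: int = 2) -> float:
    """Template copies of one locus in a gDNA input, assuming a diploid genome."""
    diploid_genomes = gdna_ng * 1000 / (2 * HAPLOID_GENOME_PG)  # ng -> pg
    return diploid_genomes * copies_per_genome


def unique_umi_fraction(n_molecules: float, n_random_bases: int) -> float:
    """Expected fraction of molecules whose UMI no other molecule shares,
    if UMIs are drawn uniformly from 4**n_random_bases possibilities."""
    umi_space = 4 ** n_random_bases
    return (1 - 1 / umi_space) ** (n_molecules - 1)


if __name__ == "__main__":
    n = target_copies(100)  # 100 ng input, as in the UMI-labeling PCR
    print(f"~{n:,.0f} template copies")
    print(f"~{unique_umi_fraction(n, 10):.1%} uniquely labeled")
```

Under these assumptions, 100 ng corresponds to roughly 30,000 copies of a two-copy autosomal locus such as EPOR, and about 97% of them would be expected to carry a UMI shared with no other molecule; a lower labeling efficiency would raise that fraction further.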
ONT sequencing
Cas9-targeted sequencing of the KCNQ1, PEG10, and SNRPN loci (Table S3) was performed using the Cas9 Sequencing Kit (Oxford Nanopore Technologies, Cat# SQK-CS9109). In brief, gRNAs were designed to target the loci of interest, and H1 hESCs were electroporated with Cas9 RNPs targeting these loci. After editing, genomic DNA was extracted and processed with the ONT Cas9 Sequencing Kit following the manufacturer's protocol: Cas9 nuclease was used to selectively cleave genomic DNA at sites flanking the targeted loci, and the resulting fragments containing the target regions were captured by ligation of sequencing adapters and prepared for sequencing.
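Both the editing and the enrichment steps depend on locating SpCas9 target sites. As an illustrative sketch (a hypothetical helper, not the gRNA design tool used in this study), the function below scans both strands of a sequence for 20-nt protospacers followed by an NGG PAM and reports where SpCas9 would make its blunt cut, 3 bp upstream of the PAM. Practical gRNA design would also score on-target activity and genome-wide off-targets.

```python
from __future__ import annotations

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_spcas9_sites(seq: str, spacer_len: int = 20) -> list[tuple[int, str, str]]:
    """Return (cut_index, strand, protospacer) for every NGG-adjacent protospacer.

    cut_index is the forward-strand index of the first base to the right of the
    blunt cut, which SpCas9 makes 3 bp upstream (5') of the PAM.
    """
    seq = seq.upper()
    n_windows = len(seq) - spacer_len - 3 + 1  # room for protospacer + 3-nt PAM
    sites = []
    for i in range(n_windows):
        # Plus strand: protospacer seq[i:i+20], PAM seq[i+20:i+23] must be NGG.
        if seq[i + spacer_len + 1 : i + spacer_len + 3] == "GG":
            sites.append((i + spacer_len - 3, "+", seq[i : i + spacer_len]))
        # Minus strand: CCN on the plus strand is NGG on the minus strand,
        # with the protospacer's reverse complement immediately to its right.
        if seq[i : i + 2] == "CC":
            protospacer = revcomp(seq[i + 3 : i + 3 + spacer_len])
            sites.append((i + 3 + 3, "-", protospacer))
    return sorted(sites)


if __name__ == "__main__":
    region = "AAAAA" + "GACGCATAAAGATGAGACGC" + "TGG" + "AAAAA"
    for cut, strand, protospacer in find_spcas9_sites(region):
        print(cut, strand, protospacer)  # 22 + GACGCATAAAGATGAGACGC
```

For enrichment, one would pick sites on either side of the region to be excised; for editing, a site within it. The same scan serves both purposes, which is why a single helper is shown.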
IDMseq libraries were prepared using the Ligation Sequencing Kit (Oxford Nanopore Technologies, Cat# SQK-LSK109) following the standard protocol. Each sample was sequenced on one ONT MinION R9.4.1 flow cell. Reads were basecalled using Guppy (v6.1.7) in super-accuracy mode.
Bioinformatics analysis
All sequencing data were analyzed with VAULT as described previously10. In brief, the UMI oligo sequence, the fastq file, and the reference amplicon sequence were provided as input. VAULT extracts mappable reads and then the UMI sequence from each read; reads are then grouped by UMI and used for parallel analysis of SNVs and SVs. The vault summarize command was used to generate the analysis summary. Circle plots of LgIns were generated with Circos (http://circos.ca), and histograms were generated with ggplot in R.
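The UMI-grouping idea behind this analysis can be sketched in a few lines. The Python below is a simplified illustration and not VAULT's implementation: it assumes each read has already been aligned to the reference amplicon and reduced to a (UMI, CIGAR string, alignment start) tuple, treats any insertion of at least 50 bp as large (an arbitrary, hypothetical threshold), and keeps an insertion call for a UMI group only when more than half of that group's reads carry it at the same position with the same length.

```python
from __future__ import annotations

import re
from collections import defaultdict

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def large_insertions(cigar: str, ref_start: int, min_len: int = 50) -> list[tuple[int, int]]:
    """(reference position, length) of every insertion >= min_len bp in one alignment.

    min_len = 50 is an illustrative cut-off, not the study's definition of LgIns.
    """
    ref_pos, hits = ref_start, []
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "I" and n >= min_len:
            hits.append((ref_pos, n))
        if op in REF_CONSUMING:
            ref_pos += n
    return hits


def consensus_insertions(reads: list[tuple[str, str, int]],
                         min_reads: int = 3) -> dict[str, list[tuple[int, int]]]:
    """Group (umi, cigar, ref_start) records by UMI and keep, for each UMI group
    with at least min_reads reads, the insertions seen in more than half of them."""
    groups: dict[str, list[list[tuple[int, int]]]] = defaultdict(list)
    for umi, cigar, ref_start in reads:
        groups[umi].append(large_insertions(cigar, ref_start))
    calls = {}
    for umi, per_read in groups.items():
        if len(per_read) < min_reads:
            continue  # too few reads to build a reliable consensus
        support: dict[tuple[int, int], int] = defaultdict(int)
        for insertions in per_read:
            for ins in set(insertions):
                support[ins] += 1
        calls[umi] = sorted(ins for ins, k in support.items() if k > len(per_read) / 2)
    return calls


if __name__ == "__main__":
    reads = [
        ("ACGTATGCCGTA", "100M60I100M", 1000),
        ("ACGTATGCCGTA", "100M60I100M", 1000),
        ("ACGTATGCCGTA", "200M", 1000),
        ("TTGCATGAACGT", "100M60I100M", 1000),  # singleton UMI: skipped
    ]
    print(consensus_insertions(reads))  # {'ACGTATGCCGTA': [(1100, 60)]}
```

Exact position and length matching is deliberately naive; because nanopore errors make insertion boundaries jitter between reads of the same molecule, a real pipeline would cluster nearby calls before voting.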
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The source data underlying Fig. 1 are provided in Supplementary Data 1. Raw sequencing data are available in the SRA database under BioProject accession PRJNA1096936 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1096936).
References
Yusa, K. Genome-wide recessive genetic screening in mammalian cells with a lentiviral CRISPR-guide RNA library. Hum. Gene Ther. 26, A3–A3 (2015).
Suzuki, K. et al. In vivo genome editing via CRISPR/Cas9 mediated homology-independent targeted integration. Nature 540, 144–149 (2016).
Nakade, S. et al. Microhomology-mediated end-joining-dependent integration of donor DNA in cells and animals using TALENs and CRISPR/Cas9. Nat. Commun. 5, 5560 (2014).
Zheng, Y. et al. Precise genome-editing in human diseases: mechanisms, strategies and applications. Signal Transduct. Target Ther. 9, 47 (2024).
Sheridan, C. The world’s first CRISPR therapy is approved: who will receive it? Nat. Biotechnol. 42, 3–4 (2024).
Wang, J. Y. & Doudna, J. A. CRISPR technology: a decade of genome editing is only the beginning. Science 379, eadd8643 (2023).
Zhang, X. H., Tee, L. Y., Wang, X. G., Huang, Q. S. & Yang, S. H. Off-target effects in CRISPR/Cas9-mediated genome engineering. Mol. Ther. Nucleic Acids 4, e264 (2015).
Kosicki, M., Tomberg, K. & Bradley, A. Repair of double-strand breaks induced by CRISPR-Cas9 leads to large deletions and complex rearrangements. Nat. Biotechnol. 36, 765–771 (2018).
Liu, M. et al. Global detection of DNA repair outcomes induced by CRISPR-Cas9. Nucleic Acids Res. https://doi.org/10.1093/nar/gkab686 (2021).
Bi, C. et al. Long-read individual-molecule sequencing reveals CRISPR-induced genetic heterogeneity in human ESCs. Genome Biol. 21, 213 (2020).
Alanis-Lobato, G. et al. Frequent loss of heterozygosity in CRISPR-Cas9-edited early human embryos. Proc. Natl Acad. Sci. USA https://doi.org/10.1073/pnas.2004832117 (2021).
Hanlon, K. S. et al. High levels of AAV vector integration into CRISPR-induced DNA breaks. Nat. Commun. 10, 4439 (2019).
Suchy, F. P. et al. Genome engineering with Cas9 and AAV repair templates generates frequent concatemeric insertions of viral vectors. Nat. Biotechnol. https://doi.org/10.1038/s41587-024-02171-w (2024).
Bi, C. et al. Quantitative haplotype-resolved analysis of mitochondrial DNA heteroplasmy in Human single oocytes, blastoids, and pluripotent stem cells. Nucleic Acids Res. 51, 3793–3805 (2023).
Bi, C. et al. Single-cell individual full-length mtDNA sequencing by iMiGseq uncovers unexpected heteroplasmy shifts in mtDNA editing. Nucleic Acids Res. 51, e48 (2023).
Zhang, Y., Chandrasekaran, A. P., Bi, C. & Li, M. Quantification of genetic heterogeneity using long-read targeted individual DNA molecule sequencing. Curr. Protoc. 3, e888 (2023).
Yuan, B. et al. Modulation of the microhomology-mediated end joining pathway suppresses large deletions and enhances homology-directed repair following CRISPR-Cas9-induced DNA breaks. BMC Biol. 22, 101 (2024).
Park, S. H. et al. Comprehensive analysis and accurate quantification of unintended large gene modifications induced by CRISPR-Cas9 gene editing. Sci. Adv. 8, eabo7676 (2022).
Martincorena, I. et al. Tumor evolution. High burden and pervasive positive selection of somatic mutations in normal human skin. Science 348, 880–886 (2015).
Owens, D. D. G. et al. Microhomologies are prevalent at Cas9-induced larger deletions. Nucleic Acids Res. 47, 7402–7417 (2019).
Wen, W. et al. Effective control of large deletions after double-strand breaks by homology-directed repair and dsODN insertion. Genome Biol. 22, 236 (2021).
Cordaux, R. & Batzer, M. A. The impact of retrotransposons on human genome evolution. Nat. Rev. Genet. 10, 691–703 (2009).
Muharram, G. et al. Tensin-4-dependent MET stabilization is essential for survival and proliferation in carcinoma cells. Dev. Cell 29, 629–630 (2014).
Demircioglu, D. et al. A Pan-cancer transcriptome analysis reveals pervasive regulation through alternative promoters. Cell 178, 1465–1477.e17 (2019).
Codner, G. F. et al. Application of long single-stranded DNA donors in genome editing: generation and validation of mouse mutants. BMC Biol. https://doi.org/10.1186/s12915-018-0530-7 (2018).
Bai, H. et al. CRISPR/Cas9-mediated precise genome modification by a long ssDNA template in zebrafish. BMC Genomics 21, 67 (2020).
Weisheit, I. et al. Detection of deleterious on-target effects after HDR-mediated CRISPR editing. Cell Rep. 31, 107689 (2020).
Smirnov, A. & Battulin, N. Concatenation of transgenic DNA: random or orchestrated? Genes https://doi.org/10.3390/genes12121969 (2021).
Erbs, V. et al. Increased on-target rate and risk of concatemerization after CRISPR-enhanced targeting in ES cells. Genes https://doi.org/10.3390/genes14020401 (2023).
Zhao, J. J. et al. Decoding the complexity of on-target integration: characterizing DNA insertions at the CRISPR-Cas9 targeted locus using nanopore sequencing. BMC Genomics 25, 189 (2024).
Kim, D., Luk, K., Wolfe, S. A. & Kim, J. S. Evaluating and enhancing target specificity of gene-editing nucleases and deaminases. Annu. Rev. Biochem. 88, 191–220 (2019).
Zuccaro, M. V. et al. Allele-specific chromosome removal after Cas9 cleavage in human embryos. Cell 183, 1650–1664.e15 (2020).
Gorbunova, V. et al. The role of retrotransposable elements in ageing and age-associated diseases. Nature 596, 43–53 (2021).
Mosaddeghi, P., Farahmandnejad, M. & Zarshenas, M. M. The role of transposable elements in aging and cancer. Biogerontology 24, 479–491 (2023).
Burns, K. H. Transposable elements in cancer. Nat. Rev. Cancer 17, 415–424 (2017).
Takahashi, Y. et al. Integration of CpG-free DNA induces de novo methylation of CpG islands in pluripotent stem cells. Science 356, 503–508 (2017).
Li, M., Suzuki, K., Kim, N. Y., Liu, G. H. & Izpisua Belmonte, J. C. A cut above the rest: targeted genome editing technologies in human pluripotent stem cells. J. Biol. Chem. 289, 4594–4599 (2014).
Li, M. et al. Efficient correction of hemoglobinopathy-causing mutations by homologous recombination in integration-free patient iPSCs. Cell Res. 21, 1740–1744 (2011).
Liu, G. H. et al. Modelling Fanconi anemia pathogenesis and therapeutics using integration-free patient-derived iPSCs. Nat. Commun. 5, 4330 (2014).
Liu, G. H. et al. Targeted gene correction of laminopathy-associated LMNA mutations in patient-specific iPSCs. Cell Stem Cell 8, 688–694 (2011).
Suzuki, K. et al. Targeted gene correction minimally impacts whole-genome mutational load in human-disease-specific induced pluripotent stem cell clones. Cell Stem Cell 15, 31–36 (2014).
Ran, F. A. et al. Genome engineering using the CRISPR-Cas9 system. Nat. Protoc. 8, 2281–2308 (2013).
Gutierrez-Triana, J. A. et al. Efficient single-copy HDR by 5’ modified long dsDNA donors. Elife https://doi.org/10.7554/eLife.39468 (2018).
Ghanta, K. S. et al. 5’-Modifications improve potency and efficacy of DNA donors for precision genome editing. Elife https://doi.org/10.7554/eLife.72216 (2021).
Acknowledgements
We thank members of the Li laboratory for helpful discussions, and Jinna Xu and Doreena Chen for administrative support. The research of the Li laboratory was supported by the KAUST Office of Sponsored Research (OSR) under award number BAS/1/1080-01-01. This work was financially supported in part by funding from King Abdullah University of Science and Technology (KAUST) through the KAUST Center of Excellence for Smart Health (KCSH), under award number 5932 (ML).
Author information
Contributions
C.B. and B.Y. performed the majority of the experiments related to sequencing. Y.Z. and M.W. performed the Cas9 enrichment sequencing experiments. Y.T. performed the PIGA sequencing experiment. C.B. performed the bioinformatics analysis. C.B. and M.L. analyzed the data and wrote the manuscript. C.B. and M.L. conceived the study. M.L. supervised the study.
Ethics declarations
Competing interests
Mo Li is an Editorial Board Member for Communications Biology, but was not involved in the editorial review of, nor the decision to publish this article. The remaining authors declare no competing interests.
Peer review
Peer review information
Communications Biology thanks Xiao-Bing Zhang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Mengtan Xing.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Bi, C., Yuan, B., Zhang, Y. et al. Prevalent integration of genomic repetitive and regulatory elements and donor sequences at CRISPR-Cas9-induced breaks. Commun Biol 8, 94 (2025). https://doi.org/10.1038/s42003-025-07539-5
DOI: https://doi.org/10.1038/s42003-025-07539-5