Samples and clinical characteristics
We collected and sequenced genomes of 33 skin cancers from 21 patients representing 5 of the 8 xeroderma pigmentosum (XP) groups (3 XP-A, 4 XP-C, 2 XP-D, 10 XP-E, and 14 XP-V tumors; Supplementary Table 1). Causative homozygous (n = 12) or compound heterozygous (n = 8) germline variants were identified in 20 patients. Of these, 13 carried known causative germline mutations (Supplementary Table 1), while the other 7 carried novel germline mutations compatible with the diagnosis. The mean tumor purity and sequencing coverage were 41% and 40× (30× for normal tissue), respectively. In addition, we sequenced the genomes of 6 sporadic cutaneous squamous cell carcinomas (SCC). These newly generated data were combined with whole-genome sequencing (WGS) data from four previously published XP-C tumors12,13 and one XP-D tumor14, as well as 25 sporadic cutaneous SCCs12,15,16, 8 basal cell carcinomas15 (BCC) and 113 melanomas17 (MEL) from individuals not affected by XP. The resulting cohort of XP tumors included 17 BCCs, 15 SCCs, five melanomas, and one angiosarcoma. The mean age at biopsy in the XP cohort was 33 years (from 25 years in the XP-C group to 48 years in the XP-V group), compared with 65 years in the sporadic skin cancer group (Table 1, Supplementary Table 1).
XP groups demonstrate different mutation burden and mutation profiles
We assessed the tumor mutation burden (TMB) and mutation profiles of skin cancer genomes from the 5 sequenced XP groups and compared them with the three types of sporadic skin cancer (BCC, SCC and MEL; Fig. 1a). The mean TMB of single base substitutions (SBS) was significantly higher in three XP groups, XP-E (350 mut/Mb, P = 0.0241), XP-V (248 mut/Mb, P = 0.0014) and XP-C (162 mut/Mb, P = 0.0220), than the weighted average of the dataset (130 mut/Mb; global P < 2.2e−16, Kruskal–Wallis H test; Fig. 1a). We also observed strong differences between the XP groups and sporadic cancers in the burden of CC > TT double base substitutions (DBS), which are characteristic of UV-induced mutagenesis, and in their proportion relative to SBS (Fig. 1a). The highest ratio of CC > TT DBS to UV-induced SBS at pyrimidine dimers (C > T in YpC or CpY contexts, where Y denotes a pyrimidine) was observed in XP-C and XP-D tumors (0.2 and 0.17, respectively), about 6 times higher than in sporadic skin cancers (0.03; P = 4.7e−08, Mann–Whitney U test, two-sided).
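The two per-tumor quantities compared above can be illustrated with a short Python sketch. It assumes that each SBS is given in the usual pyrimidine-oriented trinucleotide form (for example TCA, with the mutated C in the middle), that the callable genome size in megabases is known, and that CC > TT DBS have been counted separately. The `SBS` record and the function names are hypothetical and do not describe the authors' pipeline.

```python
from dataclasses import dataclass

PYRIMIDINES = {"C", "T"}


@dataclass
class SBS:
    context: str  # pyrimidine-oriented trinucleotide, mutated base in the middle (e.g. "TCA")
    alt: str      # alternative base on the same strand (e.g. "T")


def is_uv_ct(sbs: SBS) -> bool:
    """True for C>T with a pyrimidine on the 5' or 3' side (YC>YT or CY>TY)."""
    five, ref, three = sbs.context
    return ref == "C" and sbs.alt == "T" and (five in PYRIMIDINES or three in PYRIMIDINES)


def tmb_per_mb(n_mutations: int, callable_mb: float) -> float:
    """Mutations per megabase; the callable-genome denominator is an assumption."""
    return n_mutations / callable_mb


def cc_tt_fraction(sbs_list: list[SBS], n_cc_tt: int) -> float:
    """Ratio of CC>TT DBS to C>T SBS at pyrimidine dimers for one tumor."""
    n_uv = sum(is_uv_ct(s) for s in sbs_list)
    return n_cc_tt / n_uv if n_uv else float("nan")


tumor = [SBS("TCA", "T"), SBS("ACC", "T"), SBS("ACA", "T"), SBS("TCG", "A")]
print(tmb_per_mb(5600, 2800.0))  # 2.0
print(cc_tt_fraction(tumor, 1))  # 0.5: two of the four SBS are UV-type C>T
```

Per-group values obtained this way could then be compared with the tests named above, for example with `scipy.stats.kruskal` and `scipy.stats.mannwhitneyu(..., alternative="two-sided")`.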
a Tumor mutation burden of all single base substitutions (SBS; left panel) and of CC > TT double base substitutions (DBS; middle panel) per group, and the proportion of CC > TT DBS relative to C > T SBS in a pyrimidine context (right panel). All cancer types were combined per XP group. P-values from nonparametric ANOVA are indicated (Kruskal–Wallis test for the global P-value and two-sided Mann–Whitney U test for individual groups). Boxes depict the interquartile range (25th–75th percentile), lines the median, and whiskers 1.5× the IQR below the first quartile and above the third quartile. Source data are provided as a Source Data file. b Trinucleotide-context mutation profiles of SBS (left panel) and tetranucleotide-context mutation profiles of CC > TT DBS (right panel) per group. Data are presented as mean values +/− SEM. c Multidimensional scaling (MDS) plot based on the cosine distance between the SBS trinucleotide-context mutation profiles of the samples. d MDS plot based on the cosine distance between the trinucleotide-context mutation profiles of the samples using only C > T mutations with an adjacent pyrimidine (YC > YT or CY > TY), the typical UV mutation context. e MDS plot based on the cosine distance between the tetranucleotide-context mutation profiles of the samples using only CC > TT double base substitutions. f Mean cosine dissimilarity (1 − cosine similarity) between original and reconstructed trinucleotide-context mutation profiles using only the COSMIC SBS7a/b/c/d mutation signatures, for all SBS (upper panel) and for C > T mutations with an adjacent pyrimidine only (lower panel). Data are presented as mean values +/− SEM. Source data are provided as a Source Data file. Sample size for all panels (tumors): n = 31 for SCC, n = 8 for BCC, n = 113 for MEL, n = 10 for XP-E, n = 8 for XP-C, n = 3 for XP-A, n = 3 for XP-D and n = 14 for XP-V.
The mutation profiles of skin cancers in all XP groups were dominated by C > T substitutions at pyrimidine dimers, as in sporadic skin cancers. However, some XP groups showed marked differences in C > T mutations in specific contexts, such as enrichment at TCA in XP-E, TCW in XP-C, or NCY in XP-D (where W denotes A or T; N: A, C, G, or T; Y: C or T). Moreover, XP-V skin cancers carried abundant C:G > A:T, T:A > A:T, and T:A > C:G mutations, which have not previously been observed at substantial levels in skin cancer (Fig. 1b, Supplementary Fig. 1). In multidimensional scaling (MDS) analysis of the mutation profiles, XP tumors clustered by XP group, and these clusters did not overlap with the cluster of sporadic skin cancers (Fig. 1c–e; Supplementary Figs. 2, 3). The XP-V, XP-C, and XP-A clusters lay far from the sporadic cluster, while the XP-E / XP-D cluster was closer to it.
Among the 78 COSMIC mutation signatures18 (v3.2) extracted from the pan-cancer dataset, four (SBS7a/b/c/d) are associated with UV irradiation, and a combination of SBS7a and SBS7b usually explains the majority of mutations in sporadic melanomas18. We investigated whether these signatures could explain the mutation profiles observed in XP skin cancers with an accuracy comparable to that in sporadic skin cancers. To this end, we compared the observed and reconstructed mutation profiles of each sample in our cohort. The mean cosine dissimilarity was small for sporadic skin cancers (0.004) but increased drastically for the XP groups (0.16 overall), particularly for XP-C (0.237), XP-A (0.1957), and XP-V (0.222; Fig. 1f, Supplementary Fig. 4), indicating that the mutational profiles of XP skin cancers cannot be adequately reconstructed with the known UV mutational signatures.
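The reconstruction test can be sketched as a non-negative least-squares (NNLS) refit of each observed profile to a fixed set of signatures, followed by the cosine dissimilarity 1 − cos(observed, reconstructed). The sketch below uses plain cyclic coordinate descent as a stand-in NNLS solver; the solver, the iteration count, and the toy 4-channel signatures are assumptions for illustration, not the fitting procedure or the COSMIC SBS7 vectors used in the study.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


def nnls_cd(signatures, profile, n_iter=2000):
    """Non-negative exposures w minimising ||profile - sum_k w[k] * signatures[k]||^2
    by cyclic coordinate descent, clipping each exposure at zero."""
    w = [0.0] * len(signatures)
    sq_norms = [dot(s, s) for s in signatures]
    recon = [0.0] * len(profile)
    for _ in range(n_iter):
        for j, s in enumerate(signatures):
            resid = [p - r for p, r in zip(profile, recon)]
            new_wj = max(0.0, w[j] + dot(s, resid) / sq_norms[j])
            step = new_wj - w[j]
            if step:
                recon = [r + step * x for r, x in zip(recon, s)]
                w[j] = new_wj
    return w, recon


def reconstruction_dissimilarity(signatures, profile):
    """Cosine dissimilarity between an observed profile and its NNLS reconstruction."""
    _, recon = nnls_cd(signatures, profile)
    return 1.0 - cosine_similarity(profile, recon)


# Toy 4-channel "signatures"; real SBS profiles have 96 trinucleotide channels.
sig_a = [0.5, 0.3, 0.2, 0.0]
sig_b = [0.0, 0.2, 0.3, 0.5]
explained = [70 * a + 30 * b for a, b in zip(sig_a, sig_b)]  # exact mixture
unexplained = [10.0, 0.0, 40.0, 0.0]                           # poorly explained
print(reconstruction_dissimilarity([sig_a, sig_b], explained))    # close to 0
print(reconstruction_dissimilarity([sig_a, sig_b], unexplained))  # clearly above 0
```

A profile generated by the signatures themselves is reconstructed almost perfectly, whereas a profile outside their span keeps a large residual angle, which is the pattern reported here for XP tumors.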
Nucleotide excision repair efficiency determines mutation load distribution along the genome
Strong heterogeneity of the mutation rate across the genome is a fundamental feature of mutagenesis, with important implications, for example, for the discovery of cancer driver genes. We investigated the distribution of typical UV mutations (YC > YT or CY > TY) in XP and sporadic skin cancers in relation to replication timing (RT), active and inactive topologically associated domains (TAD), and markers of chromatin states. These analyses revealed a major role for NER in shaping the heterogeneity of local UV-induced mutation rates across the genome. In sporadic skin cancers (average for BCC, cSCC, and MEL), mutation load decreased monotonically from late to early replicating regions, with a maximal 5.2-fold difference between the latest and the earliest replicating bins (Fig. 2a). This effect was much weaker in GG-NER-deficient XP-C genomes (2.4-fold) and almost disappeared in XP-A (1.5-fold) and XP-D (0.99-fold) genomes, which are deficient in both GG-NER and TC-NER (Fig. 2a). Interestingly, the distribution of UV-induced SBS by RT in XP-E and XP-V genomes was similar to that in sporadic skin cancer genomes (4.6-fold and 5.4-fold, respectively).
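The fold difference and the regression slope summarized in Fig. 2a can be computed from per-bin mutation fractions, as in the sketch below. It assumes equal-size bins ordered from earliest to latest replicating, with each value being the fraction of a tumor's UV-type C > T mutations in that bin; the input values are invented for illustration. With this ordering, a positive slope means more mutations in late-replicating DNA.

```python
def ols_slope(x: list[float], y: list[float]) -> float:
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def rt_summary(fractions: list[float]) -> tuple[float, float]:
    """Slope across RT bins and late/early fold difference.

    fractions[i] is the fraction of UV-type C>T mutations in RT bin i,
    with bin 0 the earliest and the last bin the latest replicating (assumed order).
    """
    bins = list(range(1, len(fractions) + 1))
    slope = ols_slope(bins, fractions)
    fold_late_vs_early = fractions[-1] / fractions[0]
    return slope, fold_late_vs_early


repair_proficient = [0.05, 0.08, 0.10, 0.12, 0.14, 0.15, 0.16, 0.20]  # illustrative
no_repair = [0.125] * 8                                               # flat profile
print(rt_summary(repair_proficient))  # positive slope, 4-fold late/early difference
print(rt_summary(no_repair))          # slope 0, fold 1.0
```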
a Fraction of C > T mutations from pyrimidine dimers in genomic regions grouped in 8 equal size bins by replication timing (RT) for XP groups and sporadic skin cancers. The box contains the slope values from linear regressions across 8 RT bins. Data are presented as mean values +/− SEM. Source data are provided as a Source Data file. b Fraction of C > T mutations from pyrimidine dimers per group in 1 Mb regions centered at the boundary between active and inactive topologically associated domains (split into two bins each). Data are presented as mean values +/− SEM. Source data are provided as a Source Data file. c Fraction of C > T mutations from pyrimidine dimers per group across different chromatin states (R – repressed, A and A2 – active, H – heterochromatin, I – inactive). Data are presented as mean values +/− SEM. Source data are provided as a Source Data file. d Fractions of C > T mutations from pyrimidine dimers in intergenic regions (left panel), on the untranscribed (middle panel) and transcribed (right panel) DNA strands of gene regions grouped in 5 equal size bins by replication timing (RT) for XP groups and sporadic skin cancers. The boxes contain the slope values from linear regressions across 5 RT bins. I intergenic regions, NTR untranscribed strand of genes, TR transcribed strand of genes. Data are presented as mean values +/− SEM. Source data are provided as a Source Data file. Sample size for all the panels (tumors): n = 31 for SCC, n = 8 for BCC, n = 113 for MEL, n = 10 for XP-E, n = 8 for XP-C, n = 3 for XP-A, n = 3 for XP-D and n = 14 for XP-V.
It has recently been shown that TAD boundaries between active and inactive chromatin domains strongly delineate the transition between regions of low and high mutation load in different human cancers19. Indeed, in our cohort, mutation load differed 2.2-fold between active and inactive TADs in sporadic cancers, but this difference was noticeably smaller in XP-C cancers (1.4-fold) and virtually absent in XP-A (1.05-fold) and XP-D (1.09-fold; Fig. 2b). Similarly, mutation load in XP-A and XP-D tumors was independent of chromatin state, the XP-C group showed a mild dependence, and the XP-E and XP-V groups did not differ from sporadic cancers (Fig. 2c).
CPD and 6-4PP DNA lesions occur at pyrimidine bases, which enabled us to identify the strand on which the lesion underlying a UV-induced mutation occurred. To investigate the genomic targets of GG-NER and TC-NER separately, we split the genome into intergenic regions and the transcribed and untranscribed strands of genic regions. In intergenic regions and on the untranscribed strands of genes, mutation rates decreased strongly in early RT regions in groups proficient in GG-NER (sporadic cancers and XP-V) and, surprisingly, in the GG-NER-deficient XP-E group. In contrast, XP-A and XP-D, which lack both GG-NER and TC-NER, had flat slopes, consistent with a lack of repair in the open chromatin of early RT regions (Fig. 2d, Supplementary Fig. 5). XP-C samples, with functional TC-NER and fully abrogated GG-NER, showed a lack of repair on the untranscribed gene strands and in intergenic regions but were proficient in repairing the transcribed strands of genes (Fig. 2d, Supplementary Fig. 5).
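Because CPD and 6-4PP lesions sit on pyrimidines, the strand that carried the lesion can be read off the reference base of a UV-type substitution. A minimal sketch, assuming substitutions are reported on the reference '+' strand and that the orientation of the overlapping gene is known (the function name is hypothetical): a C or T reference base places the pyrimidine on '+', a G or A places it on '−', and a lesion on the same strand as the gene lies on the untranscribed (coding) strand.

```python
from typing import Optional


def lesion_strand(ref_base: str, gene_strand: Optional[str]) -> str:
    """Return 'intergenic', 'transcribed' or 'untranscribed' for the strand that carried
    the pyrimidine lesion underlying a UV-type substitution (e.g. C:G>T:A).

    ref_base    -- reference base on the genome '+' strand
    gene_strand -- '+' or '-' for the overlapping gene, None outside genes
    """
    if gene_strand is None:
        return "intergenic"
    base = ref_base.upper()
    if base in {"C", "T"}:
        pyrimidine_strand = "+"   # the damaged pyrimidine is on the '+' strand
    elif base in {"G", "A"}:
        pyrimidine_strand = "-"   # reference purine: the pyrimidine is on the '-' strand
    else:
        raise ValueError(f"unexpected reference base: {ref_base!r}")
    # The untranscribed (coding) strand has the same orientation as the gene itself;
    # the transcribed (template) strand is the opposite one.
    return "untranscribed" if pyrimidine_strand == gene_strand else "transcribed"


print(lesion_strand("C", "+"))   # untranscribed
print(lesion_strand("G", "+"))   # transcribed
print(lesion_strand("G", None))  # intergenic
```

Sites covered by genes on both strands are ambiguous under this scheme and would typically be excluded; that filtering step is an assumption and is not shown.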
Transcriptional bias is different between the XP groups
TC-NER removes UV-induced bulky DNA lesions from the transcribed strand of expressed genes more efficiently than GG-NER removes them from the untranscribed strand. This results in fewer mutations on the transcribed than on the untranscribed strand, a phenomenon called transcriptional bias (TRB)20. In skin tumors with proficient NER, TRB ranged from 1.3- to 1.6-fold in sporadic cancers and was 1.7-fold in XP-V. In the GG-NER-deficient, TC-NER-proficient groups, TRB was particularly high, ranging from 1.77-fold (XP-E) to 2.42-fold (XP-C), which is compatible with defective repair of the untranscribed strand. In contrast, in the XP-A and XP-D groups, which are defective in both TC-NER and GG-NER, TRB was minimal or absent (1.17-fold and 0.97-fold, respectively; Fig. 3a).
a The transcriptional bias (TRB) per group (ratio between untranscribed and transcribed strand) for C > T mutations with adjacent pyrimidines for XP groups and sporadic skin cancers. P-values from nonparametric ANOVA are indicated. Boxes depict the interquartile range (25–75% percentile), lines—the median, whiskers—1.5× the IQR below the first quartile and above the third quartile. n = 31 for SCC, n = 8 for BCC, n = 113 for MEL, n = 10 for XP-E, n = 8 for XP-C, n = 3 for XP-A, n = 3 for XP-D and n = 14 for XP-V (tumors). Source data are provided as a Source Data file. b Fractions of C > T mutations with adjacent pyrimidines separated by strands in the TES-centered 100 kb region (binned by 10 kb intervals). Data are presented as mean values +/− SEM. n = 31 for SCC, n = 8 for BCC, n = 113 for MEL, n = 10 for XP-E, n = 8 for XP-C, n = 3 for XP-A, n = 3 for XP-D and n = 14 for XP-V (tumors). Source data are provided as a Source Data file. c DNA context-normalized XR-seq density from XP-C cell line on untranscribed and transcribed gene strands in the TES-centered 100 kb region (binned by 10 kb intervals; left panel, n = 1). DNA context-normalized XR-seq density from XP-C cell line by replication timing for the transcribed and untranscribed DNA strands of genes and intergenic regions. I intergenic regions, NTR untranscribed strand of genes, TR transcribed strand of genes (right panel, n = 1). d Correlation between XR-seq intensity from XP-C cell line and nascent RNA-seq for genic regions (left panel, n = 1) and intergenic regions 50 kb downstream of TES (right panel, n = 1). Pearson’s r correlation coefficients and P values are indicated. e Transcriptional bias of C:G > T:A mutations on intergenic regions of XP-C tumors depending on the XR-seq intensity of XP-C cell line. SEM intervals are indicated, n = 8 tumors. f Relative mutation rate of C:G > T:A mutations in intergenic regions of XP-C tumors (n = 8) depending on the XR-seq intensity in XP-C cell line. 
Data are presented as mean values +/− SEM.
TC-NER removes DNA lesions downstream of genes and influences intergenic mutation load
Since early RT regions are particularly gene-rich21, we hypothesized that in GG-NER-deficient but TC-NER-proficient XP groups (XP-C, XP-E), the decreased mutation load in early RT intergenic regions might be associated with TC-NER activity beyond gene boundaries. Indeed, in GG-NER-deficient XP-C tumors, we found a statistically significant TRB up to 40 kb downstream of the furthest annotated transcription end sites (TES) of genes, with a decreased mutation frequency on the strand corresponding to the transcribed strand of the nearby gene (Fig. 3b, Supplementary Fig. 6). The same effect was observed in XP-E and even in NER-proficient skin cancers, although with a lower magnitude (significant only for sporadic melanoma, which has a large sample size; Supplementary Fig. 6). As expected, we did not observe TRB downstream of genes in XP-A and XP-D samples, which are deficient in both TC-NER and GG-NER (Fig. 3b, Supplementary Fig. 6). To validate TC-NER activity downstream of gene TES, we used previously published XR-seq data from XPC-deficient cell lines22,23. XR-seq sequences the lesion-containing DNA fragments excised by NER22, so in XPC-deficient cells the XR-seq signal is expected to be produced exclusively by TC-NER. An XR-seq signal was observed up to 40 kb downstream of TES on the strand corresponding to the transcribed strand of the nearby gene, mirroring the mutation asymmetry in the same regions of XP-C tumors (Fig. 3c, Supplementary Fig. 7), and was well correlated with the transcriptional intensity of nascent RNA retrieved from an independent study24 (Fig. 3d). This suggests that, in some cases, RNA polymerase might continue transcription beyond the TES and recruit TC-NER at lesion sites. We identified XR-seq signal in 21% of the cumulative length of intergenic regions and in 14% of the untranscribed strands of genes in the XPC-deficient cell line22, suggesting ubiquitous extended TC-NER activity. Analysis of transcriptional bias and relative mutation rate in intergenic regions of XP-C tumors (Fig. 3e, f) revealed a strong dependence on XR-seq intensity outside the annotated genic regions. This extended TC-NER activity outside the transcribed strands of genes is especially strong in early replicating regions with a high density of active genes (Fig. 3c). It may explain the decreased mutation density in intergenic regions and on the untranscribed strands of genes in early replicating genomic regions of GG-NER-deficient XP-C samples (Fig. 2d, Supplementary Fig. 5).
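The downstream-of-TES analysis amounts to binning each intergenic mutation by its distance past the TES of the nearby gene and labeling its lesion strand relative to that gene's orientation. The sketch below assumes 10 kb bins out to 50 kb (half of the TES-centered 100 kb window), a single nearby gene with a known TES coordinate and strand, and UV-type mutations given as (position, reference base); `downstream_bin` and `tally_downstream` are hypothetical names.

```python
from typing import Optional


def downstream_bin(pos: int, tes: int, gene_strand: str,
                   bin_size: int = 10_000, max_distance: int = 50_000) -> Optional[int]:
    """0-based bin index of a position downstream of a gene's TES, or None if it is
    upstream of the TES or further than max_distance away."""
    distance = pos - tes if gene_strand == "+" else tes - pos
    if distance <= 0 or distance > max_distance:
        return None
    return (distance - 1) // bin_size


def tally_downstream(mutations, tes: int, gene_strand: str) -> dict:
    """Count mutations per downstream bin, split by lesion strand relative to the gene."""
    counts: dict = {}
    for pos, ref_base in mutations:
        b = downstream_bin(pos, tes, gene_strand)
        if b is None:
            continue
        # Reference C/T: pyrimidine (lesion) on '+'; reference G/A: lesion on '-'.
        pyrimidine_strand = "+" if ref_base in ("C", "T") else "-"
        label = "untranscribed" if pyrimidine_strand == gene_strand else "transcribed"
        counts.setdefault(b, {"untranscribed": 0, "transcribed": 0})[label] += 1
    return counts


muts = [(1_500, "C"), (1_500, "G"), (12_000, "C"), (900, "C"), (70_000, "C")]
print(tally_downstream(muts, tes=1_000, gene_strand="+"))
# {0: {'untranscribed': 1, 'transcribed': 1}, 1: {'untranscribed': 1, 'transcribed': 0}}
```

Summing such tallies over genes and tumors, and dividing untranscribed by transcribed counts per bin, gives a per-distance TRB profile of the kind shown for XP-C tumors.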
XP-E demonstrates reduced GG-NER activity
The GG-NER sensors of UV-induced DNA lesions, XPC and DDB2 (XPE), are thought to work in tandem: DDB2 binds directly to a lesion and facilitates the recruitment of XPC, which in turn initiates the repair process together with the TFIIH complex25. We therefore compared the features of UV-induced mutagenesis in XP-E tumors, which lack functional DDB2, with those in XP-C and sporadic tumors.
An MDS plot based on SBS mutational profiles (Fig. 4a) and hierarchical clustering (Supplementary Fig. 8a) revealed three clusters corresponding to XP-C, XP-E, and sporadic tumors. The proportion of CC > TT DBS was much higher in XP-C (0.21) than in sporadic cSCC (0.064), but significantly lower in XP-E cSCC (0.034, P = 0.0003; Mann–Whitney U test, two-sided), confirming qualitative differences in mutagenesis in XP-E. Unlike in XP-C, the distribution of mutational load by RT in intergenic regions and on untranscribed gene strands in XP-E was very close to that of sporadic cSCC, suggesting that repair in early RT regions was functional in XP-E (Fig. 2d, Supplementary Fig. 5). Similarly, the MDS plot and hierarchical clustering based on the local mutation load in 2684 1 Mb intervals along the genome revealed no strong difference between XP-E and sporadic samples, whereas all XP-C samples grouped together irrespective of tumor type (Fig. 4b, Supplementary Fig. 8b). For example, the single XP-E melanoma clustered within sporadic melanomas, and most XP-E nonmelanoma skin cancers clustered with sporadic samples of the corresponding cancer types, while XP-C samples grouped together, separately from sporadic and XP-E tumors (Fig. 4b, Supplementary Fig. 8b).
a Multidimensional scaling (MDS) plot based on the cosine distance between the SBS trinucleotide-context mutation profiles of the samples (Dimensions 1 and 2, left panel; Dimensions 1 and 3, right panel). Colors encode sample groups and shapes encode cancer types. n = 31 for SCC, n = 8 for BCC, n = 113 for MEL, n = 10 for XP-E and n = 8 for XP-C (tumors). b PCA plot based on the density of mutations in 2684 1 Mb windows along the genome (only samples with more than 50k mutations belonging to the sporadic, XP-C and XP-E groups). Colors encode sample groups and shapes encode cancer types. n = 4 for BCC, n = 83 for MEL, n = 26 for SCC, n = 8 for XP-C and n = 10 for XP-E (tumors). c Transcriptional bias (TRB; ratio between untranscribed and transcribed strand mutation numbers) for C > T mutations from pyrimidine dimers in genes grouped in 6 bins by gene expression level. Only cutaneous SCC tumors were used for the XP-C and XP-E groups. Data are presented as mean values +/− SEM. n = 31 for SCC, n = 5 for XP-C and n = 7 for XP-E (tumors). Source data are provided as a Source Data file. d Fractions of C > T mutations from pyrimidine dimers separated by strand in the TSS-centered 100 kb region (binned by 10 kb intervals). Blue line, untranscribed strand for purines or transcribed strand for pyrimidines; red line, transcribed strand for purines or untranscribed strand for pyrimidines. Data are presented as mean values +/− SEM. n = 31 for SCC, n = 5 for XP-C and n = 7 for XP-E (tumors). Source data are provided as a Source Data file.
The XP-E group showed a strong TRB (1.77-fold), intermediate between sporadic cSCC (1.33) and the XP-C group (2.47) (Fig. 3a, Fig. 4c, d). Given that TC-NER is functional in XP-E, XP-C, and sporadic samples, and assuming that GG-NER is fully abrogated in XP-C, we can estimate the relative efficiency of GG-NER in XP-E tumors. Provided all else is equal, GG-NER is 64% less efficient in XP-E than in sporadic cancers.
To provide a more detailed view of the mutational differences between XP-E, XP-C, and sporadic tumors, we compared the association of mutation load in each group with the core epigenetic marks from a primary keratinocyte cell line26, using only cSCC samples (Supplementary Fig. 9). Unlike XP-C, XP-E tumors did not differ strongly or significantly from sporadic cSCC in the dependence of mutagenesis on most epigenetic covariates, except for the histone modification marks H3K36me3, H3K27ac and H3K9me3 on the transcribed strand of gene regions (Supplementary Fig. 9).
Taken together, these observations lead us to speculate that XP-E tumors retain residual GG-NER activity, associated with the ability of XPC to find a fraction of DNA lesions and initiate NER. This is consistent with the clinical observation that XP-E patients develop fewer skin tumors, and at a later age, than XP-C patients.
Polymerase η deficiency causes a specific mutation profile in skin cancers
The analysis of XP-V skin cancers revealed that on average 27% (range 15–42%) of SBS were C:G > A:T mutations with a highly specific 3-nt context (NCA) and a strong and homogeneous TRB (Figs. 5a, 1b, Supplementary Fig. 1). Similar mutation contexts and TRB were observed for a subset of T:A > A:T mutations, which represented 8.7% of SBS. In sporadic skin cancers, C:G > A:T and T:A > A:T mutations represented only 2.5% and 4.6% of SBS, respectively, and had different, broad 3-nt contexts without a strong TRB (Fig. 1b, Supplementary Fig. 1). The enrichment of these mutation types in XP-V suggests that they might originate from lesions that polymerase η bypasses in an error-free manner in sporadic skin cancer, whereas XP-V cells have to use alternative polymerase(s) to bypass these lesions.
a Trinucleotide-context mutation profile of genomic SBS (upper panel) and genic SBS (lower panel) separated by transcribed (TR) and untranscribed (NT) strands in XP-V tumors. Blue bars, untranscribed strand for purines or transcribed strand for pyrimidines; red bars, transcribed strand for purines or untranscribed strand for pyrimidines. Data are presented as mean values +/− SEM. n = 14 tumors. b Fractions of C > A mutations separated by gene strand in the TSS-centered 100 kb region of XP-V tumors (binned by 10 kb intervals). Blue, untranscribed strand for mutations from purines and transcribed strand for mutations from pyrimidines; red, transcribed strand for mutations from purines and untranscribed strand for mutations from pyrimidines. Data are presented as mean values +/− SEM. n = 14 tumors. c Transcriptional bias (ratio between transcribed and untranscribed strand) for C > A and C > T mutations per bin of gene expression level (only XP-V samples represented by BCC, n = 11 tumors). Data are presented as mean values +/− SEM. d Trinucleotide-context mutation profiles of SBS separated by strand in XP-V tumors for C > A and T > A mutations. Data are presented as mean values +/− SEM, n = 14 tumors. e Mutations per megabase in the POLH-wt and POLH-KO clones in nontreated cells (NT; n = 1 independent biological replicate per cell line) and after treatment with KBrO3 (n = 1 independent biological replicate per cell line), UV-A (n = 3 independent biological replicates per cell line) and UV-C (n = 3 independent cell clones per cell line). Welch two-sample t-test, two-sided. Data are presented as mean values +/− SEM. Source data are provided as a Source Data file. f Mutational specificity of the TG > TT mutations in XP-V tumors and UV-A- and UV-C-treated POLH-KO cell lines. X-axis: log2-transformed transcriptional bias of the TG > TT mutations per genome. Y-axis: fraction of mutations in the TG > TT context among all C:G > A:T substitutions per genome.
POLH-KO and POLH-wt clones are indicated together with their treatment (KBrO3, UV-A and UV-C), as are the COSMIC SBS18 and SBS36 mutational signatures associated with oxidative DNA damage (black dots). g Mutation profiles of the POLH-wt and POLH-KO clones for nontreated cells (NT) and after treatment with KBrO3, UV-A and UV-C. Data are presented as mean values +/− SEM for UV-A and UV-C experiments. Sample size is indicated on the plots (independent cell clones).
The direction of TRB for these mutation types indicates fewer mutations from lesions involving purines on the transcribed strand (Fig. 5a). Furthermore, comparison of C:G > A:T mutation frequencies on the transcribed and untranscribed strands with those in the proximal 5’ intergenic regions confirmed that TRB is indeed associated with a decrease of C:G > A:T mutations on the transcribed strand (Fig. 5b). This suggests that these mutations arise from lesions involving purines, which are NER substrates and are effectively repaired by TC-NER on the transcribed strand (Fig. 5b). Interestingly, C:G > A:T mutations had a stronger TRB than YC > YT or CY > TY UV-induced mutations in all gene expression bins (Fig. 5c). This observation might indicate that these lesions produce a smaller helix distortion and are less efficiently recognized by GG-NER than UV-induced pyrimidine lesions.
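The comparison across expression bins can be sketched as follows, assuming per-gene mutation counts that are already assigned to the untranscribed and transcribed strand of the lesion. Genes are ranked by expression and split into bins of roughly equal size, and counts are pooled within each bin; the bin count, the pooling, and the function names are assumptions for illustration rather than the authors' procedure.

```python
def expression_bins(expression: dict[str, float], n_bins: int) -> dict[str, int]:
    """Assign genes to n_bins roughly equal-size bins by expression rank (0 = lowest)."""
    ranked = sorted(expression, key=expression.get)
    return {gene: rank * n_bins // len(ranked) for rank, gene in enumerate(ranked)}


def trb_by_expression(expression: dict[str, float],
                      strand_counts: dict[str, tuple[int, int]],
                      n_bins: int = 6) -> list[float]:
    """Pooled TRB (untranscribed / transcribed lesion-strand mutations) per expression bin.

    strand_counts maps gene -> (untranscribed_count, transcribed_count).
    """
    bins = expression_bins(expression, n_bins)
    totals = [[0, 0] for _ in range(n_bins)]
    for gene, (untranscribed, transcribed) in strand_counts.items():
        b = bins[gene]
        totals[b][0] += untranscribed
        totals[b][1] += transcribed
    return [u / t if t else float("nan") for u, t in totals]


expr = {"g1": 1.0, "g2": 2.0, "g3": 30.0, "g4": 40.0}
counts = {"g1": (4, 2), "g2": (2, 2), "g3": (3, 1), "g4": (5, 1)}
print(trb_by_expression(expr, counts, n_bins=2))  # [1.5, 4.0]
```

Running this separately for C:G > A:T and for UV-type C > T counts would give the two per-bin curves being compared; note that Fig. 5c reports the ratio in the opposite (transcribed over untranscribed) orientation.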
C:G > A:T and T:A > A:T mutations occurred in a very specific dinucleotide context, in which the mutated purine is always preceded by a thymine (TA/G > TT), suggesting that the causative DNA lesions might be thymine-purine dimers (Fig. 5d). In our XP-V skin cancer cohort, the number of mutations in a TG context was strongly correlated with the number of mutations in a TA context (Pearson’s r = 0.98; Supplementary Fig. 10), suggesting coordinated mutation processes.
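The pyrimidine-oriented context reported for XP-V (NCA for C:G > A:T, which corresponds to NTA for T:A > A:T) and the TG > TT / TA > TT notation are related by a reverse complement: the 3’ A next to the pyrimidine becomes a 5’ T next to the mutated purine on the opposite strand. A small sketch, assuming trinucleotide contexts written 5’ to 3’ with the mutated pyrimidine in the middle (the function names are hypothetical):

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def purine_strand_change(context: str, alt: str) -> str:
    """Rewrite a pyrimidine-oriented SBS, e.g. context 'GCA' with alt 'A' (C>A),
    as a 5'->3' dinucleotide change on the purine strand ending at the mutated purine."""
    rc = reverse_complement(context)            # mutated purine is now the middle base
    alt_on_purine_strand = alt.translate(COMPLEMENT)
    return f"{rc[0]}{rc[1]}>{rc[0]}{alt_on_purine_strand}"


def is_tr_to_tt(context: str, alt: str) -> bool:
    """True for TG>TT (from C:G>A:T) or TA>TT (from T:A>A:T) on the purine strand."""
    return purine_strand_change(context, alt) in {"TG>TT", "TA>TT"}


print(purine_strand_change("GCA", "A"))  # TG>TT
print(purine_strand_change("CTA", "A"))  # TA>TT
print(is_tr_to_tt("ACG", "A"))           # False (purine-strand change is CG>CT)
```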
We hypothesized that if TA/G > TT mutations were not directly or indirectly caused by UV irradiation, their abundance would not be correlated with that of typical UV-induced (YC > YT or CY > TY) mutations. UV-induced mutations usually accumulate nonlinearly over time, depending on UV exposure, whereas mutations caused by physiological cellular processes (such as purine oxidation or cytosine deamination) accumulate roughly linearly with time27. We measured the Pearson correlation of TG > TT and TA > TT mutations with typical UV-induced (YC > YT or CY > TY) mutations and observed strong correlations in both cases, r = 0.78 (p = 0.001) and r = 0.99 (p = 1e−10), respectively (Supplementary Fig. 10), suggesting that both mutation types are linked, directly or indirectly, to UV exposure.
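The correlation test used here can be written out directly. This sketch computes Pearson's r between per-tumor counts of two mutation classes and the t statistic with n − 2 degrees of freedom on which the usual P value is based; converting t into a P value needs a t-distribution CDF (for example via `scipy.stats.pearsonr`), which is omitted. The counts in the example are invented.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pearson_t(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom (undefined for |r| = 1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Illustrative per-tumor counts: TG>TT mutations vs UV-type C>T mutations.
tg_tt = [120, 340, 95, 410, 260]
uv_ct = [9_000, 25_000, 8_000, 31_000, 17_000]
r = pearson_r(tg_tt, uv_ct)
print(round(r, 3), round(pearson_t(r, len(tg_tt)), 2))
```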
To further understand the nature of TG > TT and TA > TT mutations, we established a POLH knockout (KO) in the RPE-1 TP53-KO cell line and sequenced whole genomes of POLH-wt and POLH-KO clones, both untreated and after treatment with KBrO3 (to induce reactive oxygen species; n = 1), UV-A (n = 3) or UV-C (n = 3) (Supplementary Table 2, Supplementary Fig. 11). There were no major differences in the number of mutations or in the mutational profiles between POLH-wt and POLH-KO for untreated and KBrO3-treated cells (Fig. 5e–g). UV-A and UV-C exposures greatly increased the number of SBS in the POLH-KO cells (3.9- and 10.5-fold, respectively; P = 0.00078 and P = 0.01507, respectively; Welch two-sample t-test) and dramatically changed the mutational profiles compared with POLH-wt clones (Fig. 5e–g). UV-A-treated POLH-KO clones had on average 16% TG > TT and 12% TA > TT mutations, with the XP-V-specific context and a strong transcriptional bias, while in UV-C-treated clones these percentages were 10% and 4% on average, respectively (Fig. 5f, g). UV-treated POLH-KO cells showed a distinct pattern of TG > VT DBS (V: A, C or G). Interestingly, a similar DBS pattern was also visible in XP-V tumors (Supplementary Fig. 12).
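The POLH-KO versus POLH-wt comparison relies on a fold change and Welch's two-sample t-test, which does not assume equal variances. A minimal sketch of the statistic and of the Welch-Satterthwaite degrees of freedom follows; the P value would then come from a t-distribution, for example with `scipy.stats.ttest_ind(a, b, equal_var=False)`. The per-clone values are invented, and at least two replicates per group are required, so single-replicate conditions cannot be tested this way.

```python
import math
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom (>= 2 values per group)."""
    va = variance(a) / len(a)   # squared standard error of mean(a)
    vb = variance(b) / len(b)   # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def fold_change(treated: list[float], reference: list[float]) -> float:
    """Ratio of group means, e.g. POLH-KO over POLH-wt SBS per Mb."""
    return mean(treated) / mean(reference)


ko_uvc = [10.5, 12.0, 9.0]   # illustrative SBS per Mb, POLH-KO clones
wt_uvc = [1.0, 1.2, 0.8]     # illustrative SBS per Mb, POLH-wt clones
print(fold_change(ko_uvc, wt_uvc), welch_t(ko_uvc, wt_uvc))
```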
Another feature of the XP-V skin cancer profile was that 15% (range 11–23%) of mutations originated from TT pyrimidine dimers. Such mutations are very rare in sporadic cancer (4.8%) because polymerase η bypasses TT pyrimidine dimers in a relatively error-free manner. The two predominant types of mutations at TT were TT > TA and TT > TC; as expected for mutations arising from pyrimidine lesions, they showed a strong TRB and were correlated with the typical UV-induced YC > YT or CY > TY mutations (Fig. 5a, Supplementary Fig. 10). Reconstruction of the RPE-1 mutational profiles with the COSMIC mutational signatures revealed a significantly higher reconstruction error for POLH-KO than for POLH-wt cells and an overall poor reconstruction performance for UV-A-treated cells (Supplementary Fig. 13), indicating that the inferred mutational process is poorly represented in public mutational signature catalogs.
In the absence of polymerase η, error-prone bypass of 3’ nucleotides in pyrimidine dimers shapes the mutation profile of XP-V tumors
The 3-nt context of C > T substitutions in XP-V skin cancers differed from that in sporadic skin cancers and other XP groups (Fig. 1b, d). It was previously shown that, in the absence of polymerase η, CPD photoproducts can be bypassed in two steps by two translesion synthesis (TLS) polymerases: one inserts the first nucleotide opposite the 3’ nucleotide of the lesion (“inserter”) and is then replaced by another TLS polymerase, which performs the extension opposite the 5’ nucleotide of the lesion (“extender”)28. We hypothesized that the loss of polymerase η in skin cancer might change the probabilities of mutation at the 3’ versus 5’ nucleotides of pyrimidine dimers and thereby contribute to the observed differences in the C > T SBS mutation profiles between XP-V and sporadic skin cancer.
To test this hypothesis, we first estimated the relative number of mutations arising at the 3’ and 5’ cytosines in the tetranucleotide ACCA, in which a pyrimidine dimer can be unambiguously assigned (Fig. 6a). In sporadic skin cancers, the probabilities of mutation at the 3’ and 5’ cytosines were similar, with only a slight excess of mutations at the 3’C (55%), whereas in XP-V skin cancers 97% of the mutations were at the 3’C (Fig. 6b). This bias towards mutations at the 3’ pyrimidine was also much stronger in XP-V than in other groups of skin cancer for CT, TC, and TT pyrimidine dimers. For example, ATCA > ATTA mutations were 9.17-fold more frequent than ACTA > ATTA mutations in XP-V than in the other groups (normalized to the corresponding 4-nt frequencies in the human genome). A similar effect was observed for T > A and T > C mutations in the ATTA context (Fig. 6c).
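Both measures in this paragraph reduce to simple ratios. In ACCA, ACCA > ATCA changes the 5’ C of the CC dimer and ACCA > ACTA changes the 3’ C; for the TC versus CT comparison, each mutation count is divided by the genomic frequency of its 4-nt context before taking the ratio. The sketch assumes that mutation and genome counts are already tallied (ideally collapsing each 4-mer with its reverse complement), and the numbers are illustrative only.

```python
def acca_position(mutated_4mer: str) -> str:
    """Which cytosine of the ACCA CC dimer was mutated to T."""
    if mutated_4mer == "ATCA":
        return "5'"   # first C of the dimer changed
    if mutated_4mer == "ACTA":
        return "3'"   # second C of the dimer changed
    raise ValueError("expected ACCA>ATCA or ACCA>ACTA")


def three_prime_fraction(n_5prime: int, n_3prime: int) -> float:
    """Fraction of ACCA C>T mutations arising at the 3' cytosine."""
    return n_3prime / (n_5prime + n_3prime)


def dimer_translesion_bias(n_3p: int, genome_3p: int, n_5p: int, genome_5p: int) -> float:
    """Genome-normalised ratio of 3'-position to 5'-position mutations, e.g.
    ATCA>ATTA (3' C of a TC dimer) versus ACTA>ATTA (5' C of a CT dimer)."""
    return (n_3p / genome_3p) / (n_5p / genome_5p)


mutations = ["ACTA"] * 97 + ["ATCA"] * 3          # illustrative XP-V-like tumor
n3 = sum(acca_position(m) == "3'" for m in mutations)
print(three_prime_fraction(len(mutations) - n3, n3))  # 0.97
print(dimer_translesion_bias(200, 1_000_000, 50, 2_000_000))
```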
a Schematic representation of the putative CC photodimer in the ACCA context and the resulting mutations analyzed in panel b. b Fraction of C > T mutations from the 5’ and 3’ cytosines of the dimer in the 5’ACCA3’ context per group of tumors. Data are presented as mean values +/− SEM. n = 31 for SCC, n = 8 for BCC, n = 113 for MEL and n = 14 for XP-V (tumors). Source data are provided as a Source Data file. c “Dimer translesion bias” for different sequence contexts per group of tumors. Comparison of C > T mutation frequency in CT and TC pyrimidine dimers was performed after normalization to the number of such contexts in the genome (upper right panel). Boxes depict the interquartile range (25–75% percentile), lines – the median, whiskers − 1.5× the IQR below the first quartile and above the third quartile. n = 31 for SCC, n = 8 for BCC, n = 113 for MEL, n = 10 for XP-E, n = 8 for XP-C, n = 3 for XP-A, n = 3 for XP-D and n = 14 for XP-V (tumors). Source data are provided as a Source Data file. d Fraction of C > T mutations from the 5’ and 3’ cytosines of the dimer in the 5’ACCA3’ context in the RPE-1 POLH-wt and POLH-KO clones. SEM intervals are indicated. n = 1 for NT and KBrO3 and n = 3 for UV-A and UV-C clones per cell line. Source data are provided as a Source Data file.
These results demonstrate that mutations at pyrimidine dimers in XP-V occur predominantly at the 3’ nucleotide, which might reflect the error-prone activity of the inserter polymerase that replaces polymerase η, and that this bias modulates the mutational profile of C > T substitutions. Consistently, UV-C-treated POLH-KO cells showed a very strong bias towards mutations at the 3’C of CC pyrimidine dimers (99%) (Fig. 6d).
Mutation properties of XP groups modulate protein-damaging effects of mutagenesis
High mutation rates in cells increase cancer risk and intensify tumor evolution, while the topography of mutagenesis and the mutation signatures can affect the probability of damaging or driver mutations29,30,31. In our dataset of skin cancers, the number of oncogenic and likely oncogenic mutations (OncoKB) was strongly correlated with the total mutation burden (Fig. 7a).
a Correlations between tumor mutation burden and the number of oncogenic and likely oncogenic mutations (according to the OncoKB database) in the studied skin cancer samples. Pearson’s r coefficients and P values are indicated. b Mean fraction of exonic mutations among all mutations per sample. Data are presented as mean values +/− SEM. n = 31 for SCC, n = 8 for BCC, n = 113 for MEL, n = 10 for XP-E, n = 8 for XP-C, n = 3 for XP-A, n = 3 for XP-D and n = 14 for XP-V (tumors). Source data are provided as a Source Data file. c Protein-damaging/silent mutation ratio per substitution type in our pooled skin cancer cohort (n = 190 tumors). Damaging mutations: all non-silent exonic (missense, truncating) and splice-site mutations. Boxes depict the interquartile range (25th–75th percentiles), lines the median, and whiskers 1.5× the IQR below the first quartile and above the third quartile. d Mean fraction of protein-damaging mutations originating from the main mutation classes, split by gene strand, per group.
Active DNA repair in open chromatin regions decreases the accumulation of mutations in the early-replicating, gene-rich regions of cancer genomes (Fig. 2a). We estimated the fraction of mutations per genome falling in exonic regions across the studied skin cancer groups and found a significant enrichment of exonic mutations in XP-A and XP-D tumors in comparison with the other groups (Fig. 7b). This effect was caused by the redistribution of mutations from late- to early-replicating regions of the genome (Fig. 2a).
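The per-group summary in Fig. 7b (mean fraction of exonic mutations per sample, +/− SEM) can be sketched as below. The per-tumor counts are made up, the function names are hypothetical, and how exonic regions were defined is not specified here; SEM is taken as the sample standard deviation divided by √n, which requires at least two tumors per group.

```python
import math
import statistics


def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n)); needs n >= 2."""
    m = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return m, sem


def exonic_fraction_by_group(groups):
    """groups maps a group name to per-tumor (exonic, total) mutation counts."""
    return {
        name: mean_sem([exonic / total for exonic, total in tumors])
        for name, tumors in groups.items()
    }


# Made-up counts, for illustration only.
groups = {
    "XP-A": [(180, 9000), (150, 10000), (70, 4000)],
    "SCC": [(90, 10000), (60, 8000), (50, 5000)],
}
for name, (m, sem) in exonic_fraction_by_group(groups).items():
    print(f"{name}: {m:.4f} +/- {sem:.4f}")
```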
C > T transitions, the most prevalent UV-induced mutations, have a relatively low protein-damaging effect in the human genome, with a damaging/silent mutation ratio of 1.8, whereas other mutation types, such as C:G > A:T transversions and CC > TT DBSs, are more damaging, with damaging/silent ratios of 3.4 and 29.5, respectively (Fig. 7c). Enrichment of the highly protein-damaging CC > TT DBSs was particularly pronounced in XP-C and XP-D tumors (Fig. 7c). To better understand how NER deficiency modulates the protein-damaging effect of UV irradiation, we grouped protein-damaging mutations into 5 categories: C > T mutations on the transcribed and untranscribed strands, CC > TT double base substitutions on the transcribed and untranscribed strands, and other SBSs (Fig. 7d, Supplementary Fig. 14). C > T substitutions accounted for the largest fraction of protein-damaging mutations in all cancer groups except XP-V, where other mutation classes play a more important role. Mutagenesis at splice sites was preferentially caused by C > T mutations originating from lesions on the transcribed strand of genes (Supplementary Fig. 14). At the same time, the different mutation types did not affect the relative abundance of conservative and non-conservative missense mutations.
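A minimal sketch of a per-substitution-class damaging/silent ratio, summarized as the median across tumors as in the Fig. 7c boxplots. The consequence labels (`missense`, `nonsense`, `splice_site`, `silent`) are hypothetical annotation strings, and skipping a class in a tumor that has no silent mutations of that class is an assumption, since the handling of zero denominators is not described in the text.

```python
import statistics
from collections import defaultdict

# Hypothetical consequence labels; damaging = non-silent exonic + splice site.
DAMAGING = {"missense", "nonsense", "splice_site"}


def damaging_silent_ratios(tumors):
    """Median per-tumor damaging/silent ratio for each substitution class.

    tumors is a list of per-tumor lists of (substitution_class, consequence).
    A class with no silent mutations in a tumor is skipped for that tumor.
    """
    per_class = defaultdict(list)
    for mutations in tumors:
        damaging, silent = defaultdict(int), defaultdict(int)
        for cls, consequence in mutations:
            if consequence == "silent":
                silent[cls] += 1
            elif consequence in DAMAGING:
                damaging[cls] += 1
            # other consequences (e.g. intronic) are ignored
        for cls in silent:
            per_class[cls].append(damaging[cls] / silent[cls])
    return {cls: statistics.median(r) for cls, r in per_class.items()}


# Two made-up tumors.
tumors = [
    [("C>T", "missense"), ("C>T", "missense"), ("C>T", "silent"),
     ("CC>TT", "nonsense"), ("CC>TT", "silent"), ("C>T", "intron")],
    [("C>T", "missense"), ("C>T", "silent")],
]
print(damaging_silent_ratios(tumors))  # {'C>T': 1.5, 'CC>TT': 1.0}
```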
The contribution of damaging C > T mutations from the transcribed and untranscribed strands of genes (measured as the untranscribed/transcribed ratio) differed between groups. It was balanced between strands in sporadic skin cancers (1.02-fold), whereas the majority of damaging mutations in the GG-NER-deficient XP-E and XP-C groups were attributed to the untranscribed strand (1.36- and 1.82-fold, respectively), and in the GG- and TC-NER-deficient XP-D and XP-A groups to the transcribed strand (0.77- and 0.65-fold, respectively; Fig. 7d). These results can be explained by the fact that UV-induced C > T SBSs originating from lesions on the transcribed strand are 1.88-fold more protein-damaging than those originating from the untranscribed strand of genes. Therefore, active lesion removal from the transcribed strand by TC-NER not only reduces the total number of mutations arising from UV lesions but is particularly important for reducing the burden of protein-damaging mutations.
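The untranscribed/transcribed ratio in Fig. 7d depends on assigning each C > T (or CC > TT) to the strand carrying the damaged pyrimidine. A minimal sketch follows, assuming ref/alt alleles are written on the plus strand, the gene strand is known, and the lesion lies on the strand that carries the C; `lesion_strand` and `untranscribed_transcribed_ratio` are hypothetical names and the mutations are made up.

```python
from collections import Counter
from typing import Optional

PLUS_PYRIMIDINE = {("C", "T"), ("CC", "TT")}   # lesion on the plus strand
MINUS_PYRIMIDINE = {("G", "A"), ("GG", "AA")}  # same change read on the minus strand


def lesion_strand(ref: str, alt: str, gene_strand: str) -> Optional[str]:
    """Assign a C > T SBS or CC > TT DBS to the transcribed or untranscribed strand.

    ref/alt are written on the plus strand; gene_strand is "+" or "-".
    Assumes the damaged pyrimidine sits on the strand that carries the C.
    """
    if (ref, alt) in PLUS_PYRIMIDINE:
        pyrimidine_strand = "+"
    elif (ref, alt) in MINUS_PYRIMIDINE:
        pyrimidine_strand = "-"
    else:
        return None  # other SBS: not assigned to a strand here
    # The untranscribed (coding) strand runs in the same direction as the gene.
    return "untranscribed" if pyrimidine_strand == gene_strand else "transcribed"


def untranscribed_transcribed_ratio(mutations):
    """Untranscribed/transcribed ratio for (ref, alt, gene_strand) tuples."""
    counts = Counter(lesion_strand(*m) for m in mutations)
    return counts["untranscribed"] / counts["transcribed"]


# Made-up damaging C > T mutations: (ref, alt, gene strand).
damaging_ct = [("C", "T", "+"), ("C", "T", "+"), ("G", "A", "-"),
               ("G", "A", "+"), ("C", "T", "-")]
print(untranscribed_transcribed_ratio(damaging_ct))  # 1.5
```

A ratio above 1 corresponds to the untranscribed-strand excess described for XP-E and XP-C, and a ratio below 1 to the transcribed-strand excess in XP-D and XP-A.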