Samples and clinical characteristics
We collected and sequenced genomes of 33 skin cancers from 21 patients representing 5 out of 8 xeroderma pigmentosum (XP) groups (3 XP-A, 4 XP-C, 2 XP-D, 10 XP-E, and 14 XP-V tumors; Supplementary Table 1). Causative homozygous (n = 12) or compound heterozygous (n = 8) germline variants were identified in 20 patients, 13 of whom had known causative germline mutations (Supplementary Table 1), while the other 7 had novel germline mutations compatible with the diagnosis. The mean tumor purity and sequence coverage were 41% and 40× (30× for normal tissue), respectively. In addition, we sequenced genomes of 6 sporadic cutaneous squamous cell carcinoma (SCC) samples. These newly generated data were combined with WGS data from four previously published XP-C tumors12,13, one XP-D tumor14, as well as 25 sporadic cutaneous squamous cell carcinomas (SCC)12,15,16, 8 basal cell carcinomas15 (BCC) and 113 melanomas17 (MEL) from individuals not affected with XP. The resulting cohort of XP tumors included 17 BCCs, 15 SCCs, five melanomas, and one angiosarcoma. The mean age at biopsy in the XP cohort was 33 years (ranging from 25 years in the XP-C group to 48 years in the XP-V group), while in the sporadic skin cancer group it was 65 years (Table 1, Supplementary Table 1).
XP groups demonstrate different mutation burden and mutation profiles
We assessed the tumor mutation burden (TMB) and mutation profiles of skin cancer genomes from the 5 sequenced XP groups and compared them with three types of sporadic skin cancer: BCC, SCC, and MEL (Fig. 1a). The mean TMB of single base substitutions (SBS) was significantly higher in 3 XP groups, XP-E (350 mut/Mb, P = 0.0241), XP-V (248 mut/Mb, P = 0.0014), and XP-C (162 mut/Mb, P = 0.0220), than the dataset weighted average (130 mut/Mb, global P < 2.2e−16; Kruskal–Wallis H test; Fig. 1a). We also observed a strong difference in the TMB and in the proportion of CC > TT double base substitutions (DBS), characteristic of UV-induced mutagenesis, between the different XP groups and sporadic cancers (Fig. 1a). The highest proportion of CC > TT DBS relative to UV-induced SBS in pyrimidine dimers (C > T in YpC or CpY contexts; Y denotes a pyrimidine) was observed in XP-C and XP-D tumors (0.2 and 0.17, respectively), about 6 times higher than in sporadic skin cancers (0.03, P = 4.7e−08, Mann–Whitney U test, two-sided).
The mutation profiles of skin cancers in all XP groups were dominated by C > T substitutions at pyrimidine dimers, as also found in sporadic skin cancers. However, some XP groups demonstrated marked differences for C > T mutations in specific contexts, such as enrichment at TCA in XP-E, TCW in XP-C, or NCY in XP-D (where W denotes A or T; N: A, C, G, or T; Y: C or T). Moreover, in XP-V skin cancers, we report abundant mutations, namely C:G > A:T, T:A > A:T, and T:A > C:G, which were not previously seen to a significant degree in skin cancer (Fig. 1b, Supplementary Fig. 1). XP tumors formed clusters by XP group, which were non-overlapping with the cluster of sporadic skin cancers based on SBS mutation profiles and multidimensional scaling analysis (MDS; Fig. 1c–e; Supplementary Figs. 2, 3). XP-V, XP-C, and XP-A clusters were located distantly, while the XP-E / XP-D cluster was closer to the cluster of sporadic skin cancers.
Among the 78 COSMIC mutation signatures18 (v3.2) extracted from the pan-cancer dataset, four (SBS7a/b/c/d) are associated with UV irradiation, and a combination of SBS7a and SBS7b usually explains the majority of mutations in sporadic melanomas18. We investigated whether these signatures could explain the observed mutation profiles in XP skin cancer with an accuracy comparable to sporadic skin cancers. To this end, we compared observed and reconstructed mutation profiles for each sample in our cohort. The mean cosine distance was small for sporadic skin cancers (0.004) but increased drastically for the XP groups overall (0.16), and particularly for XP-C (0.237), XP-A (0.1957), and XP-V (0.222; Fig. 1f, Supplementary Fig. 4), indicating that mutational profiles in XP skin cancer cannot be optimally reconstructed with the known UV mutational signatures.
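The comparison of observed and reconstructed profiles can be illustrated with a minimal sketch. It assumes that the distance is defined as 1 minus the cosine similarity of the two mutation-count vectors; the function name and the toy 4-channel profiles are hypothetical and are not taken from the study (real SBS profiles have 96 channels).

```python
import math


def cosine_distance(observed, reconstructed):
    """Return 1 - cosine similarity between two mutation-count vectors.

    Assumption: both vectors list the same mutation channels in the same order.
    """
    if len(observed) != len(reconstructed):
        raise ValueError("profiles must have the same number of channels")
    dot = sum(o * r for o, r in zip(observed, reconstructed))
    norm_o = math.sqrt(sum(o * o for o in observed))
    norm_r = math.sqrt(sum(r * r for r in reconstructed))
    if norm_o == 0 or norm_r == 0:
        raise ValueError("profiles must contain at least one mutation")
    return 1.0 - dot / (norm_o * norm_r)


# Hypothetical toy profiles, for illustration only.
observed = [120, 30, 5, 45]
well_fit = [118, 32, 6, 44]      # reconstruction close to the observed profile
poorly_fit = [60, 30, 60, 50]    # reconstruction missing the dominant channel

print(cosine_distance(observed, well_fit))
print(cosine_distance(observed, poorly_fit))
```

A value near 0 means the known signatures reproduce the profile well; larger values, as reported here for the XP groups, indicate a poorer reconstruction.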
Nucleotide excision repair efficiency determines mutation load distribution along the genome
Strong heterogeneity in the mutation rate across the genome is a fundamental feature of mutagenesis with several clinical implications, for example for the discovery of cancer driver genes. We investigated the distribution of typical UV mutations (YC > YT or CY > TY) in XP and sporadic skin cancers in relation to replication timing (RT), active and inactive topologically associated domains (TAD), and markers of chromatin states. These analyses revealed a major role for NER in shaping the heterogeneity of local rates of UV-induced mutations across the genome. A maximal 5.2-fold difference was observed between the earliest and the latest replicating bins in sporadic skin cancers (average for BCC, cSCC, and MEL), with a monotonic decrease of mutation load from late to early replicating genomic regions (Fig. 2a). This effect was much weaker in GG-NER deficient XP-C genomes (2.4-fold) and almost disappeared in GG-NER and TC-NER deficient XP-A (1.5-fold) and XP-D (0.99-fold) genomes (Fig. 2a). Interestingly, the distribution of UV-induced SBS by RT in XP-E and XP-V genomes was not very different from that in sporadic skin cancer genomes (4.6-fold and 5.4-fold, respectively).
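The fold differences quoted above can be read as the ratio of mutation density in the latest replicating bin to that in the earliest one. The sketch below assumes that density is the number of UV-type mutations divided by the number of eligible dipyrimidine sites in each bin; the function name and the toy counts are hypothetical.

```python
def rt_fold_difference(mutation_counts, site_counts):
    """Ratio of mutation density in the latest vs. the earliest RT bin.

    Both lists are ordered from the earliest to the latest replicating bin.
    Assumption: density = mutations / eligible sites in the bin.
    """
    if len(mutation_counts) != len(site_counts) or len(mutation_counts) < 2:
        raise ValueError("need matching lists with at least two bins")
    if any(s <= 0 for s in site_counts):
        raise ValueError("each bin needs a positive number of sites")
    densities = [m / s for m, s in zip(mutation_counts, site_counts)]
    if densities[0] == 0:
        raise ValueError("earliest bin has no mutations; ratio is undefined")
    return densities[-1] / densities[0]


# Hypothetical counts for four RT bins (early -> late), same number of sites.
repair_proficient = rt_fold_difference([100, 200, 350, 520], [1000] * 4)
repair_deficient = rt_fold_difference([400, 405, 398, 396], [1000] * 4)
print(round(repair_proficient, 2), round(repair_deficient, 2))
```

A ratio close to 1, as in the second toy example, corresponds to the flat distribution described for XP-A and XP-D.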
It has been recently shown that TAD boundaries between active and inactive chromatin domains strongly delineate the transition between regions with low and high mutation load in different human cancers19. Indeed, in our cohort, we found a 2.2-fold difference in mutation load between active and inactive TADs in sporadic cancers, but it was noticeably decreased in XP-C (1.4-fold) cancers and was virtually absent in XP-A (1.05-fold) and XP-D (1.09-fold; Fig. 2b). Similarly, the mutation load in XP-A and XP-D tumors was independent of chromatin states, the XP-C group demonstrated a mild dependence, while the XP-E and the XP-V groups were not different from sporadic cancers (Fig. 2c).
CPD and 6-4PP DNA lesions occur on pyrimidine bases, which enabled us to identify the strand on which the lesion underlying a UV-induced mutation occurred. In order to separately investigate the genomic targets of GG-NER and TC-NER, we split the genome into intergenic regions and the transcribed and untranscribed strands of genic regions. A strong decrease of mutation rate in the early RT regions in groups proficient in GG-NER (sporadic cancers and XP-V), and surprisingly in GG-NER deficient XP-E, was observed in intergenic regions and untranscribed strands of genes. In contrast, XP-A and XP-D, which lack both GG-NER and TC-NER, had flat slopes compatible with the lack of repair in the open chromatin of early RT regions (Fig. 2d, Supplementary Fig. 5). XP-C samples, with functional TC-NER and fully abrogated GG-NER, demonstrated a lack of repair on untranscribed gene strands and in the intergenic regions, but were proficient in repair of the transcribed strands of genes (Fig. 2d, Supplementary Fig. 5).
Transcriptional bias is different between the XP groups
TC-NER removes UV-induced bulky DNA lesions on the transcribed strand of expressed genes more efficiently than GG-NER on the untranscribed strand resulting in a decrease of mutations on the transcribed versus untranscribed strand, a phenomenon called transcriptional bias (TRB)20. In skin tumors with proficient NER, the TRB ranged between 1.3 and 1.6-fold for sporadic cancers and was 1.7 in XP-V. In the GG-NER-deficient TC-NER-proficient groups, TRB was particularly high, ranging between 1.77-fold (XP-E) and 2.42-fold (XP-C), which is compatible with defects in the repair of the untranscribed strand. In contrast, in XP-A and XP-D groups with defects of both TC-NER and GG-NER TRB was minimal or absent: 1.17-fold and 0.97-fold, respectively (Fig. 3a).
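As an illustration of how such a fold value can be obtained, the sketch below assumes that TRB is the mutation density on the untranscribed strand divided by the density on the transcribed strand, with each mutation assigned to the strand carrying the pyrimidine lesion. The function name, the normalization by eligible sites, and the toy counts are assumptions, not the study's exact procedure.

```python
def transcriptional_bias(untx_mutations, untx_sites, tx_mutations, tx_sites):
    """Untranscribed/transcribed ratio of mutation densities.

    Values above 1 indicate fewer mutations from lesions on the
    transcribed strand, as expected when TC-NER is active.
    """
    if untx_sites <= 0 or tx_sites <= 0:
        raise ValueError("site counts must be positive")
    if tx_mutations <= 0:
        raise ValueError("transcribed-strand mutation count must be positive")
    untx_density = untx_mutations / untx_sites
    tx_density = tx_mutations / tx_sites
    return untx_density / tx_density


# Hypothetical counts over equal numbers of eligible sites on each strand.
print(round(transcriptional_bias(2420, 1_000_000, 1000, 1_000_000), 2))
print(round(transcriptional_bias(970, 1_000_000, 1000, 1_000_000), 2))
```

Under this definition, a value near 1 means that lesions on both strands are converted to mutations at a similar rate, which is the pattern reported for the groups lacking TC-NER.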
TC-NER removes DNA lesions downstream of genes and influences intergenic mutation load
Since early RT regions are particularly gene-rich21, we hypothesized that in GG-NER deficient but TC-NER proficient XP groups (XP-C, XP-E), decreased mutation load in early RT intergenic regions might be associated with TC-NER activity beyond gene boundaries. Indeed, in GG-NER deficient XP-C tumors, we detected a statistically significant TRB up to 40 kb downstream of the furthest annotated transcriptional end sites (TES) of genes, with decreased mutation frequency on the transcribed strand of nearby genes (Fig. 3b, Supplementary Fig. 6). The same effect was observed in XP-E and even in NER proficient skin cancers, although with a lower magnitude (but significant for sporadic melanoma with the large sample size, Supplementary Fig. 6). As expected, we did not observe TRB downstream of genes in XP-A and XP-D samples, which are deficient for both TC-NER and GG-NER (Fig. 3b, Supplementary Fig. 6). To validate TC-NER activity downstream of gene TES, we used previously published XR-seq data from XPC-deficient cell lines22,23. It is expected that in XPC-deficient cells, XR-seq data, representing the sequencing of lesion-containing DNA fragments excised by NER22, is produced exclusively by TC-NER. An XR-seq signal was observed up to 40 kb downstream of TES on the transcribed strand of a nearby gene, mirroring mutation asymmetry in the same regions in XP-C tumors (Fig. 3c, Supplementary Fig. 7), and was well correlated with the transcriptional intensity of nascent RNA, which was retrieved from an independent study24 (Fig. 3d). This suggests that, in some cases, the RNA polymerase might continue transcription after TES and recruit TC-NER at lesion sites. We identified XR-seq signal in 21% of the cumulative length of intergenic regions and in 14% of untranscribed strands of genes in an XPC-deficient cell line22, suggesting ubiquitous extended TC-NER activity. Analysis of transcriptional bias and relative mutation rate in intergenic regions of XP-C tumors (Fig. 3e, f) revealed strong dependence on the intensity of XR-seq outside the annotated genic regions. This extended TC-NER activity outside of the transcribed strand of genes is especially strong in early replicating regions with a high density of active genes (Fig. 3c). It may explain the decrease of the mutation density in intergenic regions and on the untranscribed strands of genes in early replicating genomic regions of GG-NER deficient XP-C samples (Fig. 2d, Supplementary Fig. 5).
XP-E demonstrates reduced GG-NER activity
The sensors of UV-induced DNA lesions in GG-NER, XPC and DDB2 (XPE), are thought to work in tandem: DDB2 binds directly to a lesion and facilitates recruitment of XPC, which in turn initiates the repair process with the TFIIH complex25. We therefore compared the features of UV-induced mutagenesis in XP-E, resulting from the loss of DDB2, with those in XP-C and sporadic tumors.
An MDS plot based on SBS mutational profiles (Fig. 4a) and hierarchical clustering (Supplementary Fig. 8a) revealed three clusters corresponding to XP-C, XP-E, and sporadic tumors. At the same time, the proportion of CC > TT DBS was much increased in XP-C (0.21) versus sporadic cSCC (0.064), but significantly decreased in XP-E cSCC (0.034, p = 0.0003; Mann–Whitney U test, two-sided), confirming qualitative differences of mutagenesis in XP-E. Unlike XP-C, the distribution of the mutational load in intergenic and untranscribed strand gene regions by RT in XP-E was very close to that of sporadic cSCC, suggesting that repair in early RT regions was functional in XP-E (Fig. 2d, Supplementary Fig. 5). Similarly, the MDS plot and hierarchical clustering based on the local mutation load in 2684 1-Mb-long intervals along the genome revealed no strong difference between XP-E and sporadic samples, while XP-C samples all grouped together irrespective of tumor type (Fig. 4b, Supplementary Fig. 8b). For example, a single XP-E melanoma sample clustered within sporadic melanomas, and the majority of XP-E nonmelanoma skin cancers clustered with sporadic samples of the corresponding cancer types, while XP-C samples grouped together separately from sporadic and XP-E tumors (Fig. 4b, Supplementary Fig. 8b).
The XP-E group demonstrated a strong TRB (1.77-fold), which was intermediate between sporadic cSCC (1.33) and the XP-C group (2.47) (Fig. 3a, Fig. 4c, d). Given that TC-NER is functional in XP-E, XP-C, and sporadic samples, and assuming that GG-NER is fully abrogated in XP-C, we can estimate the relative efficiency of GG-NER in XP-E tumors. Providing all else is equal, GG-NER is 64% less efficient in XP-E than in sporadic cancers.
To provide a more detailed view of the mutation differences between XP-E, XP-C, and sporadic tumors, we compared the association of mutation load in each group with the core epigenetic marks from a primary keratinocyte cell line26, using only cSCC samples (Supplementary Fig. 9). Unlike XP-C, XP-E tumors did not show strong and significant differences from sporadic cSCC in the dependence of mutagenesis on the majority of epigenetic covariates, except for the histone modification marks H3K36me3, H3K27ac, and H3K9me3 on the transcribed strand of gene regions (Supplementary Fig. 9).
Taking these observations together, we can speculate that in XP-E tumors there is residual GG-NER activity associated with the ability of XPC to find a fraction of DNA lesions and initiate NER. This is consistent with the clinical observation that XP-E patients develop fewer skin tumors, and develop them later, than XP-C patients.
Polymerase η deficiency causes a specific mutation profile in skin cancers
The analysis of XP-V skin cancers revealed that on average 27% (15–42%) of SBS were C:G > A:T mutations with a highly specific 3-nt context (NCA) and a strong and homogeneous TRB (Figs. 5a, 1b, Supplementary Fig. 1). A similar mutation context and TRB were observed for a part of the T:A > A:T mutations, which represented 8.7% of SBS. In sporadic skin cancers, C:G > A:T and T:A > A:T mutations represented only 2.5% and 4.6% of SBS, respectively, and had different, broad 3-nt contexts without a strong TRB (Fig. 1b, Supplementary Fig. 1). Enrichment of these types of mutations in XP-V suggests that they might originate from lesions that are bypassed by polymerase η in an error-free manner in sporadic skin cancer, whereas XP-V cells have to use one or more alternative polymerases to bypass these lesions.
The direction of TRB for these types of mutations indicates a decrease in mutations from lesions involving purines on the transcribed strand (Fig. 5a). Furthermore, comparison of C:G > A:T mutation frequencies on the transcribed and untranscribed strands with the proximal 5’ intergenic regions confirmed that TRB is indeed associated with a decrease of C:G > A:T mutations on the transcribed strand (Fig. 5b). This suggests that mutations occur due to lesions involving purines, which are NER substrates and are effectively repaired by TC-NER on the transcribed strand (Fig. 5b). Interestingly, C:G > A:T mutations had stronger TRB than YC > YT or CY > TY UV-induced mutations in all bins of genes grouped by the expression level (Fig. 5c). This observation might indicate that those lesions produce a smaller helix distortion and are less visible to GG-NER than UV-induced pyrimidine lesions.
C:G > A:T and T:A > A:T mutations occurred in a very specific dinucleotide context, where a purine is always preceded by a thymine base (TA/G > TT), suggesting that causative DNA lesions might be thymine-purine dimers (Fig. 5d). The number of mutations in a TG context was strongly correlated with the number of mutations in a TA context (r = 0.98, Pearson’s r correlation coefficient; Supplementary Fig. 10) in our XP-V skin cancer cohort suggesting coordinated mutation processes.
We hypothesized that if TA/G > TT mutations were not directly or indirectly caused by UV-irradiation their abundance would not be correlated with the typical UV-induced (YC > YT or CY > TY) mutations. UV-induced mutations usually accumulate nonlinearly but depend on the UV exposures, while mutations caused by cell physiological processes (such as purine oxidation, cytosine deamination) accumulate more or less linearly with time27. We measured a Pearson’s r correlation of TG > TT or TA > TT mutations with typical UV-induced (YC > YT or CY > TY) mutations and observed strong correlations in both cases, r = 0.78 (p = 0.001) and r = 0.99 (p = 1e−10), respectively (Supplementary Fig. 10).
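The correlation test described above can be sketched as follows. This is a plain implementation of Pearson's r on per-tumor mutation counts; the function name and the toy counts are hypothetical, and the p-values reported in the text would require an additional significance test that is not shown here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-tumor counts: typical UV-induced mutations vs. TA > TT.
uv_mutations = [12000, 45000, 80000, 150000, 310000]
ta_to_tt = [400, 1500, 2900, 5100, 10400]
print(round(pearson_r(uv_mutations, ta_to_tt), 3))
```

A coefficient close to 1 across tumors is what would be expected if both mutation classes scale with the same exposure.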
To further understand the nature of TG > TT and TA > TT mutations, we established a POLH knockout (KO) of the RPE-1 TP53-KO cell line and sequenced whole genomes of the POLH-wt and POLH-KO clones both without treatment and after treatment with KBrO3 (to induce reactive oxygen species; n = 1), UV-A (n = 3), and UV-C (n = 3) (Supplementary Table 2, Supplementary Fig. 11). There were no major differences in the number of mutations or in mutational profiles between POLH-wt and POLH-KO for untreated and KBrO3-treated cells (Fig. 5e–g). UV-A and UV-C exposures greatly increased the number of SBS in the POLH-KO cells (3.9-fold and 10.5-fold, P = 0.00078 and P = 0.01507, respectively; Welch two-sample t-test) and dramatically changed the mutational profiles in comparison with POLH-wt clones (Fig. 5e–g). UV-A-treated POLH-KO clones had on average 16% TG > TT mutations and 12% TA > TT mutations, with the context specific to XP-V and a strong transcriptional bias, while in the UV-C-treated clones these percentages were 10% and 4% on average, respectively (Fig. 5f, g). UV-treated POLH-KO cells demonstrated a distinct pattern of TG > VT DBS (V denotes A, C, or G). Interestingly, a similar DBS pattern was also visible in XP-V tumors (Supplementary Fig. 12).
Another feature of the XP-V skin cancer profile was that 15% (range 11–23%) of mutations originated from TT pyrimidine dimers. Such mutations are very rare in sporadic cancer (4.8%) because TT pyrimidine dimers are bypassed by polymerase η in a relatively error-free manner. The two predominant types of mutations at TT were TT > TA and TT > TC; as expected for mutations from pyrimidine lesions, they demonstrated strong TRB and were correlated with the typical UV-induced YC > YT or CY > TY mutations (Fig. 5a, Supplementary Fig. 10). The reconstruction of RPE-1 mutational profiles with the COSMIC mutational signatures revealed a significantly higher reconstruction error for POLH-KO cells than for POLH-wt cells, and overall poor reconstruction performance for UV-A-treated cells (Supplementary Fig. 13), indicating poor representation of the inferred mutational process in the public mutational catalogs.
In the absence of polymerase η, error-prone bypass of 3’ nucleotides in pyrimidine dimers shapes the mutation profile of XP-V tumors
The 3-nt context of C > T substitutions in XP-V skin cancers differed from that in sporadic skin cancers and other XP groups (Fig. 1b, d). It was previously shown that, in the absence of polymerase η, the bypass of CPD photoproducts can be performed in two steps by two TLS polymerases: one inserts a first nucleotide opposite the 3’ nucleotide of the lesion (“inserter”) and is then replaced by another TLS polymerase, which performs the extension opposite the 5’ nucleotide of the lesion (“extender”)28. We hypothesized that loss of polymerase η in skin cancer might change the probabilities of mutations at 3’ versus 5’ nucleotides in pyrimidine dimers and thereby contribute to the observed differences between the mutation profiles of C > T SBS in XP-V and sporadic skin cancer.
To test this hypothesis, we first estimated the relative number of mutations arising at 3’ and 5’ cytosines in the tetranucleotide ACCA, where we could unambiguously allocate a pyrimidine dimer (Fig. 6a). In sporadic skin cancers, the probabilities of mutations at 3’ and 5’ cytosines were similar, with only a slight increase of mutagenesis from the 3’C (55%), while in XP-V skin cancers 97% of the mutations were from the 3’C (Fig. 6b). This bias towards 3’ pyrimidine mutations was also much stronger in XP-V than in other groups of skin cancer for the CT, TC, and TT pyrimidine dimers. For example, the excess of ATCA > ATTA over ACTA > ATTA mutations was 9.17-fold greater in XP-V than in the other groups (normalized to the corresponding 4-nt frequencies in the human genome). A similar effect was observed for T > A and T > C mutations in the ATTA context (Fig. 6c).
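The allocation of C > T mutations to the 3’ or 5’ cytosine of an ACCA tetranucleotide can be sketched as below. The sketch assumes each mutation is given as a pair of reference and mutated 4-nt strings on the reference strand, and that TGGT sites are reverse-complemented so that the pyrimidine strand is read 5’ to 3’. The function names and input format are hypothetical, and normalization to genomic 4-nt frequencies is omitted.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def classify_acca_mutation(ref4, alt4):
    """Return "3'C", "5'C", or None for a C > T change in an ACCA context."""
    if ref4 == "TGGT":  # the CC dimer lies on the opposite strand
        ref4, alt4 = reverse_complement(ref4), reverse_complement(alt4)
    if ref4 != "ACCA":
        return None
    if alt4 == "ACTA":  # second cytosine of the dimer mutated
        return "3'C"
    if alt4 == "ATCA":  # first cytosine of the dimer mutated
        return "5'C"
    return None


def three_prime_share(mutations):
    """Fraction of classified ACCA C > T mutations that hit the 3' cytosine."""
    labels = [classify_acca_mutation(ref, alt) for ref, alt in mutations]
    labels = [label for label in labels if label is not None]
    if not labels:
        raise ValueError("no C > T mutations in an ACCA context")
    return labels.count("3'C") / len(labels)


# Hypothetical calls: three 3'C events, one 5'C event, one unrelated change.
calls = [("ACCA", "ACTA"), ("TGGT", "TAGT"), ("ACCA", "ACTA"),
         ("TGGT", "TGAT"), ("ACCA", "ACGA")]
print(three_prime_share(calls))
```

Flanking adenines are what make the assignment unambiguous in this context, since neither neighbor can form a pyrimidine dimer with the cytosines.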
These results demonstrate that mutations at pyrimidine dimers in XP-V occur predominantly at the 3’ nucleotide, which might be associated with the error-prone activity of the inserter polymerase that replaces polymerase η and might modulate the mutational profile of C > T substitutions. POLH-KO cells treated with UV-C likewise demonstrated a very strong bias in CC pyrimidine dimers towards mutations at the 3’C (99%) (Fig. 6d).
Mutation properties of XP groups modulate protein-damaging effects of mutagenesis
High mutation rates in cells increase cancer risk and intensify tumor evolution, while the topography of mutagenesis and mutation signatures can impact the probability of damaging or driver mutations29,30,31. In our dataset of skin cancers, the number of oncogenic mutations in the cancer genome was strongly correlated with the total mutation burden (Fig. 7a).
Active DNA repair in open chromatin regions decreases the accumulation of mutations in the early replicating gene-rich regions of cancer genomes (Fig. 2a). We estimated a fraction of mutations per genome falling in the exonic regions across the studied skin cancer groups and found in XP-A and XP-D tumors a significant enrichment of exonic mutations in comparison with the other groups (Fig. 7b). The effect was caused by the redistribution of mutations from late to early RT regions of a genome (Fig. 2a).
C > T transitions, which are the most prevalent UV mutations, have a relatively low protein-damaging effect in the human genome, with a damaging/silent mutation ratio of 1.8, while other types of mutations, such as C:G > A:T transversions or CC > TT DBS, are more damaging, with damaging/silent mutation ratios of 3.4 and 29.5, respectively (Fig. 7c). Enrichment of highly protein-damaging CC > TT DBS was particularly pronounced in XP-C and XP-D tumors (Fig. 7c). To better understand how NER deficiency modulates the protein-damaging effect of UV irradiation, we grouped protein-damaging mutations into 5 categories: C > T mutations on the transcribed and untranscribed strands, CC > TT double base substitutions on the transcribed and untranscribed strands, and other SBS (Fig. 7d, Supplementary Fig. 14). The largest fraction of protein-damaging mutations was accounted for by C > T substitutions in all cancer groups except XP-V, where other mutation classes play a more important role. Mutagenesis in splice sites was preferentially caused by C > T mutations originating from lesions on the transcribed strands of genes (Supplementary Fig. 14). At the same time, different mutation types did not affect the relative abundance of conservative and non-conservative missense mutations.
The contribution of damaging C > T mutations from the transcribed and untranscribed strands of genes (measured as the untranscribed/transcribed ratio) differed between groups. It was balanced between strands in sporadic skin cancers (1.02-fold). In contrast, the majority of damaging mutations in the GG-NER deficient XP-E and XP-C groups were attributed to the untranscribed strand (1.36-fold and 1.82-fold, respectively), while in the GG-NER and TC-NER deficient XP-D and XP-A groups they were attributed to the transcribed strand (0.77-fold and 0.65-fold, respectively; Fig. 7d). These results can be explained by the fact that UV-induced C > T SBS originating from lesions on the transcribed strand are 1.88-fold more protein-damaging than those originating from the untranscribed strand of genes; therefore, active lesion removal by TC-NER from the transcribed strand of genes not only reduces the total number of mutations from UV lesions, but is particularly important for reducing the burden of protein-damaging mutations.