Mouse model of dietary induced obesity
To study the long-term effect of obesity on genomic stability in the intestinal crypt, we set up a cohort of age matched male C57/BL6J mice (Fig. 1A). After random assignment to cages with either standard chow (SD) or HFD, the mice were started on the respective diet course at the age of 5 weeks for 48 continuous weeks. At set time intervals of 6, 12, 28, and 48 weeks, a random subsample of HFD and SD mice was drawn and sacrificed to harvest ISCs for culturing. Organoids were picked and cultured to clonality before obtaining whole genome sequences (30×) for 5 obese and 5 lean mice from the last timepoint (48 weeks). For each mouse, 4 independent organoid clones and the matched tail were sequenced to distinguish acquired variants from germline variants. The clonal organoid lines take on a cystic morphology characteristic of intestinal organoids with high stem cell content13 and are positive for Lgr5 expression (Supplementary Fig. 1A, B).
(A) Schematic display of experimental workflow. (B) Macronutrients of experimental diets shown by percent contribution to total calories. HFD is shown in light blue and SD is shown in light brown. (C) Food consumption per diet group, measured per cage and divided by the number of mice per cage. The group average and statistical significance is indicated above (unpaired t-test, two-tailed) (D) Plasma cholesterol content in mg/dl shown per diet group at each point of the time course. N = 3 for each group at timepoints week 6, 12, 28 and N = 5 at week 48 (pairwise t-test, two-tailed). (E) Weekly weight measurements for diet groups. Dots indicate measurements for individual mice. Statistically significant weight gain was observed after 3 weeks on the HFD (indicated by red line) Statistical significance was tested using multiple unpaired t-tests with alpha = 0.001 (Holm–Sidak correction method for multiple testing, not assuming consistent standard deviation between groups).
Our model of diet induced obesity relies on the choice of supplied diet. In the high fat diet condition, mice derive 60% of all calories from fat, while the majority of calories in the normal diet (SD) derive from carbohydrates (55.5%) (Fig. 1B). The exact diet composition is described in Supplementary Tables 1 and 2. Despite lower overall food consumption in the HFD group (Fig. 1C), mean weekly caloric intake was higher in the HFD group (92.7 kcal/week) compared to the SD group (80.2 kcal/week). C57BL/6J mice have been well characterized as model organisms for diet induced obesity, capturing essential aspects of metabolic dysregulation and weight gain27, 28. Our cohort also exhibited the marked increase in total cholesterol upon exposure to HFD (Fig. 1D) and a significant increase in body weight after 3 weeks on the HFD (Fig. 1E), recapitulating the metabolic dysregulations and resulting phenotype associated with obesity.
Qualitative analysis of mutational profiles in SD and HFD fed mice
Since the genome records past and ongoing mutational processes, we reasoned that longer exposure to HFD would result in a stronger signal. Hence, we focused our sequencing efforts on the last time point (48 weeks). The obtained raw reads were processed according to GATK (The Genome Analysis Toolkit) best practices (Fig. 2A). To obtain a high confidence set of mutations, we utilized two mutation callers, Mutect229, 30 and Strelka231. Mutations which were found by both Mutect2 and Strelka2 and passed the respective quality filter settings were included for further analysis. This yielded a total of 48,742 single nucleotide variants (SNV), 165 double nucleotide substitutions (DSB), and 6662 indels (insertions and deletions). Due to low numbers of mutations, DSBs were excluded from further analysis. As an additional quality control step, we checked the variant allele frequency (VAF) distribution of all organoid clones and included only clonal samples, where the VAF distribution is centered around 0.5 (Supplementary Fig. 1C).
(A) Raw reads from paired end 150 bp Illumina sequencing were processed according to GATK best practices, including marking and removal of duplicates and recalibration of base quality scores. Analysis ready reads were processed by two mutation callers, Mutect2 and Strelka2. Variants called by both tools were included in the analysis. In total, 48,742 single nucleotide variants, 165 double base substitution variants, and 6662 insertions and deletions could be detected. (B) Relative contribution of SNVs in six mutation classes for HFD samples (left panel) and SD samples (right panel). C > T mutations within CpG sites are shown as a separate category. Individual dots indicate organoid samples, error bars show ± 1 sd from the mean, asterisks indicate results from pairwise t-test (two-sided) comparing mutation numbers for each mutation category, alpha = 0.05 (C) Average mutational profile of SNVs in 96 channels shown for HFD (upper panel) and SD (lower panel). Error bars indicate ± 1 sd.
We first explored the overall mutational landscape for SNVs per diet group. Surprisingly, we found a slightly higher number of total mutations in the SD group than in the HFD group. We observed a significantly higher number of mutations in the SD group for C > G, C > T outside of CpG regions, T > C, and T > G (Fig. 2B). The profile of relative contributions, across the 7 mutation channels, however, is similar between the two diet groups. Next, we examined the mutational profiles in 96 channels. The mean mutational profile per diet group exhibits few characteristic peaks, with the exception in the C > A and C > T components. The aggregated profile of the HFD group has a cosine similarity of 0.9929 to the SD group (Fig. 2C). We furthermore observe highly similar profiles between mice of either diet group (Supplementary Fig. 2A,B). To quantify how similar the mutational profiles of samples across diet groups are, we computed the pairwise cosine similarity between all samples, which ranges from 0.9020 to 0.9776 (mean = 0.9558) (Supplementary Fig. 2C).
The high cosine similarity between all samples implies the absence of strong differential mutational processes. To test this, we used a bootstrap resampling method of the 96-channel mutation matrix, adapted from Zou et al., for SD and HFD samples24. This allows us to detect potential qualitative differences in mutational profiles which remain uncovered due to low sample size and high signal to noise ratio. The global bootstrapped mutational profile of the SD mice has a cosine similarity of 0.9933 when compared to the profile of the HFD mice (Supplementary Fig. 2D).
In summary, this suggests that no strong qualitative differences exist for mutagenic processes in either diet group.
Mutational signature analysis of single nucleotide variant profiles
De-novo signature extraction of SNV signatures
Despite the lack of qualitative differences in the mutational profiles of the two diet groups, we next sought to explore which mutational signatures are active in the diet groups to determine whether quantitative differences exist. We first employed non-negative matrix factorization (NMF) with automated rank selection based on the NMFk method to determine the optimal number of de-novo signatures to extract32, 33. Classically, NMF algorithms use heuristics to determine the optimal rank based on either stability of the solution, or on automatic relevance determination (ARD), which is a measure of precision of the chosen model to explain the data33. In contrast, the NMFk method for automatic rank determination seeks to optimize the tradeoff between both, the stability of the solution and the accuracy of the reconstructed data, measured as the distance between original and reconstructed profiles (mean sample cosine distance). This method allows to robustly extract a meaningful number of signatures from noisy data while minimizing the number of false positive signatures32.
Applied to our data, both stability and mean sample cosine distance decline the more signatures are extracted. Thus, the most optimal solution consists of extracting a single signature (Fig. 3A). The presence of a single consensus profile in the cohort would indicate that there are no distinguishing signatures between diet groups. The de-novo extracted signature can furthermore be decomposed into known signatures from the catalog of somatic mutations in cancer (COSMIC) database34. According to this decomposition, the de-novo signature consists of 48.1% SBS5, 42.56% SBS18, and 9.34% SBS1 (Fig. 3A). Comparing the per sample contribution of the decomposed signatures reveals an equal distribution of signature activities across samples, regardless of diet used (Fig. 3B).
(A) NMF for signature extraction ranging from 1 to 4 signatures. Red line indicates mean sample cosine distance (MSCD), blue line indicates average stability (AS), gray bar indicates preferred solution, maximizing the tradeoff between MSCD and AS. The decomposition of the extracted signature into known COSMIC signatures and their calculated percent contribution is shown to the right. (B) NMF results from A shown as per sample absolute signature contributions (number of mutations), diet status of the samples is indicated at the bottom. (C) Best subset signature refitting using signatures commonly active in colorectal cancer. Per sample absolute signature contributions (number of mutations) are shown for HFD samples (upper panel) and SD samples (lower panel). (D) Signature Presence test for 4 most active signatures. The y-axis indicates the likelihood ratio between the signature fitting with and without the tested signature. The translucence of the bars, shown for individual organoid clones, is indicative of the level of significance (-log p).
Signature refitting of SNV signatures
In cohorts with lower sample numbers such as ours, an alternative approach to de-novo extraction is signature refitting, where the mutational catalog of the samples is fitted to the catalog of known signatures (COSMIC) to find a subset which best explains the observed mutational catalog. This approach takes a defined set of known signatures and performs a refit in an iterative manner. After each iteration the reconstructed and original profile are compared and the lowest-contributing signature is eliminated from the set. Signatures will stop being removed when the cosine similarity between the reconstructed and original profile between two iterations has changed more than a given threshold. Thus, only signatures which are necessary to model the observed data are retained in the set. By repeating this process n − 1 times for all sets of n − 1, n − 2, n − 3 etc., where n is the total number of known signatures, we find that SBS1, SBS5, SBS18, and SBS40 explain 99.6% of observed mutations in both diet groups (Fig. 3C). Only four samples showed minimal activity of SBS15 (defective mismatch repair), a signature directly attributable to an increase in genomic instability. The equally minimal number of mutations attributed to SBS15 in both diet groups, however, suggest no differential potential in mismatch repair among SD and HFD. In summary, the distribution of the fitted signatures is highly similar across samples and is not diet specific.
To quantify the activity of the most active signatures, we applied the signature presence test from the mSigAct package35, 36 to SBS1, SBS5, SBS18, and SBS40. This statistical test builds two refit models, one including and one excluding the signature of interest, while minimizing the reconstruction error. Following this, the likelihood ratio test between the two models is computed. For ratios greater than 1, the likelihood of the signature being active is significantly higher than the alternative hypothesis. The signature presence test confirmed the results obtained from signature refitting. Of the 4 tested signatures, SBS5 is the least active, as already observed before (Fig. 3D). Although some variation in signature activity can be observed between individual samples, no signature shows a systematic difference between diets.
All signatures we found are equally active in both diet groups and are likely attributable to normal aging processes and the culturing process prior to sequencing. SBS1 is a clock-like signature which is attributed to aging due to spontaneous deamination of 5-methylcytosines, which leads to a C > T transition21. The activity of SBS1 observed in both groups thus likely reflects the normal aging process. Additionally, both groups showed high numbers of C > A mutations, which were largely attributed to SBS18. This signature has been proposed to be caused by damage due to reactive oxygen species22, 25 and might thus have arisen during the routine experimental handling of the samples or due to exposure to metabolic byproducts in the intestine. The remaining signatures SBS5 and SBS40 share similarly flat profiles. Although only SBS5 has been clearly identified as a clock-like signature, SBS40 was also found to correlate with age22, 37. Thus, the activity of both signatures may be explained by normal aging processes. Taken together, the results from de-novo extraction and signature refitting, confirm that the experimental HFD did not induce or impact different mutational processes for single nucleotide substitutions compared to the standard diet.
Mutational signature analysis of indel profiles
Comparison of indel profiles between diet groups
Aside from SNVs, numerous mutational processes also generate insertions and deletions. This class of mutations generates signatures different to SNV signatures. We therefore analyzed the 6662 indel mutations in our cohort to compare whether differences in indel generating mutational processes exist between the diet groups. We only considered clonal samples with a VAF distribution centered around 0.5 (Supplementary Fig. 3). Indel mutations can be analyzed in 16 or in 83 curated channels, representing the main and extended sequence context respectively38. The curated indel types range from a single base pair deletion or insertion, up to indels longer than 5 bp. Additionally, 1–5 bp deletions flanked by microhomologies are considered, since such mutations are indicative of defective double strand break repair processes39. Indel profiles in both sequence contexts were highly similar between diet groups, with a cosine similarity of 0.9925 for main indel contexts (Fig. 4A), and 0.9941 for extended indel contexts (Fig. 4B). Only 5 + bp deletions flanked by microhomologies were significantly increased in the SD compared to the HFD cohort (Fig. 4A). However, since the total number of mutations in that category is less than 10, this likely represents a random variation and carries no specific biological meaning. Indeed, all mice, regardless of diet group, exhibited highly similar indel profiles, both for the main and the extended sequence context (Supplementary Fig. 4A–D). Furthermore, all samples showed a pairwise cosine similarity greater than 0.84 (Supplementary Fig. 4E). Conclusively, the high cosine similarity between indel profiles of the diet groups as well as among individual samples suggest that no indel generating processes are unique to either diet.
(A) Main indel context (16-channels) profiles aggregated by diet group (mean), error bars indicate ± 1 sd. Statistical significance was assessed using multiple pairwise t-tests, not assuming consistent standard deviation (Holm–Sidak correction method for multiple testing, alpha = 0.01) (B) Mean extended context indel profile by diet group (83 channels), error bars indicate ± 1 sd from the mean. (C) NMF diagnostic plot for signature extraction ranging from 1 to 4 signatures. Red line indicates mean sample cosine distance (MSCD), blue line indicates average stability (AS), gray bar indicates preferred solution, maximizing the tradeoff between MSCD and AS. The decomposition of the extracted signature into known COSMIC indel signatures and their calculated percent contribution is shown to the right. (D) NMF results from A shown as per sample absolute signature contributions (number of mutations), the diet status of the samples is indicated on the x-axis. (E) Best subset signature refitting using all 18 known indel signatures. Per sample absolute signature contributions (number of mutations) are shown. Diet status is indicated on the x-axis. (F) Bootstrapped refitting of indel signatures (best subset approach using all known indel signatures, 100 iterations). Size of dots indicates the mean contribution of the signature for all bootstrap iterations where this signature was found. The color scale represents the percentage of bootstrap iterations where the signature was found active. (G) Signature Presence test for 4 most active signatures found in refitting and bootstrapped refitting. The x-axis indicates the sample, the y-axis indicates the likelihood ratio between the signature fitting with and without the tested signature. The translucence of the bars, shown for individual organoid clones, is indicative of the level of significance (− log p).
De-novo signature extraction of indel signatures
We next applied the same analysis workflow we established for SNV signatures to all insertions and deletions. NMF with automated rank selection, found one indel signature as the optimal solution because extraction of more than one signature led to a sharp decrease in average stability (Fig. 4C left panel). The decomposition of the single de-novo signature estimated three known COSMIC signatures to be active, ID1 (22.46%), ID2 (40.88%), and ID12 (36.66%) (Fig. 4C right panel). The distribution of the signature contribution to the individual samples does not differ between diet groups (Fig. 4D).
Signature refitting for indel signatures
Exploring indel signatures further with refitting analysis allowed us to confirm the results obtained from de-novo extraction. Using best subset refitting with all 18 known indel signatures, we find ID1, ID2, and ID12 most active and similarly distributed across samples (Fig. 4E). Minor activity observed for ID7 (MMR deficiency37) and ID9 (etiology unknown37), may be due to signature misattribution for the common C and T deletions found in our cohort. Since the low number of mutations may be limiting in this analysis, we also pooled the mutational matrix of each diet group and performed a best subset refit to all COSMIC indel signatures. The results show an equal distribution of ID1, ID2, and ID12 activity across diet groups (Supplementary Fig. 4F). To confirm the stability of the refitting, we performed bootstrapped refitting. The mutational matrix is resampled 1000 times with replacement, using the original mutational profile as weight. For each bootstrap iteration a refit is calculated, recording the estimated signature activity. The higher the consensus of refits across bootstrap iterations, the more stable the refit. The results confirm that ID1, ID2, and ID12 are the most active signatures in our cohort, regardless of diet consumed. ID7 was found active only in the SD group and was attributed less than 10% of all mutations in that group (Fig. 4F). Finally, we quantified the signature activity of ID1, ID2, ID7, and ID12 for all samples, using the signature presence test (Fig. 4G). The results confirm that ID2, and ID12 (etiology unknown37) are the most active signatures in both diet groups, since the majority of mutations is attributed to these signatures across all samples. ID1 is the third most active indel signature, followed by ID7, which is active only in some samples and completely absent in 23% of all samples.
None of the identified indel signatures are differentially active between tested diets. Signatures ID1 and ID2 are both proposed to arise due to slippage of the replicated (ID1), and template strand (ID2) during replication, producing the characteristic 5 + bp T-insertions and 6 + bp T-deletions. These signatures have been observed to be active in all samples and are only increased in backgrounds with mismatch repair deficiency (MMR)37. In our cohort, we have not observed a strong activity of either SNV or indel signatures associated with defective mismatch repair. The low activity of MMR deficiency signature ID7 in some samples may partially explain the high activity observed for ID1 and ID2. However, this process is equally active in both diet groups (Fig. 4E,G). Notably, ID1 and ID2 activity were found increased in conditions of chronic inflammation of the intestinal tract in patients40. Even though obesity is associated with changes in metabolic and hormonal signaling associated with forming an inflammatory environment6, we do not observe an increase in ID1 and ID2 activity that would indicate strong changes in inflammatory signaling. Thus, the activity of ID1 and ID2 we find in both diet groups suggest mutational processes ongoing during normal cellular replication. In summary, the results indicate that the experimental HFD did not invoke or influence mutational processes of indel generation.
Coding mutations
Finally, we wondered whether the absence of specific mutational processes also precluded the accumulation of specific deleterious mutations which might initiate the adenoma-carcinoma sequence and thus predispose to tumor development. To test this, we explored the coding mutations which accumulated in either diet group. Due to low numbers of coding mutations in our cohort, we included all mutations which passed the filtering criteria from the Mutect2 variant caller. Filtering for the most mutated genes revealed a remarkable overlap, 9 out of the top 10 most mutated genes are shared between the SD and HFD group (Fig. 5A,B). The largest fraction of alterations are missense mutations. None of the mutated genes have a known role in intestinal cancer development. Taken together, we found no specific mutations which might explain how obesity increases risk of cancer development in the intestinal tract.