Our approach for identifying potential immunosuppressive CH mutations in TII consisted of four stages as shown in Fig. 2. (1) We selected protein altering mutations, which are more likely to be pathogenic than non-coding and synonymous mutations. (2) Clonally expanded somatic mutations in TII were identified based on variant allele fraction (VAF). (3) Potential pathogenic non-passenger mutations were identified based on frequency of occurrence in the genome aggregation database (genomAD). In addition, we confirmed that the gene is expressed in immune cells in the breast cancer tumor microenvironments using single-cell RNA sequencing (scRNA-seq) data. We also excluded potential passenger mutations based on several criteria, such as mutations in genes that are naturally hypermutated during immune response. (4) Finally, we selected potential immunosuppressive mutations based on pathways affected by significantly differentially expressed genes and published experimental evidence supporting their role in immune response. Overall, our selection criteria were designed to reduce the possibility of including passenger mutations even at the risk of excluding additional potential immunosuppressive CH mutations in TII cells. Based on the above criteria, we identified eight potential immunosuppressive CH mutations in TII cells (Table 1).
Protein altering variants
Matched blood and tumor samples from 1,064 breast invasive carcinoma (BRCA) patients were used for this study. BRCA cases consisted of 63.6% HR + /HER − , 21.6% HR − /HER − , 11.44% HR + /HER2 + , and 3.4% HR − /HER2 + (SI Figure S1A), which are significantly different from the 87.4%, 13.2%, 12.6%, and 5.1%, respectively in the surveillance, epidemiology, and end results (SEER) report for the US29. The median age at diagnosis of 58 for BRCA cases (SI Figure S1B), is lower than the median age of 63 in the SEER report. BRCA cases consisted of 73.0% Stage I and II cases and 24.5% Stage III and IV cases (SI Figure S1C) compared to 65.1% Localized and 33.0% Regional/Distant cases in the SEER report. These differences will need to be accounted for in any future studies considering the mutations identified here as potential risk markers.
A total of 4,579,609 different mutations were detected in BRCA exome sequencing data. The vast majority of these, such as synonymous, intergenic and intron mutations, are less likely to be pathogenic. Therefore, we excluded these mutations, limiting subsequent analysis to protein altering mutations, which are more likely to be pathogenic. In addition, for this analysis we did not consider epigenetic modifications and copy number variations since their pathogenicity is in general unclear. Protein altering mutations included nonsense, missense, insertion, deletion, and splice site variants. Of the 4,579,609 different mutations, 558,470 (12%) were in coding regions. Of the coding region variants 445,132 (80%) were protein altering mutations (SI Figure S1D).
Clonally expanded somatic mutations in tumor infiltrating immune cells
Clonally expanded somatic mutations were identified by the variant allele fraction (VAF) in matched tumor and blood sample. A lower threshold of VAF > 2% is generally accepted as indicative of clonal hematopoiesis 16. With this threshold the estimated false positive rate due to sequencing errors is < 1%30,31. This threshold also ensures that the identified mutations are not from circulating tumor DNA or cell free DNA, which can comprise up to 1% of blood sample DNA32,33. An upper threshold of VAF < 25% was used to select for somatic variants and exclude germline variants. Mutations that occur with 2% < VAF < 25% in both blood and tumor samples are indicative of CH mutations in TII cells. It is highly improbable that the same variant would arise independently in blood and tissue cells, therefore it is reasonable to assume that the mutation occurred in blood cells and was detected in tumor samples due to the presence of infiltrating immune cells in the tumor microenvironment. For example, the CH VAF in blood and tumor samples for the Core 1 Beta3-Galactosyltransferase-Specific Molecular Chaperone (C1GALT1C1) p.V276G variant, one of the eight potential immunosuppressive mutations in TII cells (Table 1), varies from 7.1–23.5% for tumor and 4.1–21.6% for blood samples (Fig. 3A). The VAF for all 8 mutations are summarized in Fig. 3B and in Table 1. Of the 558,470 protein altering mutations, 47,255 (8.5%) were clonally expanded somatic mutations in TII cells based on the above criteria (Fig. 2).
Likely pathogenic non-passenger mutations
To maximize the likelihood that the mutations identified are pathogenic and not passenger mutations, we selected mutations that satisfied the following additional criteria: rare mutations that frequently occurred in BRCA samples, mutations in genes that are expressed in tumor infiltrating immune cells based on scRNA-seq data, and deleterious or damaging mutations. Again, we emphasize that although the goal is to filter out passenger mutations, the mutations identified are not “driver” mutations. Since the mutations are in immune cells and not in tumor tissue cells, they will not directly drive tumor growth. Instead, the mutations identified here could “facilitate” tumor growth by inhibiting an effective anti-tumor immune response.
The genome aggregation database (gnomAD) includes variant information from over 100,000 exome sequences. Rare variants in gnomAD (allele population frequency < 0.01%) suggests that the variant may have a deleterious effect on gene function. In tumor cells, rare mutations can be a result of rapidly dividing tumor cells with defective DNA repair mechanisms. Many of these mutations may be passenger mutations, with no effect on tumor progression. However, non-tumor immune cells generally do not contain defective DNA repair mechanisms and are not rapidly dividing. Therefore, the relatively frequent occurrence (> 5%) of such rare mutations in BRCA blood samples, suggests that the mutation could affect cell function34. For example, the p.V276G variant in C1GALT1C1 occurs in 75 (7%) of 1,064 matched BRCA normal blood samples (Fig. 4A), compared to 2 (0.0019%) of 105,263 sequences in gnomAD. The frequency of the eight potential immunosuppressive CH mutations in TII cells range from 7.05 – 11.56% in BRCA normal blood samples (Fig. 4D), compared to the 0 – 0.002% in gnomAD sequences (Table 1). As noted in the Methods section, we used protected TCGA data which includes mutations in blood samples (Fig. 4A) that would be excluded in open access databases since they are considered potential germline mutations that may identify the sample donor (Fig. 4C). For this analysis we selected CH mutation in TII cells (Fig. 4B). Figure 4A mutations not included in Fig. 4B,C are non-CH variants in TII (VAF ≤ 2%) or potential germline mutations (VAF ≥ 25%).
As an additional filter for potential passenger mutations, we exclude mutations in genes that are not expressed in immune cells since these will not affect immune function. We used scRNA-seq data from 100,064 cells from the breast cancer tumor microenvironment (Fig. 5A)28 to exclude mutations in genes not expressed in immune cells. The number of immune cells in which the genes in Table 1 are expressed, range from 603 for KIF15 to 15,806 for UBE2N (Fig. 5B). These genes were expressed primarily in T-cells (55.3%) and myeloid cells (35.1%) (Fig. 5B).
We also excluded mutations in genes that are subject to hypermutation or extreme variability as part of the natural immune response or due to adaptations to different antigens. These include immunoglobulin, immunoglobulin-like receptor, histocompatibility antigen, and T-cell receptor genes.
Lastly, most secondary CH mutations (Fig. 1) are not likely to have a functional effect on the immune cell, even if protein-altering. For example, the effect of a single missense mutation or an in-frame insertion/deletion in an unstructured region of the protein will not always affect the structure or function of the protein. We used two mutation-significance prediction tools – sorting intolerant from tolerant (SIFT)35 and PolyPhen236 – to select mutations that were predicted to be “deleterious” or “damaging” by both tools. SIFT predictions are primarily based on the degree of protein sequence conservation across homologous proteins. In addition to sequence conservation, PolyPhen2 incorporates the potential effect of the mutation on protein structure. Truncating and frame-shift mutations were also considered to be deleterious (SI Table S3). The pathogenicity prediction of SIFT and PolyPhen2 were consistent with predictions by CADD, a tool that integrates the annotations by multiple other predictors (SI Table S3)37. Gene pathway enrichment analysis using the Reactome pathway analysis tool38 identified 32 significantly over-represented pathways (false discovery rate (FDR) < 0.05). Of the 32 pathways, 17 were part of the Disease top-level pathway (SI Table S4).
Together, the above considerations maximize the likelihood that the selected mutations are pathogenic and not passenger mutations. Of the 47,255 CH mutations in TII, 1,710 (3.6%) frequently occurred in BRCA samples (> 5% of samples). Of these 1,710 mutations, 567 (33.2%) were rare mutations with gnomeAD allele frequency < 0.01%. Of these 567 mutations, 384 (67.7%) did not occur in genes that are naturally hypermutated or highly variable. Finally, of these 384 mutations, 95 (24.7%) were predicted to be deleterious or damaging by both SIFT and PolyPhen2 (SI Table S3) The cancer subtype, stage, and age distribution of these mutations (SI Figure S2) is similar to the overall distribution for all samples (SI Figure S1).
Potential immunosuppressive mutations
Although the pathogenic variants identified above may affect cellular function, they may not necessarily affect anti-tumor immune response. To identify mutations that are likely to affect anti-tumor immune response, we considered two factors. First, we conducted a literature review to determine if there is in vivo or in vitro experimental evidence showing that the gene associated with each variant is involved in immune response. Second, we analyzed differentially expressed genes to confirm that immune system pathways could be affected by the variant. Eight potentially immunosuppressive CH mutations in TII cells were identified (Table 1). Studies suggesting an immunosuppressive effect of mutations in these genes are summarized below.
Core 1 Beta1,3-Galactosyltransferase 2 (C1GALT1C1, COSMC) encodes a chaperone protein required for the proper folding of T-synthase. T-synthase is required for the complete glycosylation of membrane glycoprotein Core 1 O-Glycan (T antigen)39. Incomplete glycosylation of T antigen results in the Tn antigen which has been associated with Tn syndrome, an autoimmune disease, and an immune suppressive microenvironment in colorectal cancer40. An analysis of blood cells from two patients with Tn syndrome and 25 healthy donors showed that the Tn syndrome was likely a result of somatic variants in COSMC which caused it to lose its chaperone function resulting in Tn antigens41. An in vitro study using the Jurkat cell line (T-lymphoblast) also showed that variants in COSMC results in the Tn positive phenotype42.
The dipeptidyl-peptidase IV (DPP4) is a protein found in the extracellular domain of CD26 that acts by cleaving N-terminal proline or alanine dipeptides at position two. CD26 is expressed as either a membrane-bound form that is typically expressed on CD4 + helper/memory T cells, or as a soluble form (sCD26) found in serum43. CD26 was also associated and co-expressed with adenosine deaminase (ADA) in Jurkat T cell lines, verified with in vitro binding assays demonstrating its binding via the extracellular domain of CD26. ADA deficiency causes severe combined immunodeficiency disease (SCID) in humans and demonstrates one of the immunopathological roles of DPP444. DPP4 also plays a role in fibrosis and immunoregulation and has received increasing attention in autoimmune diseases such as systemic lupus erythematosus (SLE) in which clinical evidence showed increased CD26 mRNA in SLE patients by 3.6-fold compared to controls45.
Eukaryotic Translation Initiation Factor 4E Binding Protein 1 (EIF4EBP1, 4E-BP1) encodes a translation repressor protein that directly interacts with eukaryotic translation initiation factor 4E (eIF4E). eIF4E is a component of a complex that recruits 40S ribosomal subunits to the 5’ end of mRNAs specific to monocytes. Dephosphorylation of EIF4EBP1 leads to its interaction with eIF4E resulting in the inhibition of cap-dependent translation both in vivo and in vitro, playing a key role in human cancer46. EIF4EBP1 phosphorylation was shown to regulate protein synthesis required for T-cell proliferation47. Immunohistochemical analysis showed increase expression of EIF4EBP1 in subtypes of B-cell lymphoma and reactive lymphoid tissue48. Additionally, EIF4EBP1 has been suggested to have a positive regulatory effect on autophagy through its regulation of mammalian target of rapamycin complex 1 (mTORC1). A study transfected a miR-99a-3p antagomir leading to negative regulation of autophagy. Thus, it not only plays a role in cancer, but due to its relationship with miR-99a-3p has a role in autoimmune diseases such as in SLE49.
Interestingly, four separate mutations in EIF4EBP1 were included in our list of eight potentially immunosuppressive mutations. In addition, there was considerable overlap in the occurrence of EIF4EBP1 mutations with 134 of 173 samples (77.46%) containing two or more of the four mutations. Since all these variants were detected in both tumor and matched blood samples with log odds (LOD) accuracy > 4.0, they are unlikely to be artifacts. For example, a sample with both the R63W and S65L variants had LODs of 44.02 and 54.99 for the two variants. Instead, it was more likely that the frequent occurrence of these mutations is because loci for two of the variants (R56W and R63W) contain CpG dinucleotides. Mutations in CpG dinucleotides are an order of magnitude more frequent than at other sites50. In addition, these four mutations are in the eIF4E binding site (residues 51–67)51. Phosphorylation of the protein in response to hormone signaling initiates mRNA translation. These mutations could disrupt mRNA translation, potentially affecting anti-tumor immune activity of the associated cells. Therefore, we speculate that the occurrence of these mutations may be correlated with the occurrence of cancer.
The protein expressed by the Kinesin Family Member 15 (KIF15, NY-BR-62) gene is part of a family of proteins that transport various cellular components such as organelles, protein complexes, and mRNA along microtubules. This protein has been implicated in the progression of various cancer types, including breast cancer52. One study found that KIF15 was primarily expressed in inflammatory monocytes in the tumor microenvironment and was a prognostic marker for hepatocellular carcinoma53. Expression of KIF15 was also found to increase B-cell proliferation in Burkitt lymphoma54.
Ubiquitination requires a ubiquitin activating E1 enzyme, an E2 ubiquitin conjugase, and an E3 ubiquitin ligase55. UBE2N is a K63-Ub-specific E2 enzyme that has been investigated for its role as a growth promoter of several human cancers such as breast cancer and neuroblastoma56,57. Gene expression analysis in human acute myeloid leukemia, implicated UBE2N as necessary for maintaining oncogenic immune signaling states. Suppression of UBE2N decreased oncogenic immune signaling, promoting cell death of leukemic hematopoietic stem and progenitor cells (HSPC) and ensured normal hematopoiesis58. UBE2N was also found to be essential for RIG-I mediated immune signaling in response to viral infection59.
Based on in vitro or in vivo experimental evidence described above, eight (5.3%) of the 95 potential pathogenic CH mutations in TII cells were identified as potentially affecting immune response (Table 1). As further support for a potential immune system related effect, RNA sequencing data for the BRCA samples were used to identify differentially expressed genes between samples with and without each mutation. For the significantly differentially expressed (SDE) genes (false discovery rate < 0.05), we then identified the associated top-level pathways in the Reactome pathways database38. A number of the SDE genes for each of the mutations were associated with immune system pathways, providing further support for an immunosuppressive role for these mutations (Table 2).