Tuesday, May 30, 2023

Regulation of early diagnosis and prognostic markers of lung adenocarcinoma in immunity and hypoxia – Scientific Reports

Determine the most significant independent prognostic clinical factors

In univariate and multivariate analysis, we found that only the p values of TMN Stage were all less than 0.05 (Fig. 1A, B). The AUC values of various clinicopathological factors (age, gender, TMN stage, T stage, M stage, and N stage) were 0.471, 0.558, 0.671, 0.576, 0.496, and 0.636, respectively (Fig. 1C), and the AUC value of TMN Stage was the highest. They are divided into two groups according to the TMN StageI and TMN StageII-IV. The survival rates of the two groups were significantly different, the prognosis of TMN StageII-IV was worse than that of other groups (p < 0.001) (Fig. 1D).

Figure 1

Screening early diagnostic genes related to TMNStage. (A, B), Forest plots showed univariate (A) and multivariate Cox (B) analysis of clinical characteristics. (C) ROC curve plots of clinical features. (D) Survival analysis between stage I and stage II-IV groups. (E, F) Heat, and (F) volcano map of differentially expressed genes between StageI and normal groups from the TCGA database. (G) Based on the Cptac database, the heat map showed the differentially expressed proteins between the normal and tumor group. (H) Based on the TCGA database, Heatmap showed the differentially expressed genes between StageI and II-IV. (I) Wayne’s diagram indicates that early diagnosis genes are screened via the overlaps of differential genes of the three groups.

Screening early diagnostic genes

Based on the normal and Stage groups in TCGA data, we used the Wilcoxon test method in R language to screen 7333 significant genes (Fig. 1E,F), and then differentially expressed between StageI and StageII-IV groups to obtain 296 significant genes (Fig. 1H). We also used the limma package in R language to analyze the differences between normal and tumor groups in the CPTAC database, To obtain 2234 significant genes (Fig. 1G). Finally, 38 early diagnosis genes (edgs) were obtained by the intersection of the above three groups of significant genes (Fig. 1I).

Screening early diagnosis and prognosis genes and establishing an early prognosis model

Firstly, 24 prognostic related edgs (edgps) (Fig. 2A) were screened by univariate Cox regression analysis, then, three early diagnostic markers (risk genes) with the greatest impact on prognosis edgps (Fig. 2B–D) were screened and calculated as risk scores based on lasso Cox regression dimensionality reduction and multivariate Cox regression analysis, to establish early prognostic risk model (edgpsl)16. The calculation formula of risk score is as follows: risk score = (−0.09 × Expression CYP4B1) + (0.1238 × Expression value of KRT6A) + (0.1022 × Expression value of FAM83A). Through the “survival” package in R language, the best threshold of clinical prognosis is selected from the risk score, which is used as the basis for the high-risk and low-risk groups. Through KM survival analysis, the study found that the prognosis of high-risk groups is very low, compared with the low-risk group in TCGA (Fig. 2E) and GEO (Fig. 2G) data sets. The AUC value of the three-year prognosis model of the two data sets (Fig. 2F,H) (TCGA: IYAUC = 0.711, 2YAUC = 0.682, 3YAUC = 0.680, GEO: IYAUC = 0.785, 2YAUC = 0.672, 3YAUC = 0.639) were analyzed based on the ROC curve in R soft.

Figure 2
figure 2

Establish and verify the early diagnosis and prognosis model. (A) Univariate Cox survival analysis was performed for the early diagnosis genes. (B) A 1000-fold cross-validation for tuning parameter selection in the least absolute shrinkage and selection operator (LASSO) model. (C) LASSO coefficient profiles of the 24 early diagnosis and prognosis-related genes. (D) Multivariate Cox regression of three risk genes. E, G, KM survival analysis was performed to determine survival differences between different analysis groups, from TCGA and GEO (G) data sets. (F, H) The ROC curve revealed the AUC value of the prognostic model in the TCGA (F) and GEO (H) cohort.

Correlation between risk score and clinical features

We found that the risk scores were mainly distributed in the T2-4 Stage (Fig. 3D), N1-3 Stage (Fig. 3F), and TMN StageII-IV (Fig. 3C). However, there were no significant differences in the distribution of risk scores among age (Fig. 3A), gender (Fig. 3B), and M groups (Fig. 3E), compared with other clinical Stages.

Figure 3
figure 3

The relationships between risk score and multiple clinicopathological parameters. (A) Age. (B) Gender. (C) TMNStage. (D) Tumor size (T). (E) Distant metastasis (M). (F) Lymph node status (N).

Distribution of immune cell infiltration under risk model

Through this study, we found that 10 immune cells were closely related to the risk score (Fig. 4B), including mast cell activated (cor = −0.34360408), T cell CD4 + memory resetting (cor = −0.226), Myeloid dendritic cell resting (cor = −0.185), B cell memory (cor = −0.174), Monocyte (cor = −0.16), Neutrophil (cor = 0.218), Mast cell resting (cor = 0.23), Macrophage M0 (cor = 0.239), NK cell resting(cor = 0.12) and Macrophage M1 (cor = 0.136). Then, by analyzing the difference of immune cell composition between high and low-risk groups (Fig. 4A), we found that NK cell resting Macrophage M1, Neutrophil, Mast cell resting, macrophage M0 and T cell CD4 + memory activated were mainly distributed in the high-risk group, while the immune cells mainly distributed in the low-risk group were mast cell activated, T cell CD4 + memory resting, myeloid dendritic cell resting, B cell memory, and Monocyte. Finally, compared with the low-risk group, the expression of immune-related genes programmed death-ligand 1 (PD-L1) and programmed cell death protein 1(PD1) were higher in the high-risk group (Fig. 4C). In the low-risk group, the expression levels of interleukin-4 (IL-4) and surface molecule CD20 (CD20) were higher than those in the high-risk group (Fig. 4D).

Figure 4
figure 4

Compositions of infiltrated immune cells between different risk groups in TCGA-LUAD cohort. (A) The box plot shows the ratio differentiation of 22 kinds of immune cells between LUAD tumor samples with the high/low risk groups. (B) The radar chart shows the correlation between risk score and immune cell score. (C, D) Violin diagram shows the distribution level of immune related genes in high and low expression groups.

Hypoxia effects in the risk model

Based on the HALLMARK data set in the GSEA tool, we analyzed the functional enrichment of high and low-risk groups under the risk model. The high-risk group was mainly enriched in hypoxia, glycolysis, and PI3K-Akt-mTOR signal pathway (Fig. 5A–C). The mutation load of the high-risk group was higher than that of the low-risk group (Fig. 5D). At the same time, hypoxia-inducible factor A (HIF1A) and lactate dehydrogenase A (LDHA) were higher in the high-risk group than in other groups (Fig. 5E). In the high HIF1A group, the expression levels of FAM83a and KRT6A were higher, while CYP4B1 was the opposite (Fig. 5F). In addition, FAM83A and KRT6A were positively correlated with HIF1A, while CYP4B1 was contrary (Fig. 5G).

Figure 5
figure 5

Effects of hypoxia in high and low-risk groups. (AC), GSEA analysis showed that the high-risk group was enriched in hypoxia (A), glycolysis (B), and PI3K/Akt/mTOR (C) signaling pathway. (D) Tumor mutation load in different risk groups. (E) Expression levels of hypoxia-inducible factor HIF1A and hypoxia regulatory factor LDHA. (F) Distribution level of risk genes in high and low HIF1A expression groups. (G) Correlation between risk genes and HIF1A.

Establish and verify nomogram based on TMN Stage and risk score

Univariate and multivariate Cox regression analysis of multiple clinical factors and risk scores showed that TMN Stage and risk score had a significant impact on the prognosis of patients. The results of univariate and multivariate analysis showed that TMN Stage (UNIX: HR = 2.918, 95% Cl = 1.989–4.283, p < 0.001. Mutiox: HR = 4.187, 95% Cl = 1.627–10.776, p = 0.003) (Fig. 6A) and risk score (UNIX: HR = 1.682, 95% Cl = 1.354–2.689, p < 0.001. Mutiox: HR = 1.611, 95% Cl = 1.275–2.035, p < 0.001) (Fig. 6B). AUC values of various clinical factors and risk scores were analyzed in the “TIMER ROC” package in R language, among which TMN Stage (AUC = 0.669) and risk score (AUC = 0.711) is the highest (Fig. 6C). The nomogram was established based on TMN Stage and risk score to predict the three-year survival rate of patients (Fig. 6D), and the range of C index of 95% confidence interval was 0.6598 to 0.7292. The deviation correction line was consistent with the calibration curve, indicating that the nomogram had high prediction ability (Fig. 6E).

Figure 6
figure 6

Establishment and evaluation of predictive nomogram in early diagnosis and prognosis model. (A, B), Forest map showed that independent clinical prognostic factors were obtained by univariate (A) and multivariate (B) Cox analysis. (C) The ROC curve exhibited the predictive performance of each independent predictive factor. (D) A nomogram for predicting OS in significantly independent prognostic factors. (E) Calibration plot of the nomogram for the probability of OS at 1, 2 and 3 year.

Functional enrichment analysis of KEGG and GO related to different risk groups

Based on the KEGG data set in GSEA, we found that the low-risk group is mainly enriched in α-Linolenic acid metabolism, arachidonic acid metabolism, and Linoleic acid metabolism (Fig. 7B). In contrast, high-risk groups were primarily increased in bladder cancer, cell cycle, replication of bladder, pancreatic cancer, renal cell carcinoma, and small cell carcinoma signal pathway (Fig. 7A). GO function enrichment analysis was conducted through the DAVID online website for significant genes in high and low-risk groups. The highly expressed genes in high-risk groups were mainly enriched in skin development, keratinocyte differentiation, epithelial cell differentiation, epidermis development, cornification, intermediate filing, intermediate filing cytoskeleton, keratin filing, and structural constitution cytoskeleton, certified envelope, and calcium-dependent protein binding (Fig. 7C). The significant genes in the low expression group were mainly enriched in the receptor-mediated endocytosissteroid metabolic process, alcohol metabolic process, terpenoid metabolic process, antimicrobial humoral response, blood microparticle, endocytic vesicle, secretory granule lumen, cytoplasmic vesicle lumen, and vesicle lumen (Fig. 7D).

Figure 7
figure 7

Functional enrichment analysis of KEGG and GO in high and low risk groups. (A, B), Gene set enrichment analysis (GSEA) results show the enriched KEGG pathways in the high (A) and low (B) risk groups. (C, D), Bubbles (C) and bar (D) plots show enriched go pathways based on up-regulated gene analysis in high-risk and low-risk groups, respectively.

Univariate Cox regression was used to analyze the survival results of OS, DSS, DFI, and PFI related to early prognostic genes

For the univariate cox analysis of early prognostic genes (FAM83A, KRT6A, and CYP4B1) related to OS, DSS, DFI, and PFI of 32 tumors, we selected the top one to three with the lowest p-value in the analysis of tumor prognosis. Firstly, in the analysis of overall survival (OS), FAM83A (p = 2.06e-07), KRT6A (p = 1.22e-07), and CYP4B1 (p = 6.72e-05) had the most significant effect on the overall survival of patients with lung adenocarcinoma. Patients with high expression of FAM83A (HR = 1.283,95% CI = 1.168–1.410) and KRT6A (HR = 1.156, 95% CI = 1.097–1.223) had lower survival, while CYP4B1 (HR = 0.866,95% CI = 0.806–0.929) had the opposite effect. In the prognostic analysis of disease-free interval (DFI), CYP4B1 (p = 0.003) and FAM83A (p = 0.003) had the highest effect on the disease-free progression of lung adenocarcinoma, while KRT6A (p = 0.046) had a higher effect on BRCA. High expression of KRT6A (HR = 1.108, 95%CI = 1.004–1.223) and FAM83A (HR = 1.227, 95%CI = 1.074–1.402) had a poor prognostic progression of BRCA and lung adenocarcinoma, while CYP4B1 had the opposite effect on lung adenocarcinoma (HR = 0.860, 95%CI = 0.778–0.950). In the prognostic analysis of disease-specific survival (DSS), CYP4B1 (p = 0.000249), FAM83A (p = 5.69E-05), and KRT6A (p = 0.001235) had the highest effect on the disease-specific survival of lung adenocarcinoma patients. High expression of KRT6A (HR = 1.125, 95%CI = 1.047–1.207) and FAM83A (HR = 1.288, 95%CI = 1.138–1.457) was associated with poor DSS of lung adenocarcinoma, while CYP4B1 had the opposite effect on lung adenocarcinoma. In the prognostic analysis of progression-free interval (PFI), CYP4B1 (p = 6.40e-05) and KRT6A (p = 0.0111) had the highest impact on PFI in patients with lung adenocarcinoma, and FAM83A had a higher impact on PFI in KIRC (p = 0.0002052). High expression of KRT6A (HR = 1.075, 95% CI = 1.017–1.137) and FAM83A (HR = 2.436, 95% CI = 1.522–3.898) were associated with poor PFI of KIRC, while CYP4B1 (HR = 0.874,95% CI = 0.819–0.934) had the opposite effect on lung adenocarcinoma (Tables 1, 2, 3).

Table 1 Prognostic analysis of KRT6A.
Table 2 Prognostic analysis of FAM83A.
Table 3 Prognostic analysis of CYP4B1.

Compared with normal and StageI groups, KRT6A and FAM83A were higher in StageII-IV groups, while CYP4B1 expression levels were opposite (Fig. 8A–C). Similarly, in the analysis of the CPTAC dataset in the UALCAN database, the protein expression levels of KRT6A and FAM83A were higher in the tumor group, while CYP4B1 was the opposite (Fig. 8D–F). The mRNA and protein expression levels of FAM83A and KRT6A in tumor cells A549, PC9, and H1975 were higher than those of normal cells BSE-2B, while the mRNA and protein expression level of CYP4B1 in normal cells BSE-2B was higher than that of tumor cells H1299, PC9, A549 and H1975 (Fig. 9A–E) (*p < 0.05, **p < 0.01, ***p  < 0.001).

Figure 8
figure 8

Prognosis and expression results of early diagnosis and prognosis genes in lung adenocarcinoma and other tumors. (AC), CYP4B1 (A), FAM83A (B), and KRT6A (C) were expressed in the normal group, stage I, and stage II-IV groups. (DF), The expression levels of CYP4B1, FAM83A, and KRT6A in normal and tumor groups were analyzed by using the CPTAC data set in UNLCAN database.

Figure 9
figure 9

Expression levels of risk genes in normal and tumor cells. (AC), MRNA expression levels of KRT6A (A), FAM83A (B) and CYP4B1 (C) in normal and lung adenocarcinoma cells. (D, E), Protein expression level of risk genes.

Correlation between risk genes and HIF1A in a hypoxic environment

HIF1A increased significantly after CoCl2 induced A549 and PC9 cells. Meanwhile, in A549 cells, KRT6A and FAM83A increased with HIF1A. In PC9 cells, only FAM83A increased with HIF1A. The changing trend of CYP4B1 is not apparent (Fig. 10 A-B).

Figure 10
figure 10

(A, B) CoCl2 induced hypoxia and normal environment, the expression level of risk genes (KRT6A, FAM83A and CYP4B1) and HIF1A.

Source link

Related Articles

Leave a Reply

Stay Connected

- Advertisement -spot_img

Latest Articles

%d bloggers like this: