Thursday, September 21, 2023

Machine learning‑based prediction of survival prognosis in esophageal squamous cell carcinoma – Scientific Reports

Clinicopathological characteristics

A total of 2441 ESCC patients were enrolled according to the inclusion and exclusion criteria, of whom 1954 were assigned to the training cohort and 487 to the validation cohort (Table 1). The median age of the included patients was 62.0 years (range, 34–90 years), and most patients were male (81.6%). The median follow-up time for OS was 28.23 months (range, 6.10–115.3 months).

Table 1 Baseline features of included cohorts in different data sets.

Model development of machine learning

To reduce the risk of overfitting and unstable estimates, we first examined the correlations between continuous variables using the Spearman method before developing the model. We observed mild collinearity between some variables (Figure S1). We then used LASSO regression to penalize and select the optimal features, removing less important features from the model and reducing the correlation between variables. Ultimately, 22 variables were selected at the optimal lambda.min of 0.00805 (Fig. 1). Subsequent univariate Cox regression analysis identified 14 significant factors for predicting patients’ overall survival: sex, KPS score, tumor length, tumor grade, surgical margin, vascular invasion, nerve invasion, T stage, N stage, MPV, AST, Na, Mg, and FIB (Table S2). These 14 variables were therefore carried forward for model development.
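The collinearity screen described above can be sketched in plain Python. This is a minimal illustration, not the authors’ code: it computes Spearman’s rank correlation (the Pearson correlation of average ranks, so tied values are handled) for every pair of continuous variables and flags pairs whose absolute correlation reaches a cut-off. The function names and the 0.7 threshold are hypothetical; the article does not state which cut-off, if any, was applied.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("a constant variable has no defined rank correlation")
    return cov / (vx * vy) ** 0.5


def flag_collinear(columns, threshold=0.7):
    """List variable pairs whose |rho| reaches the (hypothetical) threshold."""
    names = list(columns)
    flagged = []
    for a in range(len(names)):
        for b in range(a + 1, len(names)):
            rho = spearman(columns[names[a]], columns[names[b]])
            if abs(rho) >= threshold:
                flagged.append((names[a], names[b], round(rho, 3)))
    return flagged
```

Pairs flagged this way are the kind of redundancy that the subsequent LASSO penalty is meant to absorb.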

Figure 1

Feature selection of patient indicators by LASSO regularization: (A) The relationship between the LASSO penalty and changes in the regression coefficients; (B) Cross-validation plot of the partial-likelihood deviance against Log(λ) in feature selection; (C) The estimated feature coefficients under LASSO regularization; (D) Correlation plot of the clinical features in the LASSO regression algorithm.

Six survival analysis algorithms were used for model development in the training set. The hyperparameter search space and tuning results are given in Table S1. The discriminative performance of the developed models was evaluated by the average C-index, using grid search with fivefold cross-validation repeated 20 times. As shown in Fig. 2 and Table 2, the machine learning-extended CoxPH model, Elastic Net, and Random Forest performed similarly in cross-validation (C-index, 0.731), and all three outperformed GBM, GLMboost, and Rpart. Given the importance of model interpretability, we selected classical CoxPH regression as the final method for further study.
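The evaluation metric and resampling scheme can be sketched as follows. This is an illustrative sketch, not the study’s pipeline: it implements Harrell’s C-index for right-censored data and a generator of repeated k-fold splits (5 folds × 20 repeats by default, matching the text). Model fitting, the hyperparameter grid search, and the nesting of the cross-validation are omitted, and the function names are hypothetical.

```python
import random


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an observed event and a
    strictly shorter follow-up time than subject j. The pair is concordant
    when i also has the higher predicted risk; tied risk scores count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def repeated_kfold_indices(n, k=5, repeats=20, seed=0):
    """Yield (train, test) index lists for k-fold CV repeated `repeats` times."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]
        for f in range(k):
            test = sorted(folds[f])
            train = sorted(i for g in range(k) if g != f for i in folds[g])
            yield train, test
```

For each split, one would fit a candidate model on `train`, score the `test` patients, compute `harrell_c_index` on that fold, and average over all 100 folds.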

Figure 2

Prediction performance of the six survival analysis algorithms. (A) C-index values computed for each method using nested 5 × 20 cross-validation. (B) Confidence intervals of the C-index for each method using nested 5 × 20 cross-validation.

Table 2 Prediction performance of the machine learning methods.

Next, we used the permutation importance method to rank the importance of the 14 variables selected by univariate Cox regression analysis (Fig. 3). N stage, T stage, surgical margin, MPV, and AST were the five most important predictors of survival events. The optimal feature set was then extracted by tuning the model with tenfold cross-validation resampling and a sequential backward search. The final 10 features selected for building the CoxPH model were N stage, T stage, surgical margin, MPV, AST, tumor grade, sex, FIB, tumor length, and Mg.
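Permutation importance for a survival model can be illustrated as the average drop in C-index after one feature’s values are shuffled across patients, which breaks that feature’s link to outcome while leaving the others intact. The sketch below is a simplified illustration: it assumes a hypothetical linear risk score with fixed coefficients in place of the fitted CoxPH model, and it does not show the sequential backward search. All names are hypothetical.

```python
import random


def concordance(times, events, scores):
    """Harrell's C-index: share of comparable pairs ranked correctly."""
    num, den = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                den += 1
                if scores[i] > scores[j]:
                    num += 1.0
                elif scores[i] == scores[j]:
                    num += 0.5
    return num / den


def linear_risk(rows, coefs):
    """Hypothetical linear predictor: sum of coefficient * feature value."""
    return [sum(coefs[name] * row[name] for name in coefs) for row in rows]


def permutation_importance(rows, times, events, coefs, n_repeats=10, seed=0):
    """Mean C-index drop per shuffled feature, sorted from most to least important."""
    rng = random.Random(seed)
    baseline = concordance(times, events, linear_risk(rows, coefs))
    importance = {}
    for name in coefs:
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[name] for row in rows]
            rng.shuffle(shuffled)
            # Copy each row with only this feature replaced by its shuffled value.
            permuted = [dict(row, **{name: v}) for row, v in zip(rows, shuffled)]
            drops.append(baseline - concordance(times, events, linear_risk(permuted, coefs)))
        importance[name] = sum(drops) / n_repeats
    return dict(sorted(importance.items(), key=lambda kv: kv[1], reverse=True))
```

A feature whose shuffling barely moves the C-index contributes little to discrimination; in the article, this ranking placed N stage, T stage, surgical margin, MPV, and AST at the top.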

Figure 3

The ranked importance of the candidate variables.

To estimate the impact of each predictor on mortality risk in the CoxPH model, we plotted the marginal effect of each factor (Figure S2). T stage and N stage are significant risk factors in the CoxPH model, with the risk of mortality increasing at higher T and N stages. Females have a lower risk of mortality than males. Positive surgical margins and a poorer tumor grade increase the risk of mortality. In addition, lower MPV and Mg levels, greater tumor length, and higher AST and FIB levels are associated with a greater risk of mortality in the model.
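In a Cox proportional hazards model the hazard is h(t | x) = h0(t) * exp(beta . x), so the baseline hazard cancels when two patients are compared. The marginal effect of one covariate, holding the others at reference values, is therefore a hazard ratio exp(beta_f * (v - v_ref)). The sketch below computes these ratios; it is an illustration only, and the coefficients used in the test are hypothetical, not the fitted values from this study.

```python
import math


def linear_predictor(coefs, x):
    """Cox linear predictor beta . x (the baseline hazard cancels in ratios)."""
    return sum(coefs[name] * x[name] for name in coefs)


def marginal_hazard_ratios(coefs, reference, feature, values):
    """Hazard ratio versus the reference profile as one feature varies."""
    base = linear_predictor(coefs, reference)
    ratios = []
    for v in values:
        # Change only `feature`; every other covariate stays at its reference value.
        x = dict(reference, **{feature: v})
        ratios.append(math.exp(linear_predictor(coefs, x) - base))
    return ratios
```

A positive coefficient gives ratios above 1 as the feature increases (as reported for T and N stage), and a negative coefficient gives ratios above 1 as the feature decreases (as reported for MPV and Mg).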

Machine learning model performance

Using the 10 prognostic features, patients were stratified into deciles of estimated event probability. The deciles fell into three patterns of similar survival distribution, so we grouped them into low-, intermediate-, and high-risk groups according to their observed risk. The first to fourth deciles formed the low-risk group, in which the percentage of observed deaths was clearly below 25%. The eighth to tenth deciles formed the high-risk group, in which the percentage of observed deaths exceeded 50%. The remaining deciles (fifth to seventh) formed the intermediate-risk group (Fig. 4A,B).
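The decile-based grouping can be sketched as below. This is a minimal illustration under stated assumptions: patients are ranked by predicted event probability, split into ten equal-sized bins (ties broken by input order), and mapped to groups using the cut-points given in the text (deciles 1 to 4 low, 5 to 7 intermediate, 8 to 10 high). The function names are hypothetical.

```python
from collections import defaultdict


def assign_deciles(probabilities):
    """Decile 1 = lowest predicted event probability, decile 10 = highest."""
    n = len(probabilities)
    order = sorted(range(n), key=lambda i: probabilities[i])
    deciles = [0] * n
    for rank, i in enumerate(order):
        deciles[i] = rank * 10 // n + 1
    return deciles


def risk_group(decile):
    """Cut-points taken from the text: 1-4 low, 5-7 intermediate, 8-10 high."""
    if decile <= 4:
        return "low"
    if decile <= 7:
        return "intermediate"
    return "high"


def observed_death_rate(deciles, events):
    """Fraction of patients with an observed death in each decile."""
    deaths, counts = defaultdict(int), defaultdict(int)
    for d, e in zip(deciles, events):
        counts[d] += 1
        deaths[d] += e
    return {d: deaths[d] / counts[d] for d in sorted(counts)}
```

Plotting `observed_death_rate` by decile is what reveals where the death percentage crosses the 25% and 50% marks used to draw the group boundaries.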

Figure 4

Survival prediction performance of the machine learning-extended CoxPH model. (A) Percentage of observed deaths according to deciles of event probability. (B) Three risk groups stratified by similar patterns of survival distribution. Kaplan–Meier curves of survival probabilities in the training (C) and validation (D) cohorts. Time-dependent ROC curves comparing the performance of the risk model at 1-, 3-, and 5-year follow-up in the training (E) and validation (F) cohorts.

Kaplan–Meier curve plots of survival probabilities revealed significant differences in survival rates among the high-, intermediate-, and low-risk subgroups in both the training and validation cohorts (Fig. 4C,D, all p < 0.0001). The risk stratification predicted 3-year overall survival probabilities of 80.8%, 58.2%, and 29.5% for low-, intermediate-, and high-risk subgroups, respectively, in the training cohort, and 75.4%, 48.8%, and 26.9% in the validation cohort. In addition, the risk stratification predicted 5-year overall survival probabilities of 70.6%, 45.6%, and 18.7% for low-, intermediate-, and high-risk subgroups, respectively, in the training cohort, and 65.3%, 27.9%, and 11.0% in the validation cohort (Table 3). The AUC values for 1-, 3-, and 5-year overall survival were 0.760, 0.735, and 0.746 in the training cohort, respectively, and a similar discriminative performance was observed in the validation cohort with AUC values of 0.725, 0.720, and 0.752 for 1-, 3-, and 5-year overall survival, respectively (Fig. 4E,F).
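The 3- and 5-year figures above are read off Kaplan–Meier curves. A minimal Kaplan–Meier estimator is sketched below for illustration; it is not the study’s code, and the function names are hypothetical. At each distinct event time t the survival estimate is multiplied by (1 - deaths / number at risk), and patients censored at t are still counted as at risk at t.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate: list of (event time, S(t)) steps."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect every patient (event or censored) with follow-up time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def survival_at(curve, t):
    """Read the step function S(t); S = 1 before the first event."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s
```

Applied separately to the patients in each risk group, `survival_at(curve, 36)` and `survival_at(curve, 60)` would give 3- and 5-year estimates when follow-up is recorded in months, as it is in this study.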

Table 3 3- and 5-year OS probabilities of CoxPH model-based risk stratification in the training and validation cohorts.

We further evaluated the risk model against a model built from the five most important features in the permutation importance analysis (N stage, T stage, surgical margin, MPV, and AST). The CoxPH risk model showed a clear advantage over both the combination of these five features and each individual feature, namely N stage (0.681), T stage (0.642), surgical margin (0.535), MPV (0.576), and AST (0.519) (Fig. 5).
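This section does not specify which time-dependent ROC estimator was used, so the sketch below uses a deliberately simplified cumulative/dynamic definition at a fixed horizon: cases are patients who died by the horizon, controls are patients still followed beyond it, and patients censored before the horizon are dropped rather than handled by inverse-probability-of-censoring weighting. Its values would therefore not reproduce Fig. 4E,F or Fig. 5 exactly. The function name is hypothetical.

```python
def auc_at_horizon(times, events, scores, horizon):
    """Simplified time-dependent AUC at `horizon` (censoring before it is dropped)."""
    cases = [s for t, e, s in zip(times, events, scores) if t <= horizon and e]
    controls = [s for t, s in zip(times, scores) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            # A case should carry a higher risk score than a control; ties count 0.5.
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))
```

Passing a single discrete feature (for example, N stage coded as an integer) as `scores` produces many tied pairs, each scored 0.5, which is one reason single categorical indicators tend to sit closer to 0.5 than a continuous multivariable risk score.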

Figure 5

ROC curves evaluating the ability of the risk model and other indicators to predict survival in ESCC patients.

Machine learning model evaluation

The machine learning-extended CoxPH risk model showed good predictive performance for survival events, but whether it could be used in clinical practice remained unclear. We therefore compared the C-index of the risk model with that of the AJCC 8th edition stage using fivefold cross-validation with 200 repeats, and used calibration plots and decision curve analysis (DCA) to evaluate the model’s clinical utility. The risk model showed better discriminative ability and greater net benefit than the AJCC 8th edition stage for all patients in both the training and validation cohorts (Fig. 6). The calibration curves showed good agreement between predicted and observed 1-, 3-, and 5-year survival probabilities (Fig. 7).
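Decision curve analysis compares strategies by their net benefit at a threshold probability p_t, defined as NB = TP/n - FP/n * p_t / (1 - p_t), alongside the "treat all" and "treat none" (NB = 0) reference strategies. The sketch below is a simplification for illustration: it treats the outcome as a binary death-by-horizon indicator and ignores censoring, whereas survival DCA usually estimates the true- and false-positive counts from Kaplan–Meier estimates. The function names are hypothetical.

```python
def net_benefit(predicted_risk, outcome, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcome)
    tp = sum(1 for p, y in zip(predicted_risk, outcome) if p >= threshold and y)
    fp = sum(1 for p, y in zip(predicted_risk, outcome) if p >= threshold and not y)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(outcome, threshold):
    """Reference strategy: act on every patient regardless of predicted risk."""
    prevalence = sum(outcome) / len(outcome)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)
```

Evaluating `net_benefit` over a grid of thresholds for both the risk model and the AJCC stage, and plotting the results against the two reference strategies, yields curves of the kind shown in Fig. 6C,D.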

Figure 6

C-index and decision curve analyses comparing the performance of the risk score with that of the AJCC 8th edition stage. C-index values of the risk score and the AJCC 8th edition stage in the training (A) and validation (B) cohorts, using fivefold cross-validation with 200 repeats; net benefit of the risk model and the AJCC 8th edition stage in the training (C) and validation (D) cohorts from decision curve analysis.

Figure 7

Calibration curves for predicting patient survival at 1 year (A), 3 years (B), and 5 years (C) in the training cohort and at 1 year (D), 3 years (E), and 5 years (F) in the validation cohort.

The influence of treatment options on the model

Treatment options generally affect patients’ overall survival. To clarify the impact of different treatment modalities on the overall survival of patients with ESCC, we compared overall survival across treatment subgroups: surgery alone, chemotherapy (CT), radiotherapy (RT), and concurrent chemoradiotherapy (CCRT). We found no significant differences in overall survival among these subgroups (Figure S3). Among patients who received surgery alone, those who underwent endoscopic treatment had higher overall survival than those who underwent resection via thoracotomy (Figure S4). We also examined the impact of chemotherapy on the overall survival of ESCC patients who underwent surgery and found no significant differences among the chemotherapy subgroups (Figure S5). These results suggest that patients who underwent endoscopic treatment may have had earlier-stage tumors or milder symptoms, whereas those requiring thoracotomy may have had more advanced tumors. Patients who underwent thoracotomy may therefore benefit from adjuvant radiotherapy or chemotherapy, achieving overall survival similar to that of patients treated with surgery alone.
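The article does not name the test behind the subgroup comparisons in Figures S3 to S5 (or the p-values for the Kaplan–Meier curves); the log-rank test is the usual choice, so the sketch below assumes it. It is a two-group version only, written for illustration: comparing four treatment subgroups at once would need the k-group generalization with a covariance matrix. The p-value uses the fact that a chi-square statistic with 1 degree of freedom is a squared standard normal, so P(X > x) = erfc(sqrt(x / 2)). The function name is hypothetical.

```python
import math


def logrank_two_groups(times, events, groups):
    """Two-sample log-rank test; `groups` holds 0/1 labels. Returns (chi2, p)."""
    distinct = sorted({t for t, e in zip(times, events) if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in distinct:
        n = sum(1 for ti in times if ti >= t)  # total number at risk at t
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups) if ti == t and e and g == 1)
        # Observed minus expected deaths in group 1 under equal hazards.
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of d1 given the margins at time t.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("variance is zero; groups cannot be compared")
    chi2 = obs_minus_exp ** 2 / variance
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p
```

A non-significant p-value here, as reported for the treatment subgroups, means the data give no evidence that the survival curves differ; it does not by itself show that the treatments are equivalent.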
