A total of 2441 ESCC patients were enrolled according to the inclusion and exclusion criteria. Of these, 1954 patients were assigned to the training cohort and 487 to the validation cohort (Table 1). The median age of the included patients was 62.0 years (range, 34–90 years), and most patients were male (81.6%). The median follow-up time for OS was 28.23 months (range, 6.10–115.3 months).
Model development of machine learning
To reduce overfitting and uncertainty in the model, we first examined the correlation between continuous variables using the Spearman method before developing the model. We observed slight collinearity between variables, as shown in Figure S1. We then used LASSO regression to penalize and select the optimal features, removing less important features from the model and reducing the correlation between variables. Ultimately, 22 variables were selected for model building at an optimal lambda.min of 0.00805, as shown in Fig. 1. Subsequent univariate Cox regression analysis identified 14 significant factors for predicting patients’ overall survival: sex, KPS score, tumor length, tumor grade, surgical margin, vascular invasion, nerve invasion, T stage, N stage, MPV, AST, Na, Mg, and FIB (Table S2). These 14 variables were therefore selected for subsequent model development.
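As an illustration of the collinearity check described above, the following is a minimal sketch of a Spearman rank correlation computed with the Python standard library only. It is not the code used in the study; the function names and the toy values are hypothetical and serve only to show the calculation.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical toy values for two continuous variables (not study data).
var_a = [9.8, 10.4, 11.1, 9.2, 12.0]
var_b = [21.0, 25.0, 30.0, 19.0, 28.0]
rho = spearman_rho(var_a, var_b)
```

Applying `spearman_rho` to every pair of continuous variables would give a correlation matrix of the kind typically used to screen for collinearity before penalized regression.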
Six different survival analysis algorithms were used for model development in the training set. The hyperparameter search space and tuning results are given in Table S1. The discriminative performance of the developed models was evaluated by the average C-index using grid search with fivefold cross-validation repeated 20 times. The results are presented in Fig. 2 and Table 2, which show that the machine learning-extended CoxPH model, Elastic Net, and Random Forest performed similarly in cross-validation, with a C-index of 0.731, and that their predictive performance was superior to that of GBM, GLMboost, and Rpart. Considering the importance of model interpretability, we ultimately selected the classical CoxPH regression algorithm as our final method for further study.
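To make the evaluation metric concrete, the sketch below computes Harrell's C-index for right-censored data and generates index splits for fivefold cross-validation repeated 20 times. It is a simplified illustration under stated assumptions, not the study's pipeline: the function names, the random seed, and the toy data are hypothetical, and tied follow-up times are treated as non-comparable.

```python
import random


def harrell_c_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ordered correctly by risk.

    A pair is comparable when the subject with the shorter follow-up time
    had an observed event. Ties in risk score count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def repeated_kfold_indices(n, k=5, repeats=20, seed=0):
    """Yield (train_idx, test_idx) for k-fold CV repeated `repeats` times."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = list(range(n))
        rng.shuffle(shuffled)
        for fold in range(k):
            test_idx = shuffled[fold::k]
            held_out = set(test_idx)
            train_idx = [i for i in shuffled if i not in held_out]
            yield train_idx, test_idx


# Hypothetical toy data: follow-up months, event flags, predicted risk.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.4, 0.6, 0.1]
c_index = harrell_c_index(toy_times, toy_events, toy_risk)
```

In a full workflow, a model would be fitted on each `train_idx`, scored with `harrell_c_index` on the matching `test_idx`, and the scores averaged over all 100 splits.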
Next, we used the permutation importance method to rank the importance of the 14 variables selected from the univariate Cox regression analysis; the results are presented in Fig. 3. N stage, T stage, surgical margin, MPV, and AST were identified as the top 5 predictors of survival events. The optimal model features were extracted after tuning the model parameters with tenfold cross-validation resampling using the sequential backward search method. The final 10 features selected for CoxPH model building were N stage, T stage, surgical margin, MPV, AST, tumor grade, sex, FIB, tumor length, and Mg.
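The idea behind permutation importance can be sketched as follows: shuffle one feature column at a time, re-score the model, and record the average drop in performance. This is a generic illustration rather than the implementation used in the study. The function name, the number of repeats, and the toy scoring function (a negative mean squared error for a hypothetical model that uses only the first feature) are assumptions; for a survival model the score would more plausibly be a C-index.

```python
import random


def permutation_importance(rows, score_fn, n_features, repeats=10, seed=0):
    """Mean drop in score_fn(rows) after shuffling each feature column.

    rows: list of feature lists, one per subject.
    score_fn: maps rows to a performance score (higher is better).
    """
    rng = random.Random(seed)
    baseline = score_fn(rows)
    importances = []
    for f in range(n_features):
        drops = []
        for _ in range(repeats):
            column = [row[f] for row in rows]
            rng.shuffle(column)
            # Rebuild each row with only feature f replaced by a shuffled value.
            permuted = [row[:f] + [v] + row[f + 1:]
                        for row, v in zip(rows, column)]
            drops.append(baseline - score_fn(permuted))
        importances.append(sum(drops) / repeats)
    return importances


# Hypothetical toy setup: the outcome depends on feature 0 only.
toy_rows = [[1.0, 7.0], [2.0, 3.0], [3.0, 9.0],
            [4.0, 1.0], [5.0, 4.0], [6.0, 2.0]]
toy_outcome = [2.0 * row[0] for row in toy_rows]


def toy_score(rows):
    """Negative mean squared error of a model that predicts 2 * feature 0."""
    errors = [(2.0 * row[0] - y) ** 2 for row, y in zip(rows, toy_outcome)]
    return -sum(errors) / len(errors)


toy_importances = permutation_importance(toy_rows, toy_score, n_features=2)
```

In this toy case the unused second feature receives an importance of zero, while shuffling the first feature degrades the score, which is the pattern a ranked importance plot summarizes.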
To estimate the impact of each predictor on mortality risk in the CoxPH model, we display the marginal effects of each factor in Figure S2. Our results demonstrate that T stage and N stage are significant risk factors in the CoxPH model, with the risk of mortality increasing with higher T and N stages. Females exhibit a lower risk of mortality than males. Positive surgical margins and poor tumor grade increase the risk of mortality. Additionally, lower levels of MPV and Mg, greater tumor length, and higher levels of AST and FIB are associated with a greater risk of mortality in the model.
Machine learning model performance
Using the 10 prognostic features, patients were stratified into deciles of estimated risk. We observed similar survival distributions for three risk scores and stratified the deciles of event probability into low-, intermediate-, and high-risk groups based on the related risks. The first to fourth deciles were classified as the low-risk subgroup, with the percentage of observed deaths being significantly less than 25%. The eighth to tenth deciles were classified as the high-risk subgroup, with the percentage of observed deaths exceeding 50%. The remaining deciles (fifth to seventh) were classified as the intermediate-risk subgroup (Fig. 4A,B).
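The decile-based grouping described above can be sketched as follows. The cut points (deciles 1 to 4 low, 5 to 7 intermediate, 8 to 10 high) follow the text; the function names, the rank-based decile assignment, and the toy scores are assumptions made for illustration only.

```python
def assign_deciles(risk_scores):
    """Return a decile (1-10) for each score, 1 = lowest predicted risk."""
    n = len(risk_scores)
    order = sorted(range(n), key=lambda i: risk_scores[i])
    deciles = [0] * n
    for position, idx in enumerate(order):
        # Rank-based assignment: equal-sized bins by sorted position.
        deciles[idx] = position * 10 // n + 1
    return deciles


def risk_group(decile):
    """Map a decile to the three risk groups described in the text."""
    if decile <= 4:
        return "low"
    if decile <= 7:
        return "intermediate"
    return "high"


# Hypothetical toy risk scores for 20 patients (not study data).
toy_scores = [i / 20 for i in range(20)]
toy_groups = [risk_group(d) for d in assign_deciles(toy_scores)]
```

With 20 evenly spread toy scores, this yields 8 low-risk, 6 intermediate-risk, and 6 high-risk patients, mirroring the 4:3:3 split of deciles.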
Kaplan–Meier curve plots of survival probabilities revealed significant differences in survival rates among the high-, intermediate-, and low-risk subgroups in both the training and validation cohorts (Fig. 4C,D, all p < 0.0001). The risk stratification predicted 3-year overall survival probabilities of 80.8%, 58.2%, and 29.5% for low-, intermediate-, and high-risk subgroups, respectively, in the training cohort, and 75.4%, 48.8%, and 26.9% in the validation cohort. In addition, the risk stratification predicted 5-year overall survival probabilities of 70.6%, 45.6%, and 18.7% for low-, intermediate-, and high-risk subgroups, respectively, in the training cohort, and 65.3%, 27.9%, and 11.0% in the validation cohort (Table 3). The AUC values for 1-, 3-, and 5-year overall survival were 0.760, 0.735, and 0.746 in the training cohort, respectively, and a similar discriminative performance was observed in the validation cohort with AUC values of 0.725, 0.720, and 0.752 for 1-, 3-, and 5-year overall survival, respectively (Fig. 4E,F).
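For readers unfamiliar with how survival probabilities at a fixed horizon are obtained, the following is a minimal sketch of the Kaplan–Meier product-limit estimator. It is illustrative only and does not reproduce the study's analysis; the function names and the toy follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # events and censored subjects leave the risk set
    return curve


def survival_at(curve, horizon):
    """Step-function lookup of the survival probability at `horizon`."""
    prob = 1.0
    for t, s in curve:
        if t > horizon:
            break
        prob = s
    return prob


# Hypothetical toy data: follow-up in months, 1 = death, 0 = censored.
toy_times = [6, 12, 12, 24, 36, 40]
toy_events = [1, 1, 0, 1, 0, 1]
toy_curve = kaplan_meier(toy_times, toy_events)
three_year_survival = survival_at(toy_curve, 36)
```

Running the estimator separately within each risk subgroup and reading the curve at 36 and 60 months gives 3- and 5-year survival estimates of the kind reported in Table 3.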
We further evaluated the performance of the risk model by selecting the top 5 most important features (N stage, T stage, surgical margin, MPV, AST) from the permutation importance results for model development. Our findings demonstrate that the CoxPH risk model exhibits a significant advantage over the combination of these top 5 features, as well as individual features such as N stage (0.681), T stage (0.642), surgical margin (0.535), MPV (0.576), and AST (0.519) (Fig. 5).
Machine learning model evaluation
The machine learning-extended CoxPH risk model exhibits excellent predictive performance for survival events. However, it remains unclear whether the model can be used in clinical practice. Therefore, we compared the C-index values of the risk model and the AJCC 8th stage using fivefold cross-validation with 200 repeats. Additionally, we used calibration plots and DCA curves to evaluate the clinical utility of the model. Our results demonstrate that the risk model exhibits superior discriminative ability and net benefit over the AJCC 8th stage for all patients in both the training and validation cohorts (Fig. 6). The calibration curves revealed good agreement between predictions and actual observations for the probability of 1-, 3-, and 5-year survival (Fig. 7).
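The net benefit underlying a decision curve can be sketched as below. This is a deliberately simplified illustration that assumes a binary outcome at a fixed time horizon with no censoring, which is not how the study's survival-based DCA would have been computed. The function names and toy values are hypothetical.

```python
def net_benefit(predicted_probs, outcomes, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    NB = TP/n - FP/n * threshold / (1 - threshold)
    """
    n = len(outcomes)
    pairs = list(zip(predicted_probs, outcomes))
    true_pos = sum(1 for p, y in pairs if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in pairs if p >= threshold and y == 0)
    return true_pos / n - false_pos / n * threshold / (1 - threshold)


def net_benefit_treat_all(outcomes, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical toy predictions and binary outcomes (not study data).
toy_probs = [0.9, 0.8, 0.3, 0.2, 0.6, 0.1]
toy_outcomes = [1, 1, 0, 0, 0, 1]
nb_model = net_benefit(toy_probs, toy_outcomes, threshold=0.5)
nb_all = net_benefit_treat_all(toy_outcomes, threshold=0.5)
```

A decision curve plots these quantities over a range of thresholds; a model is considered clinically useful where its net benefit exceeds both the treat-all and treat-none (net benefit of zero) strategies.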
The influence of treatment option on the model
In general, treatment options can affect the overall survival of patients. To clarify the impact of different treatment modalities on the overall survival of patients with ESCC, we compared overall survival among patients treated with surgical intervention alone, CT, RT, and CCRT. However, we found no significant differences in overall survival among the treatment subgroups (Figure S3). We further evaluated the survival outcomes of ESCC patients who received surgical intervention alone and found that the overall survival rate of patients who underwent endoscopic treatment was higher than that of patients who underwent thoracotomy surgical resection (Figure S4). We also investigated the impact of chemotherapy on the overall survival of ESCC patients who underwent surgery and found no significant differences in overall survival among the chemotherapy subgroups (Figure S5). These results suggest that ESCC patients who underwent endoscopic treatment may be in earlier stages of the tumor or have milder symptoms, while those requiring thoracotomy may be in advanced stages of the tumor. Patients who received thoracotomy may benefit from adjuvant radiotherapy or chemotherapy to improve their overall survival outcomes, achieving results similar to those of surgical intervention alone.