Study design and baseline information
The study design is described in Fig. 1. The baseline characteristics of the internal cohort, external cohort and prospective cohort are detailed in Table 1. The mean age of the entire cohort was 60.00 years and 48.61% (n = 1587) of the population were male. There were 2776 (85.02%) adenocarcinomas and 340 (10.41%) squamous cell carcinomas. The maximum standardized uptake value (SUVmax), metabolic tumor volume (MTV) and total lesion glycolysis (TLG) of the primary tumors were 5.43, 10.13 and 37.74, respectively. With respect to N status, 11.64% (n = 380) and 8.42% (n = 275) of patients were diagnosed with occult N1 and N2 disease, respectively. In addition, compared to the internal cohort, patients in the external cohort were significantly older (61.78 years versus 59.42 years, p < 0.001), and patients in the prospective cohort were older (60.46 years versus 59.42 years, p = 0.005) and had a higher SUVmax of the primary tumor (5.67 versus 5.25, p = 0.022) and a larger tumor size (2.64 cm versus 2.53 cm, p = 0.030).
Variables associated with ONM
As displayed in Table 2, in the training set, a younger age (odds ratio [OR]: 0.967, 95% confidence interval [CI]: [0.951, 0.984], adjusted p < 0.001), pure solid type (OR: 2.525, 95% CI: [1.638, 3.891], adjusted p < 0.001), left location (OR: 1.512, 95% CI: [1.088, 2.100], adjusted p = 0.023), and central location (OR: 1.743, 95% CI: [1.202, 2.530], adjusted p = 0.007) were identified as independent predictors of occult N1 metastasis, and the pure solid type (OR: 3.389, 95% CI: [1.999, 5.745], adjusted p < 0.001) was independently related to occult N2 involvement. Most variables remained predictive for patients in the validation set, external cohort and prospective cohort (Supplementary Table 1). In addition, after incorporation of the DLNMS into the analyses (Supplementary Tables 2 and 3), the DLNMS was revealed as an independent predictor of both occult N1 and N2 involvement.
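The relationship between a logistic regression coefficient and the odds ratios reported above can be sketched as follows. This is a minimal illustration that assumes Wald-type confidence intervals (the interval method is not stated in this section); the function names are hypothetical and the only study values used are the reported OR and CI for age.

```python
import math

def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a logistic-regression coefficient and its standard error
    into an odds ratio with a Wald confidence interval (assumed method)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def coefficient_from_or(or_value, ci_low, ci_high, z=1.96):
    """Back out the coefficient and standard error implied by a reported
    odds ratio and its confidence interval on the log-odds scale."""
    beta = math.log(or_value)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se

# Age as reported for occult N1 metastasis: OR 0.967, 95% CI [0.951, 0.984].
beta, se = coefficient_from_or(0.967, 0.951, 0.984)
or_value, low, high = odds_ratio_with_ci(beta, se)
print(f"beta={beta:.4f}, se={se:.4f}, OR={or_value:.3f}, CI=[{low:.3f}, {high:.3f}]")
```

An OR below 1 corresponds to a negative coefficient, so each unit increase in the variable (as coded in the model) lowers the odds of the outcome. The round trip only approximately reproduces the reported limits because the published values are rounded.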
Predictive performance of DLNMS
With increasing DLNMS scores, more cases with occult N1 and N2 tumors were observed in the validation set (Supplementary Fig. 1A & B), external cohort (Supplementary Fig. 1C & D) and prospective cohort (Supplementary Fig. 1E & F). In addition, the DLNMS could be represented by conventional PET and CT texture features in ONM prediction, indicating significant correlations between the DLNMS and PET/CT texture features (Fig. 2).
A Top 10 PET and (B) top 10 CT texture features related to the DLNMS N1 prediction in the training set. C Top 10 PET and (D) top 10 CT texture features related to the DLNMS N2 prediction in the training set. n = 1528 biologically independent samples were examined. Source data are provided as a Source Data file. DLNMS, deep learning nodal metastasis signature; PET, positron emission tomography; CT, computed tomography.
As illustrated in Fig. 3A and B, Table 3 and Supplementary Fig. 2, in the validation set, the DLNMS predicted occult N1 and N2 disease with areas under the receiver operating characteristic curve (AUROCs) of 0.958 (95% CI: [0.923, 0.992]) and 0.942 (95% CI: [0.911, 0.973]), respectively, which were significantly better than 0.873 (95% CI: [0.835, 0.911]) and 0.761 (95% CI: [0.680, 0.842]) of the PET model, 0.913 (95% CI: [0.875, 0.952]) and 0.887 (95% CI: [0.823, 0.952]) of the CT model, 0.752 (95% CI: [0.685, 0.819]) and 0.690 (95% CI: [0.603, 0.776]) of the clinical model, 0.612 (95% CI: [0.536, 0.689]) and 0.672 (95% CI: [0.574, 0.771]) of the senior physicians, and 0.616 (95% CI: [0.544, 0.687]) and 0.556 (95% CI: [0.465, 0.647]) of the junior physicians (DeLong’s test: all p < 0.05). The area under the precision-recall curve (AUPRC), sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and accuracy of the DLNMS for predicting occult N1 and N2 metastasis were 0.882, 0.898, 0.928, 0.647, 0.984 and 0.924, and 0.876, 0.897, 0.842, 0.317, 0.990, and 0.846, respectively.
ROC curves and performance metrics of models to predict occult N1 and N2 in the (A, B) validation set, (C, D) external cohort and (E, F) prospective cohort. ROC curves and performance metrics of the DLNMS to predict occult nodal metastasis in (G) adenocarcinoma and (H) squamous cell carcinoma for patients in the validation set, external cohort and prospective cohort. n = 383, 355, and 999 biologically independent samples were examined for the validation set, external cohort, and prospective cohort, respectively. p values from DeLong’s tests were adjusted by the Benjamini-Hochberg correction for 5 multiple comparisons. Source data are provided as a Source Data file. ROC, receiver operating characteristic curve; DLNMS, deep learning nodal metastasis signature; PPV, positive predictive value; NPV, negative predictive value; PET, positron emission tomography; CT, computed tomography.
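The Benjamini-Hochberg adjustment named in the legend can be sketched in a few lines. This is a generic implementation of the standard procedure, not the authors' code; the function name and the five raw p values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, so that each adjusted value
    # never exceeds the adjusted value of the next larger p value.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p values for five comparisons (illustration only).
raw = [0.001, 0.008, 0.020, 0.030, 0.040]
print(benjamini_hochberg(raw))
```

Each raw p value is scaled by the number of comparisons divided by its rank, so the smallest of five p values is multiplied by 5 while the largest is left unchanged.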
In the external cohort (Fig. 3C, D), the DLNMS achieved AUROCs of 0.879 (95% CI: [0.813, 0.946]) and 0.875 (95% CI: [0.820, 0.930]) in predicting occult N1 and N2 metastasis, respectively, which were significantly superior to those of the PET model (0.790, 95% CI: [0.733, 0.847] and 0.727, 95% CI: [0.649, 0.805]), the CT model (0.826, 95% CI: [0.747, 0.905] and 0.817, 95% CI: [0.748, 0.887]), the clinical model (0.722, 95% CI: [0.642, 0.802] and 0.723, 95% CI: [0.648, 0.797]), the senior physicians (0.676, 95% CI: [0.590, 0.763] and 0.645, 95% CI: [0.554, 0.735]), and the junior physicians (0.633, 95% CI: [0.548, 0.719] and 0.594, 95% CI: [0.503, 0.685]) (DeLong’s test: all p < 0.05). In addition, the AUPRC, sensitivity, specificity, PPV, NPV and accuracy of the DLNMS for predicting occult N1 and N2 metastasis were 0.853, 0.700, 0.905, 0.483, 0.960 and 0.882, and 0.849, 0.857, 0.813, 0.333, 0.981, and 0.817, respectively.
In the prospective cohort (Fig. 3E, F), the DLNMS achieved AUROCs of 0.914 (95% CI: [0.877, 0.949]) and 0.919 (95% CI: [0.886, 0.942]) in discriminating occult N1 and N2 involvement, respectively, which were better than those of the PET model (0.796, 95% CI: [0.751, 0.841] and 0.712, 95% CI: [0.656, 0.768]), the CT model (0.828, 95% CI: [0.777, 0.879] and 0.835, 95% CI: [0.779, 0.891]), the clinical model (0.749, 95% CI: [0.708, 0.791] and 0.675, 95% CI: [0.629, 0.721]), the senior physicians (0.672, 95% CI: [0.623, 0.722] and 0.670, 95% CI: [0.613, 0.723]), and the junior physicians (0.645, 95% CI: [0.596, 0.693] and 0.635, 95% CI: [0.580, 0.691]) (DeLong’s test: all p < 0.05). Additionally, the AUPRC, sensitivity, specificity, PPV, NPV and accuracy of the DLNMS for occult N1 and N2 prediction were 0.871, 0.793, 0.926, 0.586, 0.971 and 0.911, and 0.863, 0.833, 0.828, 0.308, 0.982, and 0.829, respectively.
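The threshold-based metrics reported for each cohort all derive from a single confusion matrix, which the following sketch makes explicit. The function name and the counts are hypothetical (a made-up sample of 500 cases with 10% prevalence); they are not the study's data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute threshold-based diagnostic metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all diseased cases
        "specificity": tn / (tn + fp),  # true negatives among all disease-free cases
        "ppv": tp / (tp + fp),          # diseased cases among positive calls
        "npv": tn / (tn + fn),          # disease-free cases among negative calls
        "accuracy": (tp + tn) / total,
    }

# Hypothetical counts: 50 diseased and 450 disease-free cases.
metrics = diagnostic_metrics(tp=45, fp=45, tn=405, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts, sensitivity and specificity are both 0.900 while the PPV is only 0.500, which shows how a low prevalence of the target condition can hold the PPV down even when the NPV stays close to 1.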
In subgroup analyses by pathological type for patients in the validation set, external cohort and prospective cohort, the DLNMS achieved AUROCs of 0.916 (95% CI: [0.885, 0.947]) and 0.934 (95% CI: [0.915, 0.953]) in the adenocarcinoma population for occult N1 and N2 prediction, respectively. Additionally, for the squamous cell carcinoma population, the DLNMS yielded AUROCs of 0.904 (95% CI: [0.842, 0.966]) and 0.858 (95% CI: [0.779, 0.937]) for occult N1 and N2 prediction, respectively (Fig. 3G, H).
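For readers less familiar with the AUROC, it can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that definition; the function name, scores and labels are hypothetical, and a library routine would normally be used in practice.

```python
def auroc(scores, labels):
    """AUROC as the fraction of positive-negative pairs in which the positive
    case has the higher score; tied pairs count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical scores for two positive and three negative cases.
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]
print(auroc(scores, labels))
```

Because this pairwise form is quadratic in the number of cases, it is meant as an explanation of the quantity rather than as an efficient implementation.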
For patients in the validation set, external cohort and prospective cohort, the DLNMS correctly reclassified 38.30% of occult N1, 73.11% of benign N1, 78.13% of occult N2, and 53.04% of benign N2 cases that had been incorrectly diagnosed by the PET model (Supplementary Fig. 3A & B). Similarly, among cases incorrectly predicted by the CT model, the DLNMS correctly reclassified 35.42% of occult N1, 67.06% of benign N1, 93.80% of occult N2, and 41.18% of benign N2 cases (Supplementary Fig. 3C, D).
The calibration curves revealed that the DLNMS was well calibrated (Supplementary Fig. 4). Furthermore, we evaluated the clinical usefulness of the DLNMS compared to single-modal models for ONM detection via decision curve analyses, which indicated that the DLNMS achieved better net benefits than the other models for both occult N1 and N2 prediction (Supplementary Fig. 5). As summarized in Supplementary Table 4, the integrated discrimination improvement (all adjusted p < 0.05) and net reclassification index (all adjusted p < 0.05) were positive for occult N1 and N2 predictions when comparing the DLNMS to single-modal models.
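The net benefit plotted in a decision curve analysis follows the standard definition: true positives per patient minus false positives per patient weighted by the odds of the threshold probability. A minimal sketch is given below; the function names, counts and threshold are hypothetical and do not come from the study.

```python
def net_benefit(tp, fp, n, threshold):
    """Net benefit of a model at a given threshold probability."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    weight = threshold / (1.0 - threshold)  # odds of the threshold probability
    return tp / n - (fp / n) * weight

def net_benefit_treat_all(prevalence, threshold):
    """Net benefit of the reference strategy that treats every patient."""
    weight = threshold / (1.0 - threshold)
    return prevalence - (1.0 - prevalence) * weight

# Hypothetical example: 500 patients, 10% prevalence, threshold probability 0.20.
model_nb = net_benefit(tp=45, fp=45, n=500, threshold=0.20)
treat_all_nb = net_benefit_treat_all(prevalence=0.10, threshold=0.20)
print(f"model: {model_nb:.4f}, treat all: {treat_all_nb:.4f}, treat none: 0.0000")
```

A decision curve repeats this calculation over a range of threshold probabilities; a model is considered useful at a given threshold when its net benefit exceeds both the treat-all and treat-none strategies.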
Decision support for nodal biopsy
For 366 patients receiving nodal biopsy (Supplementary Table 5), the DLNMS yielded an AUROC of 0.853 (95% CI: [0.812, 0.895]) for predicting occult N2 disease, which was significantly better than the PET model (0.644, 95% CI: [0.573, 0.715]), the CT model (0.780, 95% CI: [0.718, 0.841]), the clinical model (0.543, 95% CI: [0.471, 0.715]), the senior physicians (0.621, 95% CI: [0.554, 0.688]), and the junior physicians (0.525, 95% CI: [0.457, 0.594]). The AUPRC, sensitivity, specificity, PPV, NPV and accuracy of the DLNMS were 0.857, 0.919, 0.699, 0.436, 0.971 and 0.743, respectively (Fig. 4A & Table 3). In addition, with increasing DLNMS scores, more patients with occult N2 tumors were observed in the nodal biopsy cohort (Fig. 4B). Moreover, the DLNMS correctly reclassified 79.13% of occult N2 and 56.41% of benign N2 cases that had been incorrectly diagnosed by the PET model (Fig. 4C). Similarly, among cases incorrectly predicted by the CT model, the DLNMS correctly reclassified 100% of occult N2 and 41.50% of benign N2 cases (Fig. 4D).
A ROC curves and performance metrics of models to predict occult N2 diagnosed by nodal biopsy. B Scatter graphs illustrating the DLNMS score distributions. C, D Scatter graphs describing cases falsely predicted by the PET and CT models that were corrected by the DLNMS. n = 366 biologically independent samples were examined. p values from DeLong’s tests were adjusted by the Benjamini-Hochberg correction for 5 multiple comparisons. Source data are provided as a Source Data file. ROC, receiver operating characteristic curve; DLNMS, deep learning nodal metastasis signature; PPV, positive predictive value; NPV, negative predictive value; PET, positron emission tomography; CT, computed tomography.
Decision support for surgical treatment
Survival analyses revealed that both the N1 and N2 cutoff values could significantly stratify the prognosis of patients in the validation set and external cohort (Supplementary Fig. 6). In addition, patients with clinical stage I NSCLC (including patients receiving LND) were divided into low-risk (N1 score < 0.362 and N2 score < 0.356) and high-risk (N1 score > 0.362 or N2 score > 0.356) groups. The baseline characteristics of 654 clinical stage I patients receiving LND are provided in Supplementary Table 6. As illustrated in Fig. 5, for the low-risk population (Fig. 5A–D), sublobectomy did not compromise oncological outcomes compared with lobectomy (3-year overall survival [OS]: 98.1% versus 97.4%, p = 0.458; 3-year recurrence-free survival [RFS]: 90.0% versus 90.6%, p = 0.749), and LND achieved survival outcomes similar to those of SND (3-year OS: 98.1% versus 97.3%, p = 0.428; 3-year RFS: 90.4% versus 93.0%, p = 0.965). In contrast, for the high-risk population (Fig. 5E–H), patients receiving lobectomy had a better prognosis than those receiving sublobectomy (3-year OS: 90.9% versus 80.9%, p = 0.011; 3-year RFS: 79.0% versus 59.0%, p < 0.001), and SND conferred a better prognosis than LND (3-year OS: 91.7% versus 81.7%, p = 0.008; 3-year RFS: 79.2% versus 62.8%, p = 0.001).
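The risk grouping described above amounts to a simple rule on the two scores, sketched below with the reported cutoffs. The function and constant names are hypothetical, and the handling of a score exactly equal to its cutoff is an assumption, since the text defines the groups only with strict inequalities.

```python
N1_CUTOFF = 0.362  # reported cutoff for the N1 score
N2_CUTOFF = 0.356  # reported cutoff for the N2 score

def risk_group(n1_score, n2_score):
    """Assign a patient to the low-risk or high-risk group.

    Low risk requires both scores to fall below their cutoffs; any other
    combination is high risk. Assumption: a score exactly at its cutoff is
    treated as high risk, which the source text does not specify.
    """
    if n1_score < N1_CUTOFF and n2_score < N2_CUTOFF:
        return "low-risk"
    return "high-risk"

# Illustrative scores (not patient data).
print(risk_group(0.10, 0.20))  # both below the cutoffs
print(risk_group(0.50, 0.20))  # N1 score above its cutoff
print(risk_group(0.10, 0.40))  # N2 score above its cutoff
```

Because the high-risk group is defined by either score exceeding its cutoff, a single elevated score is enough to move a patient out of the low-risk group.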
Survival comparisons between (A, B) sublobectomy versus lobectomy and (C, D) LND versus SND in low-risk patients. Survival comparisons between (E, F) sublobectomy versus lobectomy and (G, H) LND versus SND in high-risk patients. n = 1324 biologically independent samples were examined. Survival data were compared by the log-rank test. Source data are provided as a Source Data file. SND, systematic nodal dissection; LND, limited nodal dissection; OS, overall survival; RFS, recurrence-free survival.
Decision support for adjuvant therapy
As illustrated in Fig. 6, for patients diagnosed with pathological stage I NSCLC (including patients receiving LND), those without postoperative adjuvant therapy achieved a prognosis comparable to that of those with postoperative adjuvant therapy in the low-risk group (3-year OS: 98.0% versus 97.5%, p = 0.581; 3-year RFS: 91.3% versus 89.3%, p = 0.323) (Fig. 6A & B). Conversely, in the high-risk group (Fig. 6C & D), patients receiving postoperative adjuvant therapy achieved significantly better oncological outcomes than those without postoperative adjuvant therapy (3-year OS: 95.9% versus 86.2%, p = 0.034; 3-year RFS: 90.5% versus 76.1%, p = 0.012).
Survival comparisons between patients with and without adjuvant therapy in (A, B) low-risk and (C, D) high-risk patients. n = 1182 biologically independent samples were examined. Survival data were compared by the log-rank test. Source data are provided as a Source Data file. POAT, postoperative adjuvant therapy; OS, overall survival; RFS, recurrence-free survival.
Biologic basis of DLNMS
Higher N1 and N2 scores were both significantly related to the presence of aggressive histologic patterns including lymphovascular invasion (LVI), visceral pleural invasion (VPI), tumor spread through air space (STAS), micropapillary component, and solid component (all p < 0.001) (Fig. 7A, B). In addition, among patients with available data for common gene alterations, those with high N1 scores had a significantly higher frequency of BRAF mutation (p < 0.001) and a larger proportion of ALK mutation (p = 0.004) (Fig. 7C). Patients with high N2 scores had a significantly lower mutation rate of EGFR (p < 0.001) (Fig. 7D). In the gene set enrichment analysis (GSEA) and single sample gene set enrichment analysis (ssGSEA) (Fig. 7E–G), pathways related to tumor proliferation such as signaling by GPCR, NTRKs and WNT in cancer were significantly upregulated in patients with high N1 and N2 scores. Finally, in the analyses of the tumor microenvironment, tumors with high N1 scores showed greater infiltration of central memory CD4 T cells, mast cells and plasmacytoid dendritic cells. High N2 scores were significantly associated with greater proportions of central memory CD4 T cells and central memory CD8 T cells (Fig. 7H).
A, B Radar charts illustrating histologic patterns between low-score and high-score patients. C, D Bar charts showing frequency of gene alterations between patients with low scores and high scores. E, F Dot plots showing the top 20 upregulated molecular pathways in patients with high scores; p values were adjusted by the Benjamini-Hochberg correction. G, H Boxplots comparing proportions of infiltrated immune cells between low-score and high-score patients. The centre of the box denotes the 50th percentile, the bounds of the box contain the 25th to 75th percentiles, and the whiskers mark the maximum and minimum values; values beyond the upper and lower whiskers are considered outliers and marked with dots. n = 144 biologically independent samples were examined. Source data are provided as a Source Data file. DLNMS, deep learning nodal metastasis signature; LVI, lymphovascular invasion; VPI, visceral pleural invasion; STAS, tumor spread through air space; NES, normalized enrichment score; EGFR, epidermal growth factor receptor; KRAS, Kirsten rat sarcoma viral oncogene homolog; BRAF, v-raf murine sarcoma viral oncogene homolog B1; ALK, anaplastic lymphoma kinase; ROS1, c-ros oncogene 1; MDSC, myeloid-derived suppressor cells.