Patient characteristics
The study population is derived from four centers and one database (Fig. 1a, Supplementary Table 1, n = 906). 192 (21.2%) patients were female and 714 (78.8%) were male, the mean age was 59.4 years (interquartile range, IQR = 12.9 years). There were 477 (52.7%) patients with stage I and II disease, compared to 396 with stage III-IV disease (43.7%) and 33 (3.6%) with incomplete data on disease stage (UICC 8th). 364 patients (40.2%) received adjuvant radio-/chemotherapy and surgical treatment (S(C)RT), compared to 348 (38.4%) receiving definitive radio-/chemotherapy ((C)RT), as well as 129 (14.2%) patients receiving surgery alone (ST). There were 65 patients with incomplete treatment data (7.2%). All patients were analyzed for HPV-status, using either p16, HPV high-risk DNA or both (referred to as dichotomous p16/HPV-DNA testing). In total, there were 798 patients where dichotomous HPV-status was available (88.1%). Overall, there were 378 HPV-positive cases (41.7%), and 528 HPV-negative cases (58.3%).
a Patients from Cologne (C, GER), Giessen (G, GER), Maastricht (M, NL), Heidelberg (H, GER) and TCGA (T, USA; database) were included in the study (n = 906). HPV association was defined as either dichotomous HPV-DNA and p16 IHC status if both markers were available or by p16/detection of high-risk HPV-DNA. All patients received standard treatment of care. b CONsolidated Standards Of Reporting Trials (CONSORT)-like flow chart representing the study population of the training/validation cohort of 267 patients. Cases that could not be identified as OPSCC or with missing information on HPV-status were excluded. There were four cases where follow-up data were not available in the training cohort, which were used for training the model for HPV association, but not for survival analysis. (c) CONSORT-like flow chart representing the study population of the external test set of 639 patients. . Both primary tumors and lymph node metastases were included in the test set.
Building a deep learning model to predict HPV association and evaluating the performance
In continuation of our previous methodology for predicting the association of Human Papillomavirus (HPV) using histological samples17, we adopted a comparable approach. Specifically, we constructed a Feature Pyramid Network (FPN) with a ResNet-18 encoder to perform semantic segmentation of viable tumor regions. Subsequently, we classified the extracted tumor tiles based on their HPV association using an additional ResNet-18 network. To be data-efficient, we used a training dataset of 267 patients from two centers and one database (Fig. 1b).
The algorithm achieved an overall performance of area under the receiver operator curve (AUROC) of 0.83 (95% CI = 0.77–0.9; n = 639) for predicting HPV association on samples that were completely independent of the training dataset (Fig. 1c, test set). The sensitivity was 0.78 (95% CI = 0.67–0.89) and the specificity was 0.8 (95% CI = 0.62–0.99) with an accuracy of 0.77 (95% CI = 0.69–0.85). The performance on the training dataset was AUROC = 0.93 (sensitivity = 0.9, specificity = 0.83 and accuracy = 0.86; n = 267) (Supplementary Table 2, Fig. 2b).
a Pie charts for patient characteristics for the training cohort (left panel) and test cohort (right panel). The number besides the panel indicates the number of patients originating from each center. b Area under the receiver operator curve (AUROC) for six different patient populations. Cologne (n = 177), Giessen (n = 240), Maastricht (n = 88), Heidelberg (n = 31), and lymph node metastasis (n = 103) which originated from the Giessen site and were independent to the training cohort. c Overview of the study population after applying a threshold of cases with a variance below \(7x{10}^{-2}\). d Area under the receiver operator curve (AUROC) for cases that were filtered by the threshold of variance. e Visualization of the Cox proportional hazards model for tile class prevalence of the HPV-positive class. The red line indicates the smoothened function, the vertical lines at the horizontal axis indicates individual patients with a given risk. The green bar indicates the error of the function. f Kaplan–Meier curve of n = 258 patients stratified for tile class prevalence of the HPV-positive class. A Cox proportional hazards model was used to compare survival curves.
Next, we evaluated whether we could increase the performance of the model to maximize its sensitivity by applying a fixed threshold on the variance of the probability of the HPV-positive class (Supplementary Figs. 2a, b; graphically explained). We therefore randomly sampled three times 30 cases out of a collection of 536 cases of primary tumors, originating from the test set (Fig. 1c; sampling), to avoid overfitting our data. These cases were filtered by a given variance of the probability of the HPV-positive class (Supplementary Fig. 2d). By filtering cases with a threshold below \(7\,{\rm{x}}\,{10}^{-2}\), the best tradeoff between sensitivity (0.89 ± 0.04), specificity (0.8 ± 0.06), accuracy (0.83 ± 0.05) and the AUROC (0.86 ± 0.04) could be observed (Supplementary Table 3, Supplementary Fig. 2d). By applying this fixed threshold, we extracted a total of 258 cases (48.1%) from a collection of 536 patients (Fig. 2c). Here, an AUROC of 0.88 could be achieved (Fig. 2D; sensitivity = 0.85, specificity = 0.84 and accuracy = 0.85, n = 258).
Prognostic relevance of determining HPV association using deep learning and a binary classification
We then investigated the prognostic role of predicting HPV association within this cohort and compared it to regular testing of HPV status. By setting a threshold of 50% of the tile class prevalence (Supplementary Figs. 2a, b), and mimicking the binary classification of HPV-positive/negative, a five-year overall survival rate of 80% [95% CI = 71–90%] for patients classified as HPV-positive was observed (Supplementary Fig. 3a), which was similar to the five-year overall survival rate of standard HPV testing of 80% [95% CI = 71–90%] (Supplementary Fig. 3b). Misclassification of the HPV-status was not to the disadvantage of patients when stratified for survival (Supplementary Figs. 3c, d).
Granular patient prognostication by predicting HPV association
To resolve the prognostic value of predicting HPV association, we used a Cox proportional hazards model. Here, the tile-class prevalence as a measure of HPV association (Supplementary Figs. 2a, b) inversely correlated with survival (Likelihood-ratio test (LRT) = 49.23, p < 0.001, employing a chi-square distribution; concordance index (c-index) = 0.71; Fig. 2e). Having shown that the prediction of HPV association strongly correlated to survival, we applied a three-tier threshold (high, referred to highly HPV-associated; low and intermediate above/below 20% and in between). By following this, we could identify patients with a five-year survival rate of 96% [95% CI = 90-100] (Fig. 2f, high) outperforming the previously mentioned 80% five-year survival rate of the gold standard of HPV testing [95% CI = 71-90%]. For patients classified as intermediate, the five-year survival rate was 54% [95% CI = 45–65%], and for patients classified as low 34% [95% CI = 23–52%]. A multivariate analysis including several clinical variables (therapy, tumor size, nodal status, smoking history, and sex) confirmed the prognostic superiority of predicting HPV association using OPSCCnet (high: HR = 0.15 [95% CI = 0.05–0.44], p < 0.001, Cox proportional hazards model; intermediate: HR = 0.43 [95% CI = 0.26–0.7], p = 0.043, Cox proportional hazards model; low: reference; n = 211; Supplementary Fig. 4a).
The gold standard of HPV testing showed less prognostic relevance of HPV testing when compared to the algorithm (LRT = 39.72, p < 0.0001, employing a chi-square distribution; c-index = 0.65), which was underlined by the results of a multivariate analysis (HPV testing: HPV-positive HR = 0.29 [95% CI = 0.15–0.54], p < 0.001, Cox proportional hazards model; n = 211; Supplementary Fig. 4b).
Developing a combined scoring of the variance of the tile class probability and the tile class prevalence for patient prognostication
By using the variance of the tile class probability (Supplementary Fig. 2a, b) we could increase the sensitivity and specificity of determining HPV association in about 50% of the study population in the external test set (n = 258), outperforming regular HPV-status in selecting patients with an improved prognosis. We therefore reasoned to investigate the prognostic relevance of combing variance of the tile class probability and the tile class prevalence together (Supplementary Figs. 2a, b), leading to a combined HPV association score (referred to as combined score; Supplementary Fig. 2c; Eq. (3)).
For the external test set, filtered for primary tumors, there was a strong association between the combined score and overall survival, by again using a Cox proportional hazards model (LRT = 62.26, p < 0.001, employing a chi-square distribution, n = 531; Fig. 3a). We then divided the patients into three separate groups (high HR = 0.17 [95% CI = 0.10–0.29]; intermediate HR = 0.49 [95% CI = 0.37–0.66]; low: reference) with distinct five-year survival rates (high: 85% [95% CI = 77–93%], intermediate: 56% [95% CI = 50–63%], low: 34% [95% CI = 25–45%]; Fig. 3b). A similar association, given the overall lower five-year overall survival rate of the training cohort, was observed in the training cohort (LRT = 24.86, p < 0.001, employing a chi-square distribution, n = 263; Fig. 3c). The five-year survival rates were 71% [95% CI = 56–90%] for the high group, 45% [95% CI = 36–56%] for the intermediate group and 31% [95% CI = 20–49] for the low group (high: HR = 0.24 [95% CI = 0.12–0.48]; intermediate HR = 0.68 [95% CI = 0.45–1]; low: reference; Fig. 3d).
a Hazard ratio plot of the predicted HPV association and overall survival for patients from the test cohort: Cologne (C; n = 177), Giessen (G; n = 240), Maastricht (M; n = 88) and Heidelberg (H; n = 31). b Corresponding Kaplan–Meier curve for the same population as depicted in A (n = 531). The cohort is divided into three separate groups by the combined score. A Cox proportional hazards model was used to compare survival curves. c Hazard ratio plot of the predicted HPV association and overall survival for patients from the training cohort: Cologne (C; n = 233), Giessen (G; n = 13), and TCGA (T; n = 21), with a total of n = 263 patients. d Corresponding Kaplan Meier curve for the same population as depicted in C (n = 263). The cohort is divided into three separate groups by the combined score. A Cox proportional hazards model was used to compare survival curves. e Hazard ratio plot of the predicted HPV association and overall survival for patients from the test cohort, exclusively containing lymph node metastatic samples: Giessen (G; n = 102). f Corresponding Kaplan–Meier curve for the same population as depicted in E (n = 102). The cohort is divided into three separate groups by the combined score. A Cox proportional hazards model was used to compare survival curves. g Histological image of three cases with corresponding classes of the combined score. The scale bars have a length of 0.052 mm.
In addition, the generalizability of the prognostic relevance was confirmed by analyzing tissue of lymph node metastases, although the intermediate group did not reach a level of significance (LRT: 13.39, p = 0.004, employing a chi-square distribution, n = 102, Fig. 3e) high HR = 0.28 [95% CI = 0.12–0.67]; intermediate HR = 0.65 [95% CI = 0.37–1.14], p = 0.12, employing a chi-square distribution; low: reference (Fig. 3f).
Prognostication of patient subpopulations using the combined score of HPV association
Having shown that predicting HPV association allowed accurate stratification of OPSCC patients, we next evaluated the prognostic relevance of the combined score in a subgroup of patients with early-stage disease (stage I/II, UICC 8th, n = 294; Fig. 4a). Particularly this group of patients may qualify for potential treatment de-escalation strategies. Again, there was a strong inverse correlation of the combined score and survival (LRT = 29.3, p < 0.001, employing a chi-square distribution, n = 294; Fig. 4b).
a Patients with early-stage disease I/II (Union for International Cancer Control; UICC 8th) from different centers of the test cohort are included for the subsequent analysis. b Hazard ratio plot of the selected patients with stage I/II disease. c Kaplan–Meier curve for the survival benefit of stage I/II patients according to the combined score. A Cox proportional hazards model was used to compare survival curves. d Schematic, illustrating that for comparison HPV testing, with both HPV-DNA assessment and p16 status (dichotomous testing) was used. e Kaplan–Meier curve for regular HPV testing. A Cox proportional hazards model was used to compare survival curves. f Schematic illustration of the filtering criteria for the subsequent analysis. Cases from Cologne, Giessen, Maastricht, and Heidelberg that were tested as HPV-positive (n = 232). g Hazard ratio plot of the selected patients with stage a positive HPV-status (n = 232). h Kaplan–Meier curve for the combined score of patients tested as HPV-positive (n = 232). A Cox proportional hazards model was used to compare survival curves.
Patients with a high score had a HR = 0.15 [95% CI = 0.06–0.38], p < 0.001, derived by a Cox proportional hazards model, compared to patients in the intermediate group with a HR of 0.46 [95% CI = 0.29–0.73], p < 0.001, Cox proportional hazards model (Fig. 4c). The five-year survival rates were 90% [95% CI = 82–99%] for patients classified as high, 73% [95% CI = 65–82%] for patients classified as intermediate and 48% [95% CI = 35–65%] classified as low. A multivariate analysis showed that the combined score was an independent predictor among several variables (high, HR = 0.22 [95% CI = 0.03–0.59], p = 0.003, Cox proportional hazards model; intermediate, HR = 0.53 [95% CI = 0.30–95], p = 0.033, Cox proportional hazards model, n = 253; Supplementary Fig. 5a). Within the same patient population, the performance of the gold standard of HPV-DNA/p16 combination (Fig. 4d) achieved a HR = 0.31 [95% CI = 0.2–0.48], Fig. 4e, with a five-year survival rate of 83% [95% CI = 78–90] for HPV-positive cases and 48% [95% CI = 37–61] for HPV-negative cases. The LRT was 28.1, p < 0.001, employing a chi-square distribution, and a multivariate analysis revealed a HR of 0.26 for HPV-positive cases [95% CI = 0.15–0.45], p < 0.001, Cox proportional hazards model, n = 253 (Supplementary Fig. 5b).
Within the training cohort, adjusted for cases with stage I/II disease (n = 151), the combined score could stratify patients with a HR = 0.13 [95% CI = 0.04–0.5], p = 0.002, (Cox proportional hazards model) in the high group and HR = 0.54 for intermediate group [95% CI = 0.29–1.1]; p = 0.072, Cox proportional hazards model (Supplementary Fig. 6a). The five-year survival rates were 78% [95% CI = 58–100%] for patients classified as high, and 65% [95% CI = 53–79%] for patients classified as intermediate. For the low group, the five-year survival rate was 48% [95% CI = 31–75%]. A multivariate analysis yielded a HR of 0.10 [95% CI = 0.02–0.50], p = 0.005, derived by a Cox proportional hazards model for the high group and HR = 0.43 [95% CI = 0.16–1.14], p = 0.091, derived by a Cox proportional hazards model for the intermediate group (Supplementary Fig. 6b; n = 136). The HR of HPV testing was 0.38 [95% CI = 0.15–0.95], p < 0.001, derived by a Cox proportional hazards model for HPV-positive cases Supplementary Fig. 6c. A multivariate analysis of regular HPV testing showed a HR for HPV-positive cases of 0.22 [95% CI = 0.09–0.53], p < 0.001, Cox proportional hazards model, n = 136 (Supplementary Fig. 6d).
To draw a more complete picture of the performance to stratify patients using the combined score, we next selected patients of the test set for advanced stage disease (stage III/IV, UICC8th; Supplementary Fig. 7a; combined score, LRT = 9.5, p = 0.002, employing a chi-square distribution, n = 219, c-index = 0.57; Supplementary Fig. 7b). Patients with a high combined score had a HR of 0.48 [95% CI = 0.27–0.86], p = 0.013, n = 219 (Supplementary Fig. 7c), while a level of significance could not be reached for the intermediate group (HR = 0.81, [95% CI = 0.54–1.21], p = 0.3). For regular HPV testing a level of significance could not be reached (HPV-positive, LRT = 2.77, p = 0.1, employing a chi-square distribution, c-index = 0.51, HR = 0.53 [95% CI = 0.23–1.2], p = 0.12, Cox proportional hazards model; Supplementary Fig. 7d). A multivariate analysis showed a significant prognostic association of the high combined score, HR = 0.42 [95% CI = 0.22–0.77], p = 0.006, derived by a Cox proportional hazards model and no significant association for the intermediate group HR = 0.81 [95% CI = 0.52–1.26], p = 0.35, Cox proportional hazards model (Supplementary Fig. 7e; n = 195).
In addition, we selected only HPV-positive cases (Fig. 4f) and applied the combined score to stratify these patients. Here, the combined score showed an LRT = 9.3 (p = 0.002, employing a chi-square distribution, n = 230, c-index = 0.63; Fig. 4g). By again dividing patients using the same three-tier threshold, patients with a high score had a HR = 0.18 [95% CI = 0.06–0.56], p = 0.003, derived by a Cox proportional hazards model and patients with an intermediate score HR = 0.46 [95% CI = 0.24–0.90], p = 0.022, Cox proportional hazards model (Fig. 4h). This resulted in a five-year overall survival rate of 92% [95% CI = 84–100%] for patients within the high group, 83% [95% CI = 76–90%] for patients in the intermediate group and 63% [95% CI = 47–84%] for the low group. Consistently, by following a multivariate approach, the analysis showed a significant prognostic association of the high combined score with HR = 0.21 [95% CI = 0.05–0.87], p = 0.032, derived by a Cox proportional hazards model, and no significant association for the intermediate group HR = 0.54 [95% CI = 0.24–1.24], p = 0.128, Cox proportional hazards model (Supplementary Fig. 7e; n = 196).
To this end, we explored visual correlates of misclassified cases and provided their clinical characteristics. We chose four cases, namely with the highest tile-class prevalence, but which were falsely classified as HPV-positive (false-positive) or that were falsely classified as HPV-negative (false-negative). Interestingly, two cases being falsely classified as HPV-positive were found to be p16 positive but HPV-DNA negative (Supplementary Fig. 9a) with an overall survival of 6.6 and 12.3 years, respectively. Two cases that had been falsely classified as HPV-negative were both found to be HPV-DNA/p16 positive but had an overall survival of 0.16 years and 0.42 years, respectively (Supplementary Fig. 9b).