Our study comprised three independent cohorts (D1, D2, and D3) of patients with ER+ and LN− IBC. H&E-stained slides of surgically resected tumor specimen (no neoadjuvant treatment was administered) from D1, D2, and D3 were respectively digitized using a Roche Ventana iScan HT Scanner, a Philips Scanner, and a Ventana DP 200 Scanner (Hemel Hempstead, UK) at ×40 magnification (0.25 micron per pixel). In our experiments, D1 served as a training set for feature selection and model construction. D2 and D3 served as independent validation sets to evaluate the performance of the final locked-down prognostic model.
The flowchart for the inclusion and exclusion criteria for patient selection on D1, D2 and D3 are shown in Supplementary Fig. 1. A summary of clinicopathological variables of the three cohorts of ER+ and LN− breast cancers is shown in Table 1.
Training cohort D1: breast cancer patients treated between 1996 and 2018 at University Hospitals (UH) having a corresponding ODx score available were identified and retrieved by the pathologists from the hospital archive. The slides were subsequently digitized and transferred. H&E-stained tumor WSI along with clinicopathological/outcome information were available for 519 patients. Patients without any event (recurrence/metastasis/death) were only recruited in this study when at least a 5-year follow-up was available. This process resulted in n = 116 ER+ and LN− breast cancer patients (n = 22 events) in D1. This study was approved by the Institutional Review Board (IRB) at University Hospitals (IRB No 02-13-42C).
Validation cohort D2: The Eastern Cooperative Oncology Group (ECOG) 219740 is a prospective, randomized, phase III clinical trial that recruited n = 2778 patients with IBC (1 to 3 positive LN/LN− with tumor size ≥1 cm) from 1998 to 2007 to compare the patient’s outcome under two different adjuvant chemotherapy regimens (i.e., doxorubicin plus docetaxel versus doxorubicin plus cyclophosphamide; a previous study40 identified no significant difference in patient outcomes between the two treatment regimens). ECOG 2197 is deemed an ideal validation set due to the homogeneity in treatment (all the patients received adjuvant chemotherapy), which reduced the effect of treatments on patient outcomes. The access to the ECOG dataset involved a 2-year long process including a proposal review first through ECOG and subsequently through the Cancer Therapy Evaluation Program (CTEP) at the National Cancer Institute (NCI). From this superset, D2 comprises the subset of n = 121 ER+ and LN− breast cancer patients (n = 23 events), whose corresponding WSIs and clinical information were both accessible. ECOG provided us with the de-identified clinical data from the archived clinical trial along with the de-identified images. This study was approved by the IRB at University Hospitals (IRB No 02-13-42C).
Validation cohort D3: D3 comprises n = 84 ER+ and LN− Indian patients treated in 2009 and with follow-up until 2020 (n = 21 events) at Tata Memorial Center (TMC) which were identified and retrieved by the pathologist from hospital archive. The H&E stained tumor slides for individual patients were digitized in and subsequently transferred from TMC. The study was approved by Institutional Ethics Committee, TMC, IEC no. 1712.
The study conformed to Health Insurance Portability and Accountability Act (HIPAA) guidelines and was approved by the IRB at University Hospitals (IRB No 02-13-42C). The need for written consent from participants was waived due to the use of de-identified retrospective data.
The tumor region in the WSI was manually annotated or checked by a pathologist, with artifacts intentionally avoided (i.e., tissue folding, pen mark, staining artifacts, and bubbles). The slide with the largest representative tumor was selected for the subsequent analysis for the patients who have multiple slides available.
A set of 343 QH features were extracted to describe nuclear morphology, mitotic rates, and tubule formation based on the masks of nuclei, mitosis, and tubules, respectively, generated by three different deep learning models. Additional details regarding deep learning models, algorithms for feature calculation, and descriptions of features extracted are available in the Supplementary Methods. All tiles (2000 × 2000 pixels at ×40 magnification) containing tumor region as annotated by a pathologist were exhaustively extracted from the WSIs. In each individual tile, nuclear morphology, mitotic activity, and tubule formation-related characteristics were computed. The patient-level features were then calculated by aggregating (i.e., mean, median, max, sum, standard deviation, skewness, kurtosis, histogram entropy, and approximate entropy) these features across all the tiles.
Nuclear histomorphometric feature extraction: we employed a Pixel2Pixel GAN for nuclei segmentation. Following nucleus segmentation, we extracted 242 nuclear features to quantify the nuclear histomorphology of each WSI, including global graph41, shape, cell cluster graph (CCG)42, cell orientation entropy (CORE)43, and Haralick texture feature families44. Global graph and CCG feature families, respectively, describe the global and local spatial distribution of nuclei; shape features capture nuclear boundary properties such as smoothness and elongation; the CORE feature family quantitatively measures the disorder degree of nuclear orientations; Haralick texture features characterize chromosome patterns within nuclei.
Mitosis feature extraction: a CNN was trained to detect mitotic events on H&E-stained WSIs. In addition, an epithelium segmentation model was trained to identify epithelial nuclei for subsequent mitosis ratio calculation. Forty-five features were extracted from each WSI based on detected mitoses to describe the mitosis prevalence status. More specifically, these features included: (1) multiple statistical measurements of the mitotic count; (2) ratio of mitotic count to epithelial nuclei count, ratio of mitotic count to blue-ratio nuclei count, and ratio of mitotic count to nuclei count, over all of the extracted tumor tiles across the WSI; (3) the proportion of tiles presenting a specific mitotic density within the WSI; and (4) quantitative proliferation score calculated by simulating the mitosis prevalence assessment in clinical practice.
Tubule feature extraction: tubule formation represents the portion of tumor cells forming tubular glands19. We trained a U-Net to automatically segment tubules in breast cancer histopathological images. A total of 56 tubule features were extracted to measure tubule formation based on the segmented tubule masks. Those features comprised various statistical summaries of tubule ratio metrics on all the tiles across the WSI of each patient (i.e., the ratio of tubule nuclei count to the non-tubule nuclei count, the ratio of tubule nuclei count to the epithelium nuclei count, and the ratio of tubule nuclei count to the nuclei count) as well as the number of tiles falling between different tubule ratio intervals.
Feature selection and classifier construction
In total, 343 features were finally extracted (242 nuclear pleomorphism features, 45 mitotic count features, and 56 tubule formation features). A Cox proportional hazards regression model (henceforth referred to as Cox regression model)32, regularized by Least Absolute Shrinkage and Selection Operator (LASSO)45, was constructed to identify important predictors of DFS. First, to keep the balance among the three feature categories, we implemented a Cox regression model to identify the top four prognostic features associated with DFS separately on each of the three categories on training set D1. The total number of top features (n = 12) for inclusion within the model was determined as ~10% of the patient number in the training set. Following feature identification, a final LASSO regularized Cox regression model was used to compute the coefficients for each of the features; 11 features were assigned non-zero coefficients as part of inclusion within IbRiS while one feature had a zero-coefficient value.
An optimal risk score threshold (denoted as “θopt” hereafter) was identified from the training set D1 (see Supplementary Methods for details) to dichotomize the continuous risk scores into binary high/low-risk categories.
IbRiS was validated on two independent testing sets, D2 and D3. Specifically, we calculated a continuous risk score for each patient on D2 and D3 using the feature coefficients estimated from D1. We then classified the patients into a binary high (risk score >θopt) versus low (risk score ≤θopt) risk category of recurrence by applying θopt identified from D1. DFS was defined as the time from diagnosis/random treatment assignment until first recurrence (loco-regional or distant metastasis) or death, whichever occurred earlier. Patients were censored when they did not have an event at the termination of the study or were lost to follow-up at any time during the study. Kaplan–Meier (KM) survival analysis with DFS as the endpoint was performed between the IbRiS-derived high- versus low-risk categories. The rate of DFS was estimated using the KM method, and the difference of DFS was assessed using log-rank test46 between the high- and low-risk categories predicted by IbRiS on D1, D2, and D3. We also performed subgroup survival analysis respectively for high, intermediate, and low ODx risk categories (traditional recurrence score categorization was applied: low: <18, intermediate: 18–30, high: >30)9 as well as high, intermediate, and low histologic grades assigned by pathologists.
We conducted a univariate Cox proportional hazard analysis to evaluate if any of the routinely examined clinical parameters, treatments, and ODx risk categories were prognostic of DFS on D1, D2, and D3. The clinical parameters include age (≤50 years versus >50 years), race (white versus other), tumor size (<20 mm versus ≥20 mm), Progesterone Receptor (PR) status (negative versus positive), HER2 status (negative versus positive), histologic grade (Grade I versus Grade 2 versus Grade 3). Multivariable Cox analysis47 was also performed to assess the independent prognostic significance of IbRiS after accounting for the other clinicopathological variables on D1, D2, and D3.
Further information on research design is available in the Nature Research Reporting Summary linked to this article.