Monday, October 2, 2023

A spectroscopic liquid biopsy for the earlier detection of multiple cancer types – British Journal of Cancer

Organ-specific classifications

For the primary analysis, the 90% CV sensitivity-tuned classifiers for lung (93% sensitivity/78% specificity) and kidney (92% sensitivity/79% specificity) cancer show real promise for cancer-specific applications, with well-balanced statistics (Supplementary Table S5). With the specificity-tuned approach, the test performed well for brain cancer (74% sensitivity/91% specificity) and colorectal cancer (77% sensitivity/90% specificity).

We targeted a minimum of 45% for the CV metrics when maximising either sensitivity or specificity. The mean ROC curves are displayed in Fig. 2, showing the sensitivities and specificities for the sensitivity-tuned () and specificity-tuned (■) models in each classifier. The brain, colorectal, kidney, and lung cancer versus NCS classifications reported very promising results, with an area under the curve (AUC) of 0.90 and above. The pancreatic cancer versus NCS model achieved an AUC of 0.84. The breast cancer versus NCS model achieved an AUC of 0.76, which yields a sensitivity of 88% when specificity is 43%, and a specificity of 87% when sensitivity is 47%. The ovarian and prostate cancer models performed well, both reporting an AUC of 0.86.

For each of the organ-specific classifications, the predictions were examined by cancer stage. The detection rates were calculated for both the sensitivity-tuned (Supplementary Table S6) and specificity-tuned (Supplementary Table S7) models, and the results are further discussed in the Supplementary Text section.

As an exploratory analysis, indicative positive predictive values (PPV) for each cancer type were determined for both the sensitivity-tuned and specificity-tuned models (Supplementary Table S8). PPVs ranging from 3.1 to 46.5% may be achievable if the test is applied in an organ-specific screening programme for a population with a 2% prevalence of undetected cancer. For symptomatic patients referred for cancer investigation in a hospital setting, with an estimated prevalence of 7% [7], PPVs between 10.5 and 75.1% may be observed, depending on cancer type and diagnostic model selection. A reliable estimate of prevalence will be available after larger-scale prospective studies.

For every classification in this study, 95% confidence intervals (CI) were calculated for each selected threshold on each ROC curve (Supplementary Table S9).
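The dependence of PPV on prevalence described above follows from Bayes' rule. The sketch below is illustrative only: the function name `ppv` is our own, and the inputs are the rounded breast cancer figures quoted in the text (88% sensitivity, 43% specificity), so the results may differ slightly from the values in Supplementary Table S8, which are presumably based on unrounded metrics.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' rule.

    PPV = (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))
    All arguments are fractions between 0 and 1.
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    true_pos = sensitivity * prevalence                    # expected true-positive fraction
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # expected false-positive fraction
    if true_pos + false_pos == 0.0:
        raise ValueError("PPV is undefined when no positive results are expected")
    return true_pos / (true_pos + false_pos)


# Rounded sensitivity-tuned breast cancer figures quoted in the text.
screening = ppv(0.88, 0.43, prevalence=0.02)    # screening population, 2% prevalence
symptomatic = ppv(0.88, 0.43, prevalence=0.07)  # symptomatic referrals, 7% prevalence
print(f"PPV at 2% prevalence: {screening:.1%}")
print(f"PPV at 7% prevalence: {symptomatic:.1%}")
```

Because the false-positive term scales with the much larger non-cancer fraction, even a modest loss of specificity depresses PPV sharply at low prevalence, which is why the same model yields a higher PPV in the symptomatic setting.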

Pooled cancer classification

The C versus NCA algorithm was tuned to selected thresholds that resulted in a 98% sensitivity or specificity for the CV set. The C versus NCA ROC analysis reported an AUC value of 0.94, which suggests excellent detection capability (Fig. 3a). This results in a 98% sensitivity (59% specificity) or a specificity of 99% (57% sensitivity). For the C versus NC dataset (Fig. 3b), the sensitivity-tuned model achieved 90% sensitivity and 60% specificity, and when tailored for greater specificity (95%) the sensitivity was 40%. The ROC curve generated an AUC of 0.85.
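Tuning a classifier for sensitivity or specificity amounts to choosing a different cut-off on the same ROC curve. The following is a minimal sketch of that idea, not the authors' procedure: the helper names (`sens_spec`, `tune_threshold`), the toy scores, and the 80% targets are all invented for illustration, and a sample is called positive when its score is at or above the cut-off.

```python
def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold is called positive."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    return tp / n_pos, tn / n_neg


def tune_threshold(scores, labels, target, metric):
    """Return (threshold, sensitivity, specificity) meeting `target` for `metric`.

    'sensitivity': highest cut-off that still reaches the target sensitivity,
                   which maximises specificity subject to that constraint.
    'specificity': lowest cut-off that reaches the target specificity,
                   which maximises sensitivity subject to that constraint.
    """
    if metric not in ("sensitivity", "specificity"):
        raise ValueError("metric must be 'sensitivity' or 'specificity'")
    best = None
    # Candidate cut-offs: every observed score, plus +inf (nothing called positive).
    for t in sorted(set(scores)) + [float("inf")]:
        sens, spec = sens_spec(scores, labels, t)
        if metric == "sensitivity" and sens >= target:
            best = (t, sens, spec)   # sensitivity only falls as t rises; keep the last hit
        elif metric == "specificity" and spec >= target:
            best = (t, sens, spec)   # specificity only rises with t; stop at the first hit
            break
    return best


# Toy data (hypothetical): 1 = cancer, 0 = non-cancer.
scores = [0.9, 0.8, 0.6, 0.5, 0.2, 0.7, 0.55, 0.4, 0.3, 0.1]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

sens_tuned = tune_threshold(scores, labels, target=0.8, metric="sensitivity")
spec_tuned = tune_threshold(scores, labels, target=0.8, metric="specificity")
print("sensitivity-tuned (threshold, sens, spec):", sens_tuned)
print("specificity-tuned (threshold, sens, spec):", spec_tuned)
```

On the toy data the two tunings select different cut-offs from one set of scores, trading sensitivity against specificity in the same way as the two markers on each ROC curve.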

Fig. 3: Results from the cancer (C) versus asymptomatic non-cancer (NCA) classification and the C versus all non-cancer (NC) classification.

The mean receiver operating characteristic curves for (a) C versus NCA and (b) C versus NC show the trade-off between sensitivity (Sens) and specificity (Spec), where the markers represent the sensitivity-tuned model () and the specificity-tuned model (■), and AUC denotes the area under the curve. The detection rates for the sensitivity-tuned models (c, d) and the specificity-tuned models (e, f) are illustrated for the respective classifications, split by cancer stage.

The bar graphs in Fig. 3 show the detection rate split by stage: (c), (d) sensitivity-tuned and (e), (f) specificity-tuned results for the C versus NCA and C versus NC classifiers, respectively. For C versus NCA, the sensitivity-tuned model correctly identified 98% of all cancers, and the detection rates were consistent across all stages: Stage I, 99%; II, 96%; III, 99%; IV, 99%. The high-specificity (99%) model, on the other hand, still detected 64% of Stage I cancers and 51% of Stage II cancers. Overall, 55% of Stage I–II cancers were predicted correctly, highlighting the potential of the Dxcover® Cancer Liquid Biopsy for the detection of early-stage cancers. The PPV for ‘all cancer’ was 4.3% for the sensitivity-tuned model. However, the specificity-tuned model may be better suited to a screening scenario. When screening for cancer in targeted populations, a reasonable estimate of disease prevalence is around 2%, e.g., in lung cancer screening programmes [23]. With an assumed cancer prevalence of 2%, a PPV of 45% could therefore be achieved with the specificity-tuned model.

For the overall C versus NC classification, Fig. 3d illustrates that when tuned for higher sensitivity, 92% (213/231) of Stage I and 85% (438/516) of Stage II cancers were detected. Similarly, the detection rate was extremely high for late-stage cancers—91% (375/410) and 95% (359/377) for Stage III and IV, respectively. For the model with 95% specificity, the detection rates are fairly consistent across Stages: I 39%; II 32%; III 42%; IV 49% (Fig. 3f). Patient metadata factors were explored to assess any impact on the predictions of the liquid biopsy. Patient age did not significantly affect either the sensitivity-tuned or specificity-tuned models; likewise, the detection rates for both models when split by sex did not indicate any concerns as a potential confounding factor (Supplementary Table S10).
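The stage-wise detection rates quoted above for the sensitivity-tuned C versus NC model can be recomputed directly from the reported counts. The short sketch below does so; the variable names are ours, and the pooled figures cover only the staged patients listed in the text.

```python
# (detected, total) per stage, as reported for the sensitivity-tuned C versus NC model.
stage_counts = {
    "I": (213, 231),
    "II": (438, 516),
    "III": (375, 410),
    "IV": (359, 377),
}


def detection_rate(detected: int, total: int) -> float:
    """Fraction of cancers at a given stage that the model flagged."""
    if total <= 0:
        raise ValueError("total must be positive")
    return detected / total


rates = {stage: detection_rate(d, n) for stage, (d, n) in stage_counts.items()}

# Pooled rates: sum the counts first rather than averaging the percentages.
early_detected = sum(stage_counts[s][0] for s in ("I", "II"))
early_total = sum(stage_counts[s][1] for s in ("I", "II"))
early_rate = detection_rate(early_detected, early_total)

all_detected = sum(d for d, _ in stage_counts.values())
all_total = sum(n for _, n in stage_counts.values())
overall_rate = detection_rate(all_detected, all_total)

for stage, rate in rates.items():
    print(f"Stage {stage}: {rate:.0%}")
print(f"Stage I-II pooled: {early_rate:.0%}")
print(f"All staged cancers: {overall_rate:.0%}")
```

Pooling by summed counts weights each stage by its sample size, which matters here because Stage II contributes more than twice as many patients as Stage I.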

Feature importance

When biological samples are irradiated with infrared (IR) light, stretching and bending of the bonds between chemical functional groups cause characteristic vibrations within these biomolecules [24]. A biological signature that represents the whole biochemical profile of that sample is generated, resulting in an IR spectrum. The spectral regions, or specific wavenumbers, that contribute to a classification can be assessed by feature importance analysis. The feature importance values were extracted from each classification, and Fig. 4 illustrates the wavenumber regions that were found to be the most discriminatory. The top five regions of importance are also described in Supplementary Table S11, with tentative biological assignments and their corresponding vibrational modes. The power of the technique lies in the use of the entire spectral signature, for which clear differences can be observed for the different cancers.

Fig. 4: Feature importance plots highlighting the wavenumber regions that were found to be the most discriminatory for each organ-specific classification: non-cancer symptomatic (NCS), NCS female-only (NCS-F), NCS male-only (NCS-M).

Note: the feature importance for cancer versus non-cancer has been included only for comparison.

For C versus NC, the wavenumber region deemed to be of the highest importance was the Amide II (~1530 cm−1) band. This is one of the largest peaks in a serum spectrum and contains information from overlapping bands associated with protein secondary structures, such as α-helices and β-sheets; variations in this region, as well as in the Amide I region (1600–1700 cm−1), are therefore often indicative of disease states [15]. The Amide II band is associated with N–H bending and C–N stretching vibrations in protein molecules. In addition, N–H bending and C–N stretching vibrations (Amide III) and asymmetric PO2− stretching in phosphodiesters were found to be important (~1260 cm−1). Other significant regions were ~1025 cm−1 (C–O and C–C stretching, C–OH deformation), ~1061 cm−1 (symmetric PO2− stretching, C–O stretching), and ~3345 cm−1 (OH, C–H, N–H stretching).

The brain cancer versus NCS classification reported similar importance within the Amide II and Amide A regions, as well as the Amide I band (~1607 cm−1; C=O and C–N stretching, N–H bending), but the lipid bands in the high-wavenumber region, which arise at ~2861 cm−1 and account for C–H and CH2 stretching vibrations, were also significant in this model. The top region for the breast cancer model was found around 2872 cm−1, accounting for C–OH deformation and C–O and C–C stretching vibrations, which are related to glycogen and carbohydrates. The Amide II band was deemed most important in the colorectal model, whereas the Amide III region (~1258 cm−1) had the highest importance for the kidney model. Vibrations related to nucleic acids were also important for the colorectal classifier, and lipid vibrations were significant for the kidney cancer classifier.

The lung cancer versus NCS feature importance appeared distinctive, as most of the important bands lie in the lower end of the spectrum and are mainly associated with symmetric (1074 cm−1) and asymmetric (1167 cm−1) PO2− stretching vibrations, as well as lipidic C–H and CH2 stretching (2750 cm−1). Importance values for the ovarian and pancreatic cancer classifiers were quite similar to each other, as the top four wavenumber regions in both are proteinaceous. Lastly, the prostate versus NCS-M model was mainly associated with protein (Amide II/A) and lipid vibrations arising around ~1357 cm−1 and ~2947 cm−1.
