The BOUNCE study took place in four European countries (Finland, Italy, Israel and Portugal) and aimed to evaluate psychosocial resilience of BC patients during the first 18 months post-diagnosis as a function of psychological (trait and ongoing), sociodemographic, life-style, and medical variables (disease- and treatment-related) (H2020 EU project BOUNCE GA no. 777167; for more information see https://www.bounce-project.eu/). The study enrolled 706 women between March 2018 and December 2019 according to the following criteria: (i) Inclusion: age 40–70 years; histologically confirmed BC stage I, II, or III; surgery as part of the treatment; some type of systemic therapy for BC. (ii) Exclusion: history of or active severe psychiatric disorder (major depression, bipolar disorder, psychosis); distant metastases; history or treatment of another malignancy within the last 5 years; other serious concomitant diseases diagnosed within the last 12 months; major surgery for a severe disease or trauma within 4 weeks prior to study entry, or lack of complete recovery from the effects of surgery; pregnancy or breastfeeding. The BOUNCE study is a longitudinal, observational study involving seven measurement waves: Baseline (taking place 2–5 weeks after surgery or biopsy and considered as Month 0 [M0]), followed by assessments at three-month intervals (M3, M6, M9, M12, M15) and a final follow-up measurement at M18. Data on each of the main outcome variables were collected at all time points; data from the remaining time points served secondary research goals of the overall project.
The entire BOUNCE study was approved by the ethical committee of the European Institute of Oncology (Approval No R868/18—IEO 916) and the ethical committees of each participating clinical center. All participants were informed in detail regarding the aim and procedural details of the study and provided written consent. All methods were carried out in accordance with relevant guidelines and regulations.
For the current analyses we considered sociodemographic, life-style, medical variables and self-reported psychological characteristics registered at the time of BC diagnosis and, also, at the first follow-up assessment, conducted 3 months after diagnosis. The decision to pool predictor data from the first three months post-diagnosis was guided by the following considerations: (a) Emotional responses and awareness of emotional and behavioral adaptive processes are often not fully developed until the full scope of the illness can be appreciated by the patient, (b) This period defines a realistically short observation window to record resilience predictors in routine clinical practice, yet not too long in view of the one-year study end-point, (c) Previous studies have shown that significant changes in psychological well-being typically take place later in the trajectory of illness.
Self-reported mental health status at 12 months post-diagnosis, indexed by the total score on the 14-item Hospital Anxiety and Depression Scale (HADS)16, served as the outcome variable in the current analyses (see Supplementary Information). A cutoff score of 16 (out of a maximum of 42 points), clinically validated in a wide range of languages, was used to identify patients who reported potentially clinically significant symptoms at M0 and at M1217,18. Patients were then assigned to two classes: (a) those who reported non-clinically significant symptoms of anxiety and depression at M0 (i.e., shortly after BC diagnosis) but clinically significant symptomatology at M12 (i.e., one year post-diagnosis) according to validated cutoffs on HADS total score (Deteriorated Mental Health group), and (b) those who reported mild symptomatology throughout the first year post-diagnosis (Stable-Good Mental Health group). Thus, the Deteriorated Mental Health group comprised persons who scored < 16 points at M0 and ≥ 16 points at M12, whereas the Stable-Good Mental Health group comprised persons who scored < 16 points at the M0, M3, M6, M9 and M12 assessment time points.
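The class-assignment rule above can be sketched as follows. This is a minimal illustration, not the study's actual code; the column names (`hads_m0` … `hads_m12`) are hypothetical placeholders for the HADS total scores at each assessment.

```python
import pandas as pd

CUTOFF = 16  # validated HADS total-score cutoff used in the study

def assign_group(row):
    """Return the outcome class label, or None if the patient fits neither class."""
    if row["hads_m0"] >= CUTOFF:
        return None  # clinically significant symptoms already at baseline
    if row["hads_m12"] >= CUTOFF:
        return "deteriorated"  # below cutoff at M0, at/above cutoff at M12
    # Stable-Good requires scores below the cutoff at every assessment
    if all(row[c] < CUTOFF for c in ["hads_m3", "hads_m6", "hads_m9", "hads_m12"]):
        return "stable_good"
    return None

# Three hypothetical patients: stable, deteriorated, and an intermediate case
df = pd.DataFrame({
    "hads_m0":  [5, 10, 8],
    "hads_m3":  [6, 12, 20],
    "hads_m6":  [4, 14, 10],
    "hads_m9":  [7, 15, 9],
    "hads_m12": [5, 18, 10],
})
df["group"] = df.apply(assign_group, axis=1)
```

Note that, under this rule, patients with transiently elevated scores at an intermediate wave (like the third row) belong to neither class.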
The analysis pipeline adopted to address the main and secondary objectives of the study entailed preprocessing steps, feature selection, and model training and testing19. Model 1 was designed to optimize prediction of one-year adverse mental health outcomes by considering all available variables collected at M0 and M3, including HADS Anxiety, HADS Depression, and Global QoL. Model 2 was designed to obtain personalized risk profiles and to focus on potentially modifiable factors (by omitting HADS Anxiety, HADS Depression, and Global QoL measured at M0 and M3). Feature selection, using a Random Forest algorithm, was incorporated into the ML-based pipeline alongside the classification algorithm to select only the relevant features for training and testing the final model (see Supplementary Information). Performance of the cross-validated model on the test set was evaluated using the following metrics: specificity, sensitivity, accuracy, precision, F-measure, and the area under the Receiver Operating Characteristic curve (ROC AUC).
Data preprocessing and handling of missing data
Initially, raw data were rescaled to zero mean and unit variance and ordinal variables were recoded into dummy binary variables. Cases and variables with more than 90% missing values were excluded from the final dataset. Remaining missing values were replaced by the global median value (supplementary analyses showed that applying multivariate imputation had a negligible effect on model performance; see Supplementary Material).
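These preprocessing steps can be sketched as below. The variables are illustrative stand-ins, not the study's actual features, and per-variable median imputation is shown where the study reports a global median.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical raw data: two numeric variables, one ordinal variable,
# and one variable that is entirely missing
df = pd.DataFrame({
    "age": [52.0, 61.0, np.nan, 47.0],
    "bmi": [24.1, np.nan, 27.5, 22.0],
    "stage": ["I", "II", "III", "II"],          # ordinal, to be dummy-coded
    "mostly_missing": [np.nan, np.nan, np.nan, np.nan],
})

# Drop variables with more than 90% missing values (the study's threshold)
keep = df.columns[df.isna().mean() <= 0.90]
df = df[keep]

# Recode ordinal/categorical variables into dummy binary variables
df = pd.get_dummies(df, columns=["stage"])

# Replace remaining missing values with the median
df = df.fillna(df.median(numeric_only=True))

# Rescale features to zero mean and unit variance
X = StandardScaler().fit_transform(df.astype(float))
```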
Feature selection was conducted using a meta-transformer built on a Random Forest (RF) algorithm20, which assigns weights to the features and ranks them according to their relative importance. The maximum number of features to be selected by the estimator was set to the default value (i.e., the square root of the total number of features) in order to identify the important variables contributing to the risk prediction of mental health deterioration. The feature selection scheme was incorporated into the ML-based pipeline alongside the classification algorithm so that only the relevant features were used for training and testing the final model.
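A meta-transformer of this kind is available in scikit-learn as `SelectFromModel`; the sketch below illustrates the approach on synthetic data, with illustrative hyper-parameters rather than the study's settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in for the study data: 25 candidate features, 5 informative
X, y = make_classification(n_samples=200, n_features=25, n_informative=5,
                           random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0)

# Cap the number of selected features at sqrt(total), mirroring the default
# described in the text; threshold=-inf keeps exactly the top-k by importance
selector = SelectFromModel(rf, max_features=int(np.sqrt(X.shape[1])),
                           threshold=-np.inf)
selector.fit(X, y)

X_selected = selector.transform(X)                   # retained features only
ranking = selector.estimator_.feature_importances_   # RF importance weights
```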
Model training and validation
To address the rather common problem of model overfitting in machine learning applications in clinical research, we adopted a cross-validation scheme with holdout data for the final model evaluation. Model overfitting occurs when a model with lower training error (i.e., fewer misclassifications on training data) shows poorer generalization (higher expected classification error on new, unseen data) than a model with higher training error. Accordingly, we took extra steps to avoid partially overlapping subsets of cases by splitting our dataset into training and testing subsets with a separate validation set. Hence, model testing was always performed on unseen cases which were not considered during the training phase and, consequently, did not influence the feature selection process. This procedure helps to minimize misclassifications during the training phase while also reducing generalization error.
In the present study, the data were split into training, validation and testing subsets, and fivefold cross-validation with grid search over hyper-parameters was applied to prevent overfitting and maximize model generalizability on the test set. Specifically, a grid search procedure with an inner fivefold cross-validation was applied on the validation set for hyper-parameter tuning and model selection. To this end, the best parameters from a grid of candidate values were selected for the trained models, enabling the optimization of the classification results on the test set.
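The holdout split with an inner fivefold grid search can be sketched as follows; the parameter grid, split proportions, and estimator settings are illustrative assumptions, not the study's values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the study data
X, y = make_classification(n_samples=300, random_state=0)

# Hold out a test set that never influences feature selection or tuning
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Grid of candidate hyper-parameter values (illustrative)
param_grid = {"n_estimators": [100, 200], "max_depth": [3, None]}

# Inner fivefold cross-validated grid search on the training data only
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=5, scoring="roc_auc")
search.fit(X_train, y_train)

# Final evaluation on the unseen holdout cases
test_auc = search.score(X_test, y_test)
```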
Classification with balanced random forest algorithm
Class imbalance was addressed using random under-sampling to balance the subsets combined inside an ensemble. Specifically, a balanced random forest classifier from the imbalanced-learn MIT-licensed library21 was applied to deal with the classification of imbalanced classes within our dataset. Balanced Random Forest22 combines majority-class down-sampling with ensemble learning, artificially adjusting the class distribution so that classes are represented equally in each tree in the forest. In this manner, each bootstrap sample contains balanced, down-sampled data. Applying random under-sampling to balance the different bootstraps in an RF classifier can yield classification performance superior to most conventional ML-based estimators while alleviating the problem of learning from imbalanced datasets.
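The balanced-bootstrap idea can be illustrated with a minimal hand-rolled sketch (the study itself used the imbalanced-learn `BalancedRandomForestClassifier`): each tree is trained on a bootstrap of the minority class plus an equally sized random under-sample of the majority class, so both classes are represented equally in every tree.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Imbalanced toy data: 180 majority (class 0) vs 20 minority (class 1)
X = np.vstack([rng.normal(0.0, 1.0, (180, 4)), rng.normal(1.5, 1.0, (20, 4))])
y = np.array([0] * 180 + [1] * 20)

minority, majority = np.where(y == 1)[0], np.where(y == 0)[0]
trees = []
for _ in range(25):
    # Balanced bootstrap: sample both classes with the minority-class size
    idx = np.concatenate([
        rng.choice(minority, size=minority.size, replace=True),
        rng.choice(majority, size=minority.size, replace=True),
    ])
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
    trees.append(tree.fit(X[idx], y[idx]))

# Ensemble prediction by majority vote across trees
votes = np.mean([t.predict(X) for t in trees], axis=0)
y_pred = (votes >= 0.5).astype(int)
```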
The following metrics were used to assess the performance of the learning algorithm applied on imbalanced data: specificity (true negative rate), sensitivity (true positive rate), accuracy, precision, and F-measure. These metrics are functions of the confusion matrix, given the (correct) target values and the estimated targets returned by the classifier during the testing phase. We also used the Receiver Operating Characteristic (ROC) curve to represent the trade-off between the true positive and false positive rates for every possible cutoff. The Area Under the Curve (AUC) was also computed from the estimated ROC curve.
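These metrics can be computed from the confusion matrix as sketched below, here on a small set of hypothetical labels and classifier outputs rather than the study's results.

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, f1_score, roc_auc_score)

# Hypothetical true labels, hard predictions, and predicted probabilities
y_true  = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_pred  = np.array([0, 0, 1, 0, 1, 1, 0, 0, 1, 0])
y_score = np.array([.1, .2, .6, .3, .8, .9, .4, .2, .7, .1])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
specificity = tn / (tn + fp)          # true negative rate
sensitivity = tp / (tp + fn)          # true positive rate (recall)
accuracy  = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
f_measure = f1_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_score)  # area under the ROC curve
```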
Personalized risk profiles (model 2 only)
Following the analysis steps described in the preceding paragraphs, model-agnostic analysis was implemented on the set of variables that emerged as significant features from Model 2, supporting the interpretability of these variables with respect to patient classifications23,24. Specifically, model-agnostic analysis can be applied: (i) at the global (variable-specific) level, to help clarify how each feature contributes toward model decisions per patient group, and (ii) at the local (i.e., patient-specific) level, to identify predictor variables of primary importance for a particular mental health prediction. In view of the lack of precedent in the literature, we selected mathematical models that made no assumptions about data structure. The break-down plots (local level) were developed using the dalex Python package19,23, with the default values applied to the arguments of the main function.
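The break-down idea underlying these local explanations can be sketched without the dalex package itself: starting from the mean predicted probability over the dataset, features are fixed one at a time to the patient's values, and each step's change in the mean prediction is recorded as that feature's local contribution. The data, model, and fixed feature order below are illustrative assumptions (dalex additionally handles feature ordering and interactions).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the study data and classifier
X, y = make_classification(n_samples=150, n_features=4, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

patient = X[0]                                  # one observation to explain
baseline = model.predict_proba(X)[:, 1].mean()  # mean prediction over the data

X_mod = X.copy()
contributions, prev = {}, baseline
for j in range(X.shape[1]):        # feature order fixed here for brevity
    X_mod[:, j] = patient[j]       # fix feature j to the patient's value
    mean_pred = model.predict_proba(X_mod)[:, 1].mean()
    contributions[j] = mean_pred - prev   # local contribution of feature j
    prev = mean_pred

# By construction, the contributions sum to the patient's prediction
# minus the baseline
final_pred = model.predict_proba(patient.reshape(1, -1))[0, 1]
```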