Patients who underwent surgery at Sichuan Provincial People’s Hospital from October 2021 to March 2022 were included in this study and formed the modeling cohort. For external validation of the predictive model, we retrospectively collected data from patients who underwent surgery at Chengdu First People’s Hospital from February to July 2022. The inclusion criteria were age ≥ 18 years, general anesthesia, and postoperative patient-controlled analgesia (PCA). Patients admitted to the intensive care unit (ICU) after surgery were excluded. The Ethics Committees of Sichuan Provincial People’s Hospital (approval no. 2022-49-1) and Chengdu First People’s Hospital (approval no. 2022-HXKT-011) approved this retrospective analysis of routinely collected data and waived the requirement for patient consent. This study was registered at the Chinese Clinical Trial Registry (registration number ChiCTR2200056097; principal investigator: Min Xie; http://www.chictr.org.cn/showproj.aspx?proj=151192; date of registration: February 1, 2022). All study methods were performed in accordance with the guidelines and regulations of the clinical registry. All private personal information was protected and removed during analysis and publication.
Data collection and outcome definition
Recent studies have found that several factors, including type of surgery14,21, anesthetic drugs22,23,24,25, age18,26, perioperative fasting27, infusion volume28, anxiety29, inhalational anesthetics27, body mass index (BMI)16 and operative duration6, are associated with PONV. We therefore included as many candidate variables as possible in our prediction model, including some that were not examined in previous studies, such as history of surgery, intraoperative urine volume and blood loss.
Patients’ clinical information was retrospectively collected from the Hospital Information System (HIS) by scientific research assistants. Patients’ medical history and condition were recorded in the HIS by surgeons. The anesthetic protocol and postoperative analgesia regimen were determined by each patient’s anesthesiologist and were not standardized. Nurses recorded the occurrence of PONV and the use of rescue analgesics in the post-anesthesia care unit (PACU). After the patient returned to the ward, an anesthesia nurse followed up on PONV and other side effects for 24 to 72 h after the procedure, asking questions such as “Have you vomited or had dry retching?”, “Have you experienced a feeling of nausea?” and “When did you experience PONV?”. PONV was considered to have occurred when a patient had nausea, vomiting, or both. At the same follow-up, pain scores at rest and during movement were measured with a visual analog scale (VAS). Because most PONV occurs within 24 h after surgery and its severity and incidence decrease over time30, only PONV and movement pain scores within the first 24 h postoperatively were recorded in this study. All data were extracted from the HIS by scientific research assistants who were blinded to the study hypothesis.
Variables with more than 90% missing data, a single category accounting for more than 90% of observations, or a coefficient of variation below 0.01 were deleted31.
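The three screening rules can be sketched as below. This is an illustrative sketch, not the authors' code: the function name `screen_variables` and the in-memory format (a dict mapping each variable name to a list of values, with `None` for missing entries) are hypothetical, and the coefficient of variation is taken as SD / |mean| for numeric variables only.

```python
from collections import Counter
import statistics


def screen_variables(data, max_missing=0.90, max_top_share=0.90, min_cv=0.01):
    """Return the names of variables that survive all three screening rules.

    Hypothetical representation: `data` maps variable name -> list of values,
    where None marks a missing value.
    """
    kept = []
    for name, values in data.items():
        observed = [v for v in values if v is not None]
        # Rule 1: drop variables with more than 90% missing values.
        if not values or 1 - len(observed) / len(values) > max_missing:
            continue
        # Rule 2: drop variables where one category covers more than 90% of observations.
        top_count = Counter(observed).most_common(1)[0][1]
        if top_count / len(observed) > max_top_share:
            continue
        # Rule 3 (numeric variables only): drop if CV = SD / |mean| is below 0.01.
        numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in observed)
        if numeric and len(observed) > 1:
            mean = statistics.fmean(observed)
            if mean != 0 and statistics.stdev(observed) / abs(mean) < min_cv:
                continue
        kept.append(name)
    return kept
```

For example, an all-missing column, a sex column with a single category, and a nearly constant height column would all be removed, while an age column with one missing value would be kept.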
Data partitioning and dataset building
Patients from Sichuan Provincial People’s Hospital were divided into a training set and a test set at a ratio of 8:2, which were used to train and test the models, respectively. Patients from Chengdu First People’s Hospital were used to externally validate the developed models.
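A minimal sketch of an 8:2 split follows. The source states only the ratio; stratifying by PONV status (so both sets keep the same event rate) and the fixed seed are assumptions added for illustration, and `split_8_2` is a hypothetical helper name.

```python
import random


def split_8_2(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) for an 8:2 split of the modeling cohort.

    Assumption: the split is stratified by outcome label (1 = PONV, 0 = no PONV).
    """
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        # Shuffle the indices of this outcome class, then carve off 20% for testing.
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)
```

The external cohort is deliberately absent from this function: it is never split, and is only scored once the final model has been chosen.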
Some missing clinical information, such as height, weight, and history of motion sickness and/or PONV, was obtained by the research assistant by telephone; other missing data, such as the PCA regimen, were imputed using the random forest method.
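Random forest imputation can be approximated with scikit-learn's `IterativeImputer` using a `RandomForestRegressor` as the per-column model, which is close in spirit to missForest. This is a sketch under assumptions: the source does not name its implementation, the helper `random_forest_impute` is hypothetical, and categorical variables such as the PCA regimen would need to be numerically coded (or imputed with a classifier) first.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (exposes IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor


def random_forest_impute(X, n_estimators=100, max_iter=10, seed=0):
    """Fill NaNs column by column with random forest predictions.

    Assumption: all columns are numeric-coded; each round re-predicts every
    incomplete column from the others until max_iter rounds are reached or
    the imputations stabilise. Observed values are left unchanged.
    """
    imputer = IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=n_estimators, random_state=seed),
        max_iter=max_iter,
        random_state=seed,
    )
    return imputer.fit_transform(np.asarray(X, dtype=float))
```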
To minimize the adverse effect of class imbalance on predictive performance, the synthetic minority oversampling technique (SMOTE) and borderline SMOTE (BSMOTE) were applied. Three variable selection methods were used: (1) Boruta, a feature selection algorithm that identifies all variables relevant to the outcome; (2) Lasso, which introduces an L1 penalty that shrinks the coefficients of unimportant variables to zero, thereby ranking variables by importance and discarding unimportant ones; and (3) recursive feature elimination (RFE), which repeatedly fits a model and removes the least important features to retain those most relevant for predicting the target variable (Fig. 4)31,32.
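The core of SMOTE is interpolation between a minority-class sample and one of its nearest minority-class neighbours. The NumPy sketch below shows plain SMOTE only; Borderline-SMOTE additionally restricts the seed samples to minority points near the class boundary and is not shown. The function name and defaults are illustrative assumptions; in practice the imbalanced-learn library provides `SMOTE` and `BorderlineSMOTE` classes.

```python
import numpy as np


def smote(X_min, n_synthetic, k=5, seed=0):
    """Generate synthetic minority-class rows by SMOTE-style interpolation.

    Plain SMOTE only (Borderline-SMOTE not shown); brute-force distances,
    so intended for small illustrative inputs.
    """
    X_min = np.asarray(X_min, dtype=float)
    if len(X_min) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances among minority samples.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbour
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(d, axis=1)[:, :k]  # k nearest minority neighbours
    synthetic = np.empty((n_synthetic, X_min.shape[1]))
    for j in range(n_synthetic):
        i = rng.integers(len(X_min))         # random minority seed sample
        nn = neighbours[i, rng.integers(k)]  # one of its k neighbours
        gap = rng.random()                   # uniform in [0, 1)
        # New point lies on the segment between the seed and its neighbour.
        synthetic[j] = X_min[i] + gap * (X_min[nn] - X_min[i])
    return synthetic
```

Because every synthetic row is a convex combination of two real minority rows, it stays inside the range of the observed minority data.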
In this process, nine machine learning algorithms were trained for binary classification to develop predictive models: logistic regression, random forest, stochastic gradient descent (SGD), extreme gradient boosting (XGBoost), K-nearest neighbors (KNN), support vector classifier (SVC), decision tree, category boosting (CatBoost) and multilayer perceptron (MLP)31,32. The training set was used to build the models, and the test set was used to evaluate their predictive performance. Internal validation was conducted with tenfold cross-validation in the training set (Fig. 4)31.
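Tenfold cross-validation of several candidate classifiers can be sketched with scikit-learn as below. Only four of the nine algorithms are included (XGBoost and CatBoost live in separate packages), the hyperparameters are placeholder assumptions rather than the study's settings, and `make_classification` generates synthetic data in place of the real training set.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def tenfold_auc(X_train, y_train, seed=0):
    """Return the ten per-fold AUCs of each candidate model on the training set.

    Assumption: four illustrative models with placeholder hyperparameters.
    """
    candidates = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "knn": KNeighborsClassifier(n_neighbors=5),
    }
    # Stratified folds keep the event rate roughly equal across the ten folds.
    folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    return {
        name: cross_val_score(model, X_train, y_train, cv=folds, scoring="roc_auc")
        for name, model in candidates.items()
    }


# Demo on synthetic, imbalanced data standing in for the real training set.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, weights=[0.8, 0.2], random_state=0
)
fold_aucs = tenfold_auc(X_demo, y_demo)
for name, aucs in fold_aucs.items():
    print(f"{name}: mean AUC {aucs.mean():.3f} (SD {aucs.std():.3f})")
```

If oversampling such as SMOTE is combined with cross-validation, applying it only to the training folds (for example inside an imbalanced-learn pipeline) keeps synthetic samples out of the validation folds.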
We used the area under the receiver operating characteristic curve (AUC), accuracy, precision, recall, F1 score and area under the precision-recall curve (AUPRC) to evaluate the predictive performance of the models31. The AUCs of the models were compared, and the model with the largest AUC was selected to develop the PONV prediction system for PCA. SHapley Additive exPlanations (SHAP) were used to explain the contribution of each variable to the model31. We then applied the best model to patients at Chengdu First People’s Hospital and evaluated its performance with the same quantitative metrics (Fig. 4).
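The metrics named above can be computed from true labels and predicted probabilities as in this pure-Python sketch. The 0.5 classification threshold is an assumption (the source does not state one), and AUPRC is estimated as average precision, so its value with tied scores may differ slightly from library implementations such as scikit-learn's `average_precision_score`.

```python
def evaluate(y_true, y_score, threshold=0.5):
    """Compute AUC, accuracy, precision, recall, F1 and AUPRC.

    y_true: 1 = PONV, 0 = no PONV; y_score: predicted probability of PONV.
    Assumption: a 0.5 threshold converts probabilities into class labels.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    # AUC: probability a random PONV case outranks a random non-PONV case (ties count 1/2).
    auc = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg) / (len(pos) * len(neg))

    y_pred = [int(s >= threshold) for s in y_score]
    tp = sum(y == 1 and p == 1 for y, p in zip(y_true, y_pred))
    fp = sum(y == 0 and p == 1 for y, p in zip(y_true, y_pred))
    fn = sum(y == 1 and p == 0 for y, p in zip(y_true, y_pred))
    tn = sum(y == 0 and p == 0 for y, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # AUPRC as average precision: mean precision at the rank of each positive case.
    ranked = sorted(zip(y_score, y_true), key=lambda t: t[0], reverse=True)
    hits, ap_sum = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            ap_sum += hits / rank
    return {
        "auc": auc, "accuracy": (tp + tn) / len(y_true), "precision": precision,
        "recall": recall, "f1": f1, "auprc": ap_sum / len(pos),
    }
```

For labels `[1, 0, 1, 1, 0, 0]` and scores `[0.9, 0.8, 0.7, 0.4, 0.3, 0.2]`, seven of the nine positive/negative pairs are ordered correctly, giving an AUC of 7/9.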
Sample size validation
To estimate the effect of sample size on predictive performance, 10% of the samples in the training set were randomly drawn to train the model, and the AUC was evaluated on the test set. The proportion of training samples was then increased from 10 to 100% in increments of 10%. This process was repeated 100 times, and the results were plotted as a line graph31. The contribution of sample size to the predictive performance of the models was assessed from the inflection point of the curve.
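The sample-size procedure can be sketched as follows. Logistic regression serves as a placeholder model, the demo uses synthetic data and only 5 repeats (instead of 100) to keep it fast, and the helper name `sample_size_curve` is hypothetical; any of the candidate models could be substituted.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def sample_size_curve(X_train, y_train, X_test, y_test, n_repeats=100, seed=0):
    """Mean test-set AUC when training on 10%, 20%, ..., 100% of the training set.

    Assumption: logistic regression is a placeholder for the study's model.
    """
    rng = np.random.default_rng(seed)
    fractions = np.linspace(0.1, 1.0, 10)
    mean_aucs = []
    for frac in fractions:
        n = int(round(frac * len(y_train)))
        aucs = []
        for _ in range(n_repeats):
            idx = rng.choice(len(y_train), size=n, replace=False)  # random subsample
            if len(np.unique(y_train[idx])) < 2:
                continue  # skip draws containing only one outcome class
            model = LogisticRegression(max_iter=1000).fit(X_train[idx], y_train[idx])
            aucs.append(roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
        mean_aucs.append(float(np.mean(aucs)))
    return fractions, mean_aucs


# Demo on synthetic data with a reduced number of repeats.
X, y = make_classification(n_samples=400, n_features=10, n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
fractions, mean_aucs = sample_size_curve(X_tr, y_tr, X_te, y_te, n_repeats=5)
for frac, auc in zip(fractions, mean_aucs):
    print(f"{frac:.0%} of training set: mean AUC {auc:.3f}")
```

Plotting `mean_aucs` against `fractions` gives the line graph; a flattening curve suggests that adding more training samples yields diminishing gains.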
Continuous variables were described as mean and standard deviation, and categorical variables as frequencies and percentages. Analysis of variance (ANOVA) and the rank-sum test were used for univariate analysis. Hypothesis testing and model building were implemented using the stats and sklearn packages in Python (v3.8)31.
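A sketch of these descriptive and univariate steps is shown below, assuming that the "stats" package refers to `scipy.stats` (the source does not say). Which test is applied to which variable (ANOVA for roughly normal variables, the rank-sum test otherwise) is also an assumption; the Mann-Whitney U test used here is equivalent to the Wilcoxon rank-sum test.

```python
from collections import Counter

import numpy as np
from scipy import stats


def compare_continuous(ponv, no_ponv):
    """Mean (SD) per group plus ANOVA and rank-sum tests for one continuous variable.

    Assumption: scipy.stats stands in for the 'stats' package named in the text.
    """
    a, b = np.asarray(ponv, dtype=float), np.asarray(no_ponv, dtype=float)
    f_stat, p_anova = stats.f_oneway(a, b)  # one-way ANOVA (assumes approximate normality)
    u_stat, p_rank = stats.mannwhitneyu(a, b, alternative="two-sided")  # rank-sum test
    return {
        "ponv_mean_sd": (a.mean(), a.std(ddof=1)),
        "no_ponv_mean_sd": (b.mean(), b.std(ddof=1)),
        "anova_F": f_stat, "anova_p": p_anova,
        "ranksum_U": u_stat, "ranksum_p": p_rank,
    }


def describe_categorical(values):
    """Frequency and percentage of each category, e.g. {'F': (3, 75.0)}."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}
```

With only two groups (PONV vs. no PONV), the ANOVA F statistic equals the square of the equal-variance two-sample t statistic, so the two tests give the same p-value.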