Data collection, study population, variables, and outcomes
We used the TriNetX platform to access aggregated, de-identified electronic health records (EHR) of over 90 million patients from 56 HCOs across all 50 American states, covering diverse geographic, age, race, and ethnic groups (United States Collaborative Network)5. The MetroHealth System, Cleveland, Ohio, Institutional Review Board (IRB) determined that research using the de-identified and aggregated data from TriNetX as described in this study is not Human Subject Research, and therefore IRB review was not required. We have previously used the TriNetX platform to study risk factors and outcomes of COVID-19 infection and vaccination14,15,16.
TriNetX data are collected from participating HCOs, primarily from EHR systems comprising structured demographics, diagnoses, procedures, and medications, but also from facts extracted from clinical documents using natural language processing17. TriNetX performs intensive data preprocessing to minimize missing values. The platform also maps data to a clinical model with consistent semantic meanings so that the data can be queried consistently regardless of the underlying data source. All covariates are binary, categorical, or continuous. Missing sex values are represented as “Unknown Sex,” while missing data for race and ethnicity are represented as “Unknown Race” and “Unknown Ethnicity,” respectively. The data available in TriNetX go back decades, depending on the HCO, and are frequently updated (80% of data providers update their data at 1-, 2-, or 4-week intervals)18. For this study, the EHR data were queried and analyzed on October 8, 2023.
The primary analysis compared the hazard of IS in patients aged 65 years and over after Pfizer bivalent booster versus monovalent booster; the secondary analysis compared the hazard of IS in patients aged 65 years and over after Pfizer bivalent booster versus Moderna bivalent booster (Fig. 2). The exposure of interest was vaccination with the Pfizer bivalent booster (“Pfizer bivalent” cohort), the Moderna bivalent booster (“Moderna bivalent” cohort), or a Pfizer/Moderna monovalent booster (“monovalent” cohort) prior to August 27, 2023, to ensure sufficient time for follow-up at 21 and 42 days (Fig. 2). Patients in the monovalent cohort were included beginning in August 2021, while those in the Pfizer and Moderna bivalent cohorts were included beginning in September 2022, as these time periods represent when the cohorts began receiving booster vaccines in TriNetX. Cohorts were matched on demographics (age, sex, race, ethnicity), COVID-19 infection, medical conditions that are risk factors for both IS and severe COVID-19 infection19,20, and adverse socioeconomic determinants of health (Table 1). Self-reported race and ethnicity data in TriNetX come from the clinical EHR systems of the contributing HCOs. TriNetX maps race and ethnicity data from its contributing HCOs to the following categories: (1) Race: Asian, American Indian or Alaskan Native, Black or African American, Native Hawaiian or Other, White, Unknown Race; and (2) Ethnicity: Hispanic or Latino, Not Hispanic or Latino, Unknown Ethnicity. The outcome of interest was an encounter diagnosis for IS in TriNetX at either 1–21 days or 22–42 days after booster administration (Fig. 2). Clinical codes for covariates, exposures, and outcomes are listed in Supplementary Table 2.
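The follow-up arithmetic above can be checked with a short sketch. It is illustrative only and is not part of the TriNetX platform: the function name `outcome_window` and the constant names are hypothetical, and the two dates are the enrollment cutoff and query date stated in this section.

```python
from datetime import date, timedelta

# Dates taken from the text; constant names are hypothetical.
ENROLLMENT_CUTOFF = date(2023, 8, 27)  # boosters had to be given before this date
QUERY_DATE = date(2023, 10, 8)         # date the EHR data were queried

def outcome_window(booster_date, event_date):
    """Classify an IS encounter by days elapsed since booster administration."""
    days = (event_date - booster_date).days
    if 1 <= days <= 21:
        return "1-21 days"
    if 22 <= days <= 42:
        return "22-42 days"
    return None  # on or before the booster date, or beyond 42 days

# A booster given just before the cutoff still has 42 days of
# potential follow-up by the query date.
latest_followup_end = ENROLLMENT_CUTOFF + timedelta(days=42)
```

Under these assumptions, the cutoff plus 42 days coincides with the query date, which is consistent with the stated rationale for the August 27, 2023 cutoff.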
To compare the hazard of IS between the Pfizer bivalent and monovalent cohorts, as well as between the Pfizer bivalent and Moderna bivalent cohorts, the cohorts were propensity-score matched (1:1 matching by a nearest-neighbor greedy matching algorithm with a caliper of 0.25 standard deviations) on the variables listed above. Kaplan–Meier survival analysis was used to estimate the probability of IS at 1–21 days or 22–42 days after booster administration. The Kaplan–Meier analysis estimates the probability of an outcome at each time interval (daily intervals in this analysis). Censoring was applied to account for patients who exited the cohort during the analysis period and therefore no longer contributed follow-up time. Patients are censored when the last data point in the patient’s record is within the time interval of interest, or if the outcome of interest occurs after the index event but before the start of the time window21. The Cox proportional hazards assumption was tested using Schoenfeld residuals22. The TriNetX platform calculates the HR and associated 95% CI using the R survival package v3.2-3. When generating the HR, TriNetX sets robust=FALSE in the R survival package, which is a limitation of the TriNetX platform because the resulting estimates do not account for potential clustering of patients within HCOs or specific geolocations. All statistical tests were conducted in October 2023 within the TriNetX Analytics platform, with significance set at a two-sided p-value < 0.05. A sub-analysis was conducted to compare the hazard of first-time IS between the Pfizer bivalent cohort and the monovalent cohort, but not between the Pfizer bivalent cohort and the Moderna bivalent cohort, due to limited sample size.
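The matching step described above can be illustrated with a minimal sketch of 1:1 greedy nearest-neighbor matching with a caliper. This is not the TriNetX implementation, whose internals are not described here. The function name `greedy_caliper_match`, the toy propensity scores, and two choices are assumptions made for illustration: the caliper is taken as 0.25 times the pooled standard deviation of the propensity score itself, and matching is done without replacement in the order the first cohort is listed.

```python
import statistics

def greedy_caliper_match(treated, control, caliper_sd=0.25):
    """1:1 greedy nearest-neighbor matching without replacement.

    treated, control: dicts mapping a patient id to a propensity score.
    Assumption: the caliper is caliper_sd times the pooled standard
    deviation of the propensity scores across both cohorts.
    """
    pooled = list(treated.values()) + list(control.values())
    caliper = caliper_sd * statistics.pstdev(pooled)
    available = dict(control)  # controls not yet used in a pair
    pairs = []
    for t_id, t_score in treated.items():
        if not available:
            break
        # Greedy step: take the closest remaining control for this patient.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is matched at most once
    return pairs

# Toy propensity scores (hypothetical values, not study data).
treated = {"t1": 0.30, "t2": 0.80}
control = {"c1": 0.32, "c2": 0.55, "c3": 0.10}
pairs = greedy_caliper_match(treated, control)
```

In this toy input, a patient with no control inside the caliper is left unmatched and drops out of the matched cohorts, which is why matched cohort sizes can be smaller than the original cohorts.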