Thursday, December 7, 2023

Forecasting local hospital bed demand for COVID-19 using on-request simulations – Scientific Reports

General model structure

The complete forecast model consists of 5 main pieces of code, referred to as modules: (1) Data loading and nowcasting, (2) Reproduction number estimates, (3) Vaccination coverage forecasting, (4) Incidence model, and (5) Care path model. The calculations done in modules 2 and 3 rely only on the loaded data and user input, while modules 4 and 5 also build further on the results of each previous module (see Fig. 1). This hierarchical structure implies that any change to parameters in a module updates the outputs in all dependent modules. However, by guiding the user through the modules in sequential order, the number of required computations can be drastically reduced.

Figure 1

Model structure showing each of the modules, the data carried over between modules, and the possible user input in each module.

Data structure and definitions

The model needs data on infection incidence, vaccine doses, bed occupancy in general wards and intensive care units, and a simulation window for which a forecast is required. Although we developed this model for hospitals in Germany, based on data provided by the Robert Koch Institute11 and the Deutsche Interdisziplinäre Vereinigung für Intensiv- und Notfallmedizin (DIVI)12, the model can be applied to any country that has the following data available (preferably if in remote-accessible, machine-readable format):

  1. 1.

    Incidence, \(I_{g}(t)\)

  2. 2.

    Administered vaccinations, \(V_{d,g}(t)\)

  3. 3.

    Bed occupancy on general wards (GW) \(B_{GW,g}(t)\) and ICU \(B_{ICU,g}(t),\)

where t denotes the time in days since a reference date before the start of the pandemic, g denotes a geographical subdivision (in the case of Germany, this is a district, i.e., Landkreis or Stadtkreis), and d denotes the dose of vaccine administered (1st, 2nd, or 3rd/booster). Incidence and vaccination data are needed for each day of the period of reference, while merely the value at the start date of the forecast is needed for bed occupancy.

The user defines the catchment area of interest, selecting geographical areas from the available list (\(K_{all}\)), into the catchment set (\(K_u\)). After this, the input data is summed over the selected areas:

$$\begin{aligned} I(t)= & {} \sum _{g\in K_u} I_g(t), \end{aligned}$$


$$\begin{aligned} V_d(t)= & {} \sum _{g\in K_u} V_{d,g}(t), \end{aligned}$$


$$\begin{aligned} B_{GW}(t)= & {} \sum _{g\in K_u} B_{GW,g}(t), \end{aligned}$$


$$\begin{aligned} B_{ICU}(t)= & {} \sum _{g\in K_u} B_{ICU,g}(t). \end{aligned}$$


The start of the simulation is denoted as \(T_S\), for which the incidence, number of vaccinations, and bed occupancy are taken from the (last data-point of the) data, the first forecasted day is thus \(T_S+1\). The length of the forecast is defined as L; the last forecasted date is therefore given as \(T_E = T_S+L\). The size of any class of individuals in the model is given in absolute numbers of individuals; thus \(S(0) = N\), with N being the population size.

Because many of the parameters in the dashboard’s models are continuously distributed (as Exponential, Gamma, or Weibull), and the model uses discrete time steps, we need to discretise the parameter distributions. For each continuous distribution, with probability density function (PDF) \(f_c(t)\), the discretised PDF is defined as \(f_d(t) = \int _t^{t+1}f_c(x)dx.\)

User input for all distributions is given in terms of the mean \(\mu\) and standard deviation \(\sigma\), and distribution parameters are then determined by moment matching. For a gamma distribution, this means that \(f(x,\alpha ,\beta ) = f(x,\alpha =\mu ^2/\sigma ^2, \beta =\mu /\sigma ^2)\), for a Weibull distribution, \(f(x, k, \lambda ) = f(x,k=(\frac{\sigma }{\mu })^{-1.086}, \lambda =\mu / \Gamma (1+1/k))\), and for an exponential distribution \(f(x,\lambda )= f(x, \lambda=1/\mu ).\) All model parameters are listed in the attached Table S1 in the Supplementary Information.

The model can take the effect of emerging Variants of Concern (VoCs) into account, based on user input on the VoC proportion and increased transmissibility. For a full description of the implementation of VoC in the model see Supplementary Text S1.

Vaccination model

The vaccination model consists of two parts. In the first, we forecast the number of people vaccinated over the length of the forecast for each of the doses (1st, 2nd, and 3rd/booster dose), while in the second part we convert these administered vaccinations into population-level protection against transmission.

The first doses (\(V_{1}(t)\)) are assumed to be administered at the same rate as during the last observed week:

$$\begin{aligned} V_{1}(t>T_{S}) = \sum _{i=T_S-6}^{T_S}V_{1}(i)/7. \end{aligned}$$


The second dose is assumed to be administered at a fixed time delay after the first dose. This time delay \(\Delta T_V\) is extracted from the data as the maximum time for which

$$\begin{aligned} V_{1}(T_S-\Delta T_V) \ge V_{2}(T_S), \end{aligned}$$


holds. The second doses are then a direct reflection of the number of first doses \(\Delta T_V\) days ago,

$$\begin{aligned} V_{2}(t) = V_{1}(t-\Delta T_V). \end{aligned}$$


The uptake of the second dose is thus assumed to be identical to the uptake of the first dose.

The daily administered third/booster doses are assumed to be a continuation of the mean daily third doses over the previous week:

$$\begin{aligned} V_{3}(t>T_{S}) = \sum _{i=T_S-6}^{T_S}V_{3}(i)/7. \end{aligned}$$


This is in line with the forecasting of the first doses, with the exception that the cumulative number of third doses is not allowed to exceed the cumulative number of second doses a certain delay (\(\Delta T_B\)) ago, such that

$$\begin{aligned} \sum _{i=0}^{T_S} V_{3}(i) \le \sum _{i=0}^{T_S-\Delta T_B} V_{2}(i), \end{aligned}$$


has to hold.

Population protection

Each dose is assumed to have a time-dependent additive effect on the population-wide protection against infection. At the individual level, \(G_{d}(\tau )\) denotes the proportion of individuals no longer susceptible to infection \(\tau\) days after the administration of dose d. This function thus serves to simulate the delay between vaccine administration and maximum vaccine protection at the individual level. We assume no waning of immunity, such that \(G_d(\infty ) = E_d\), with \(E_d\) the vaccine effectiveness against transmission elicited by doses d.

The additive effect on vaccine effectiveness of each additional dose is defined as \({E_d}^*= E_d-E_{d-1}\); (Consequently, \({E_2}^*= E_2-E_{1}\) and \({E_3}^*= E_3-E_{2} = E_3-({E_{2}^*}+{E_{1}})\)). Consequently, the additive increase in an individual’s protection due to an extra dose can be written as:

$$\begin{aligned} {G_d}^*(\tau ) = G_d(\tau ) \frac{{E_d}^*}{E_d}. \end{aligned}$$


\(G_d(\tau )\) is input as the cumulative distribution function (CDF) of a normal distribution with mean \(\mu (G_1)=15\), \(\mu (G_2)=15\), \(\mu (G_3)=7\), and standard deviation \(\sigma (G_1)=3.8\), \(\sigma (G_2)=6.5\), \(\sigma (G_3)=3.8\) as default values. The values were chosen to mimic a delay of slightly longer than 2 weeks for the first two doses, and 1 week for the booster (third dose), with more variation between individuals in the response against the second dose, as we lack precise estimates of these delays.

The population protection at time t is then given by,

$$\begin{aligned} G_p(t) = \sum _{d}^{\{1,2,3\}}\sum _{i=0}^{t-1} \left( {G_d}^*(t-i) \frac{V_{d}(i)}{N} \right) . \end{aligned}$$


which is used in both the R forecasting step and the incidence model.

Estimation and forecasting of R

In order to inform the transmission process part of the incidence model, we need to forecast the reproduction number of the pathogen over the model timeframe. This forecast is based on the observed time-varying effective reproduction number (\(R_e(t)\)), and implemented either as the static continuation of the most recent observation, or forecasted using an ETS (Error, Trend, Seasonal) model13 based on the last 100 days of observations, depending on the user’s preference. Furthermore, we provide the option to incorporate the effect of a Variant of Concern (VoC) on the development of the \(R_0(t)\). In all cases, we use estimated values of \(R_e(t)\) from the available data up to the simulation start date \(T_S\), \(R_e(t\le T_S)\), and forecasted values after the simulation start date (\(R_e(t>T_S)\)).

We estimate the time-varying effective reproduction number \(R_e(t)\) using the Cori et al. R estimation method14 as implemented in the EpiEstim R package (version 2.2-4), with a given non-parametric Serial Interval (SI). This SI distribution denotes the number of days between equal disease events (e.g., onset of symptoms) of directly connected cases. The default SI is assumed to be Gamma distributed with a mean of 5 days and a standard deviation of 4.9. These values were chosen to reflect estimates of mean SI from multiple studies15 with a relatively high standard deviation.

By keeping track of the cumulative number of infected individuals, as well as the previously calculated population protection (\(G_P(t)\)), we can calculate \(R_0(t)\) from \(R_e(t)\), given that \(R_e(t)\) is related to \(R_0\) through the number of susceptibles at time t, defined as S(t):

$$\begin{aligned} S(t)= & {} N (1 – \sum _i^t \frac{I(i)}{N}) (1-G_P(t)) \end{aligned}$$


$$\begin{aligned} R_e(t)= & {} R_0(t) \frac{S(t)}{N}, \Rightarrow R_0(t) = \frac{R_e(t) N}{S(t)}. \end{aligned}$$


ETS model

To produce a dynamic forecast of \(R_0(t)\), we used an ETS model which underlies an exponential smoothing method13. We use the ETS(A,A,N) model, which has additive errors (A), additive trend (A) and no seasonality (N); We use a \(\log (R_0(t))\) time series of 100 days, i.e. \(\log (R_0(T_S-99… T_S))\), as input for the ETS model. The logarithm serves to forecast the relative changes in \(R_0\) and avoids negative \(R_0\) forecasts. Per model iteration, we pick a random quantile for the prediction interval and use this as the trajectory for \(R_0(t>T_S)\). A detailed description of the ETS model is given in the Supplementary Information Section S2.

Incidence model

The incidence forecast model is based on a stochastic implementation of a Susceptible-Infected-Recovered (SIR) model. The model describes the number of new cases as a function of the population still susceptible (S(t)) to infection and of infection pressure (P(t)) exerted by all those currently infectious, and the total population size (N). The size of the components is given as absolute numbers of individuals; thus \(S(0) = N\). The infection process is governed by the forecasted basic reproduction number \(R_0(t\ge T_S)\) computed in the preceding modules of the framework.

If no VoC is defined, the model describes the dynamics of a single variant (the background variant). Conversely, the model converts into a two-strain model if a VoC is defined, keeping track of the infected individuals for either variant as well as the total number of individuals susceptible to either variant.

The size of the susceptible class is determined by both the cumulative number of infected individuals and the population level protection through vaccination, as defined before,

$$\begin{aligned} S(t) = N (1-G_P(t))\left( 1- \sum _{i=0}^{t}\frac{ I(i)}{N}\right) \end{aligned}$$


The infection pressure exerted on S(t) is then given as the weighted number of infected individuals in the preceding serial interval,

$$\begin{aligned} P(t) = \sum _{i=0}^t H(i)I(t-i) \end{aligned}$$


where H(i) is the serial interval distribution, defined as the probability that the proportion of cases causing new cases i days after they were reported themselves. These are thus all past cases that are able to cause new cases on the current day t, weighted by the serial interval distribution. Given the current \(R_0(t)\) and P(t), the expected mean total number of newly infected individuals on day t+1 is \(\overline{I}(t+1)=R_0(t) P(t)\).

The transmission rate per infected individual (The probability of infecting each of the other individuals in the population), is given as \(\beta (t) = R_0(t)/N\).

The infection pressure exerted on each individual within the population at time t is then calculated as

$$\begin{aligned} P_I(t)= 1- (1-\beta (t))^{P(t)}, \end{aligned}$$


which for low numbers of infected individuals (and therefore low P(t)) can be simplified to \(P_I(t)= \beta P(t).\) The newly infected individuals (\(I(t+1)\)) are then randomly picked from a binomial distribution \(I(t+1)=Binom(p=P_I(t),N=S(t))\).

Care path (Within-hospital) model

To forecast bed occupancy in the hospital, the main endpoint of this work, we first simulate the path of each single patient admitted to the hospital through the general ward, ICU, and step-down units (care path model), as described in Donker et al.16. Then we join this model with the predicted incidence to forecast the hospital admission rate and the bed occupancy.

We use the following parameters to model the care path of individual patients through the hospital.

  • Length of stay (LoS) distribution on general wards (GW),

  • LoS distribution on intensive care units (ICU),

  • LoS distribution on step-down units (SDU),

  • Proportion of patients being transferred from GW to ICU,

  • Proportion of patients being transferred from ICU to SDU.

It is possible to add the proportion of patients directly admitted to ICU (\(H_I\)) to this list, but this is usually estimated from the data (see below).

Once all admissions and discharges are determined, we track the total number of patients on each ward type on each day of the forecast, presenting the final result of the forecast. Note that, despite modelling the general ward and step-down unit separately, we combine both wards into a single number of beds occupied on the general ward in any other step, because the occupied beds in both cases are usually reported as part of the same type (i.e., occupied non-ICU beds).

Admission rate

To estimate the proportion of reported cases being admitted to hospital, we need to compare the incidence of reported cases to the total number of patients in hospital at a given moment (prevalence). This can be achieved by taking the length of stay of admitted patients into account. We create a complete Length of Stay distribution \(L(t_a)\), defined as the proportion of patients still in hospital \(t_a\) days after admission by simulating the care paths of a large number of patients (\(N_P=10000\)) with the same admission date using these parameters. For each day after admission, we then track how many patients are still in hospital (\(N_H(t_a)\)), resulting in the Length of Stay distribution,

$$\begin{aligned} L(t_a)=\frac{N_H(t_a)}{N_P} \end{aligned}$$


The admission rate can then be directly estimated using the current number of occupied beds in general wards \(B_{GW}\) and in ICU \(B_{ICU}\)

$$\begin{aligned} \alpha (t) =\frac{B_{GW}(t)+B_{ICU}(t)}{ \sum _{i=0}^{t} I(i) L(t-i) }, \end{aligned}$$


which we assume to remain stable after the start of the simulation: \(\alpha (t>T_S) = \alpha (T_S)\)).

The same method delivers the distribution of admitted patients on the GW \(L_{GW}(t_a)\) and on the ICU \(L_{ICU}(t_a)\). Note that \(L(t_a) = L_{GW}(t_a)+L_{ICU}(t_a)\), and while \(L(0)=1\), \(L_{GW}(0) \le 1\) and \(L_{ICU}(0) \le 1\). To be exact, \(L_{ICU}(0) = H_I\), reflecting the directly to ICU admitted patients, and \(L_{GW}(0) = 1- H_I\),

The admission rate can also be calculated for the separate wards,

$$\begin{aligned} \alpha _{GW}(t)= & {} \frac{B_{GW}(t)}{ \sum _{i=0}^{t} I(i) L_{GW}(t-i) }, \end{aligned}$$


$$\begin{aligned} \alpha _{ICU}(t)= & {} \frac{B_{ICU}(t)}{ \sum _{i=0}^{t} I(i) L_{ICU}(t-i) }. \end{aligned}$$


We can calculate the expected number of ICU beds under the assumption that no patients enter the ICU directly.

$$\begin{aligned} {B^*_{ICU}}(t)=\left( B_{GW}(t)+B_{ICU}(t)\right) \frac{ \sum _{i=0}^{t} I(i) L_{ICU}(t-i)}{\sum _{i=0}^{t} I(i) L(t-i) }. \end{aligned}$$


The surplus of ICU patients (\(B_{ICU}(t)-{B^*_{ICU}}(t)\)) is then caused by the direct admission of patients to ICU, and calculated as the ICU surplus over the total number of occupied beds.

$$\begin{aligned} H_I(t) = \frac{B_{ICU}(t)-{B^*_{ICU}}(t)}{B_{GW}(t)+B_{ICU}(t)} \end{aligned}$$


Similar to \(\alpha (t)\), we assume that \(\alpha _{GW}(t)\), \(\alpha _{ICU}(t)\), and \(H_I(t)\) remain stable after \(T_S\).

This admission rate then determines the number of individuals admitted to the hospital, entering the care path model, pulled from the binomial distribution

$$\begin{aligned} Binom(p=\alpha ^*(t),N=I(t)). \end{aligned}$$


The care path model then depends on determining lengths of stays on each ward and movements between ward types for each individual admitted patient sampled from their respective distributions.

However, the care path model needs to determine the lengths of stays for the patients already in hospital at \(T_S\), that is \(B_{GW}(T_S)\) and \(B_{ICU}(T_S)\). To simulate their future discharge and transfer events, we create an admission record using a uniform number of patients per day for the 100 days preceding \(T_S\) and simulate their care paths.

We then select all patients present in the GW or ICU at \(T_S\), creating the “Current” patient population in the hospital. From the current population, we randomly select \(N_{GW} = B_{GW}(T_S)\) and \(N_{ICU} = B_{ICU}(T_S)\) patients to recreate the discharges for the current population.

Dashboard server implementation

The main backbone of the model is written in R (version 4.1.217) as an R-Shiny18 (version 1.6) app running using shiny-server on an 8-core, 16GB ram, Ubuntu (version 20.04.3) VM server. An IP-hashed load balancer divides traffic over eight separate instances of shiny-server, reducing the number of concurrent users per shiny-server. We measure the performance of the model based on this infrastructure.

The most computationally demanding part of the care path model was written in the Julia programming language19 to improve performance. Julia is rapidly gaining momentum as a tool for scientific computing due to higher performance compared to languages like R or Python without compromising ease of use. The care path model written in Julia takes approximately 1 hundredth of the processing time of the R equivalent. We use JuliaConnectoR (v. to allow communication between the Julia and R code bases. One of the drawbacks of Julia is that the functions require just-in-time compilation the first time they are used; the overhead related to compilation time exceeds the running time of the entire care path model. To address this issue, we generated a fully compiled version of the model code21, virtually eliminating loading time. Once the main server is started, a concurrent Julia session is created and is shared between all shiny-server sessions. For a detailed dashboard performance benchmarking, see Supplementary Information Section S3.

User interface

The user interface of the dashboard is shown in Fig. 2. It is divided into sections (tabs) following the main modules of the model, in combination with a side panel showing the basic controls: the catchment area selection, the forecast start date, the number of simulation runs, and the simulation length in days. On each tab, parameter choices can be made that affect the current and further tabs, but not any previous tabs. In this way, we prevent unnecessary re-running of the simulations.

Figure 2
figure 2

The user interface of the on-request COVID-19 bed demand forecasting model. (A) Side panel with basic controls, (B) Reported incidence, (C) Effective R forecast, (D) Vaccination forecast, (E) Incidence forecast, and (F) Bed occupancy forecast.

Tabs are ordered as follows: (1) Incidence, showing the daily reported number of cases with and without the nowcast. (2) Vaccination, showing the cumulative number of vaccine doses administered, as well as the forecast of vaccinations. It also includes the option to change the assumptions on the future vaccinations (number of first doses and booster doses per day, and the minimum delay between 2nd dose and booster). (3) Effective R, showing the time-varying R estimation based on the EpiEstim package. This tab also includes the option to use the ETS model and the option to include a variant of concern (VoC), together with the needed VoC parameters. (4) Incidence forecast, showing the results of the incidence model, with the option to include the reporting pattern related to the weekdays. (5) Bed forecast, reporting the results of the within-hospital care path model. The model is run using data on the current number of occupied beds in the hospital of interest, inputted either manually or by uploading a specifically structured file. (6) Parameter choices. This tab includes all parameters included as default values in the other tabs, which should only be changed by users with more advanced knowledge of the underlying model.

Forecast validation

Figure 3
figure 3

Example forecasts of \(R_t\), Incidence, and occupied ICU beds. The forecasts start on the 07-02-2021. Filled red dots represent the 30 previous days and empty red circles represent 30 forecasted days. The performance of the forecasts was validated by calculating its accuracy and precision. Accuracy is defined as the number of observations (dots) falling into the interquartile range (IQR, light grey) and the 95% interval range (dark grey). In this example, the accuracy of the \(R_t\) forecast is 0.43 and 0.2, respectively.

Figure 3 shows example forecasts of \(R_e(t)\), incidence, and bed occupancy. The performance of the model was assessed based on the accuracy and precision of the forecasts from 470 different starting days. The accuracy is defined by determining the number of observed values at day d (with \(d=1\) being the start of the simulation) that fall within the interquartile range (IQR) or within the 95% range of the forecast simulation distribution divided by \(N_d\). \(N_d\) is defined as the total number of observed values at day d in all 470 forecasts, or in other words, the number of starting dates for which we have data d days into the future:

$$\begin{aligned} accuracy(d) = \frac{\sum _{i}^{N=470} {obsw_{id}}}{N_d} \end{aligned}$$


where \(obsw = 1\) if the observed value at day d lies within the range (if not, \(obsw = 0\)). The IQR and 95% range of each forecast are defined by their 100 independent model runs. The day d goes from 1…30 with \(d = 1\) being the first forecasted day. Similarly, the precision was defined as the size of the IQR divided by the maximum forecast value at day d averaged over all 470 forecasts with max being the maximum forecasted value from all simulation runs for forecast i at a given day d:

$$\begin{aligned} precision(d) = \frac{\sum _{i}^{N=470} (IQR/max)_{i,d}}{N_d} \end{aligned}$$


To evaluate the time dependency of the performance of the forecasts we analysed the mean absolute scaled error (MASE)22 (see Supplementary Material section S6). Further, we calculated the bias of the forecast to determine if our method results in a systematic over- or underestimation (see Supplementary Material Section S6).

Hospital-specific forecasts

We forecasted the bed demand for four university hospitals in Baden-Württemberg, Germany, located in Freiburg, Mannheim, Heidelberg, and Tübingen. Participants from each hospital provided a list of counties they perceived as their main catchment area, as well as a list of the number of occupied beds in both ICUs and General wards between 14 October 2020 and 27 January 2022. Supplementary Table S3 lists the counties in the specified local catchment areas and their total population. In addition to those hospital-specific catchments, we produced forecasts based on every German state as an individual catchment, plus the whole of Germany (all states combined). We created bed demand forecasts for each hospital starting at every day between 14-10-2020 and 27-01-2022, resulting in a total of 470 forecasts. For each of the 470 forecasts we performed 100 independent model runs, with a forecasting length of 30 days. The generated forecasts were compared to the actual observed bed occupancy data. We tested the effect of forecasting \(R_e(t)\) on the bed forecast using either the exponential smoothing (ETS) or the naive \(R_e(t)\) forecast, as well as including or excluding the added fitness advantage of a given VoC to the prediction. Additionally, a scenario was tested where the selected catchment and the area corresponding to the observed hospital have a similar population size but do not overlap in their possible admitted patients (e.g. a city which is geographically far away from the hospital of interest). To accomplish this, we chose representative catchments around the city of Rostock, Germany and matched their bed occupancy to the catchments of the four investigated hospitals.

Source link

Related Articles

Leave a Reply

Stay Connected

- Advertisement -spot_img

Latest Articles

%d bloggers like this: