General model structure
The complete forecast model consists of 5 main pieces of code, referred to as modules: (1) Data loading and nowcasting, (2) Reproduction number estimates, (3) Vaccination coverage forecasting, (4) Incidence model, and (5) Care path model. The calculations done in modules 2 and 3 rely only on the loaded data and user input, while modules 4 and 5 also build further on the results of each previous module (see Fig. 1). This hierarchical structure implies that any change to parameters in a module updates the outputs in all dependent modules. However, by guiding the user through the modules in sequential order, the number of required computations can be drastically reduced.
Data structure and definitions
The model needs data on infection incidence, vaccine doses, bed occupancy in general wards and intensive care units, and a simulation window for which a forecast is required. Although we developed this model for hospitals in Germany, based on data provided by the Robert Koch Institute^{11} and the Deutsche Interdisziplinäre Vereinigung für Intensiv und Notfallmedizin (DIVI)^{12}, the model can be applied to any country that has the following data available (preferably if in remoteaccessible, machinereadable format):

1.
Incidence, \(I_{g}(t)\)

2.
Administered vaccinations, \(V_{d,g}(t)\)

3.
Bed occupancy on general wards (GW) \(B_{GW,g}(t)\) and ICU \(B_{ICU,g}(t),\)
where t denotes the time in days since a reference date before the start of the pandemic, g denotes a geographical subdivision (in the case of Germany, this is a district, i.e., Landkreis or Stadtkreis), and d denotes the dose of vaccine administered (1st, 2nd, or 3rd/booster). Incidence and vaccination data are needed for each day of the period of reference, while merely the value at the start date of the forecast is needed for bed occupancy.
The user defines the catchment area of interest, selecting geographical areas from the available list (\(K_{all}\)), into the catchment set (\(K_u\)). After this, the input data is summed over the selected areas:
$$\begin{aligned} I(t)= & {} \sum _{g\in K_u} I_g(t), \end{aligned}$$
(1)
$$\begin{aligned} V_d(t)= & {} \sum _{g\in K_u} V_{d,g}(t), \end{aligned}$$
(2)
$$\begin{aligned} B_{GW}(t)= & {} \sum _{g\in K_u} B_{GW,g}(t), \end{aligned}$$
(3)
$$\begin{aligned} B_{ICU}(t)= & {} \sum _{g\in K_u} B_{ICU,g}(t). \end{aligned}$$
(4)
The start of the simulation is denoted as \(T_S\), for which the incidence, number of vaccinations, and bed occupancy are taken from the (last datapoint of the) data, the first forecasted day is thus \(T_S+1\). The length of the forecast is defined as L; the last forecasted date is therefore given as \(T_E = T_S+L\). The size of any class of individuals in the model is given in absolute numbers of individuals; thus \(S(0) = N\), with N being the population size.
Because many of the parameters in the dashboard’s models are continuously distributed (as Exponential, Gamma, or Weibull), and the model uses discrete time steps, we need to discretise the parameter distributions. For each continuous distribution, with probability density function (PDF) \(f_c(t)\), the discretised PDF is defined as \(f_d(t) = \int _t^{t+1}f_c(x)dx.\)
User input for all distributions is given in terms of the mean \(\mu\) and standard deviation \(\sigma\), and distribution parameters are then determined by moment matching. For a gamma distribution, this means that \(f(x,\alpha ,\beta ) = f(x,\alpha =\mu ^2/\sigma ^2, \beta =\mu /\sigma ^2)\), for a Weibull distribution, \(f(x, k, \lambda ) = f(x,k=(\frac{\sigma }{\mu })^{1.086}, \lambda =\mu / \Gamma (1+1/k))\), and for an exponential distribution \(f(x,\lambda )= f(x, \lambda=1/\mu ).\) All model parameters are listed in the attached Table S1 in the Supplementary Information.
The model can take the effect of emerging Variants of Concern (VoCs) into account, based on user input on the VoC proportion and increased transmissibility. For a full description of the implementation of VoC in the model see Supplementary Text S1.
Vaccination model
The vaccination model consists of two parts. In the first, we forecast the number of people vaccinated over the length of the forecast for each of the doses (1st, 2nd, and 3rd/booster dose), while in the second part we convert these administered vaccinations into populationlevel protection against transmission.
The first doses (\(V_{1}(t)\)) are assumed to be administered at the same rate as during the last observed week:
$$\begin{aligned} V_{1}(t>T_{S}) = \sum _{i=T_S6}^{T_S}V_{1}(i)/7. \end{aligned}$$
(5)
The second dose is assumed to be administered at a fixed time delay after the first dose. This time delay \(\Delta T_V\) is extracted from the data as the maximum time for which
$$\begin{aligned} V_{1}(T_S\Delta T_V) \ge V_{2}(T_S), \end{aligned}$$
(6)
holds. The second doses are then a direct reflection of the number of first doses \(\Delta T_V\) days ago,
$$\begin{aligned} V_{2}(t) = V_{1}(t\Delta T_V). \end{aligned}$$
(7)
The uptake of the second dose is thus assumed to be identical to the uptake of the first dose.
The daily administered third/booster doses are assumed to be a continuation of the mean daily third doses over the previous week:
$$\begin{aligned} V_{3}(t>T_{S}) = \sum _{i=T_S6}^{T_S}V_{3}(i)/7. \end{aligned}$$
(8)
This is in line with the forecasting of the first doses, with the exception that the cumulative number of third doses is not allowed to exceed the cumulative number of second doses a certain delay (\(\Delta T_B\)) ago, such that
$$\begin{aligned} \sum _{i=0}^{T_S} V_{3}(i) \le \sum _{i=0}^{T_S\Delta T_B} V_{2}(i), \end{aligned}$$
(9)
has to hold.
Population protection
Each dose is assumed to have a timedependent additive effect on the populationwide protection against infection. At the individual level, \(G_{d}(\tau )\) denotes the proportion of individuals no longer susceptible to infection \(\tau\) days after the administration of dose d. This function thus serves to simulate the delay between vaccine administration and maximum vaccine protection at the individual level. We assume no waning of immunity, such that \(G_d(\infty ) = E_d\), with \(E_d\) the vaccine effectiveness against transmission elicited by doses d.
The additive effect on vaccine effectiveness of each additional dose is defined as \({E_d}^*= E_dE_{d1}\); (Consequently, \({E_2}^*= E_2E_{1}\) and \({E_3}^*= E_3E_{2} = E_3({E_{2}^*}+{E_{1}})\)). Consequently, the additive increase in an individual’s protection due to an extra dose can be written as:
$$\begin{aligned} {G_d}^*(\tau ) = G_d(\tau ) \frac{{E_d}^*}{E_d}. \end{aligned}$$
(10)
\(G_d(\tau )\) is input as the cumulative distribution function (CDF) of a normal distribution with mean \(\mu (G_1)=15\), \(\mu (G_2)=15\), \(\mu (G_3)=7\), and standard deviation \(\sigma (G_1)=3.8\), \(\sigma (G_2)=6.5\), \(\sigma (G_3)=3.8\) as default values. The values were chosen to mimic a delay of slightly longer than 2 weeks for the first two doses, and 1 week for the booster (third dose), with more variation between individuals in the response against the second dose, as we lack precise estimates of these delays.
The population protection at time t is then given by,
$$\begin{aligned} G_p(t) = \sum _{d}^{\{1,2,3\}}\sum _{i=0}^{t1} \left( {G_d}^*(ti) \frac{V_{d}(i)}{N} \right) . \end{aligned}$$
(11)
which is used in both the R forecasting step and the incidence model.
Estimation and forecasting of R
In order to inform the transmission process part of the incidence model, we need to forecast the reproduction number of the pathogen over the model timeframe. This forecast is based on the observed timevarying effective reproduction number (\(R_e(t)\)), and implemented either as the static continuation of the most recent observation, or forecasted using an ETS (Error, Trend, Seasonal) model^{13} based on the last 100 days of observations, depending on the user’s preference. Furthermore, we provide the option to incorporate the effect of a Variant of Concern (VoC) on the development of the \(R_0(t)\). In all cases, we use estimated values of \(R_e(t)\) from the available data up to the simulation start date \(T_S\), \(R_e(t\le T_S)\), and forecasted values after the simulation start date (\(R_e(t>T_S)\)).
We estimate the timevarying effective reproduction number \(R_e(t)\) using the Cori et al. R estimation method^{14} as implemented in the EpiEstim R package (version 2.24), with a given nonparametric Serial Interval (SI). This SI distribution denotes the number of days between equal disease events (e.g., onset of symptoms) of directly connected cases. The default SI is assumed to be Gamma distributed with a mean of 5 days and a standard deviation of 4.9. These values were chosen to reflect estimates of mean SI from multiple studies^{15} with a relatively high standard deviation.
By keeping track of the cumulative number of infected individuals, as well as the previously calculated population protection (\(G_P(t)\)), we can calculate \(R_0(t)\) from \(R_e(t)\), given that \(R_e(t)\) is related to \(R_0\) through the number of susceptibles at time t, defined as S(t):
$$\begin{aligned} S(t)= & {} N (1 – \sum _i^t \frac{I(i)}{N}) (1G_P(t)) \end{aligned}$$
(12)
$$\begin{aligned} R_e(t)= & {} R_0(t) \frac{S(t)}{N}, \Rightarrow R_0(t) = \frac{R_e(t) N}{S(t)}. \end{aligned}$$
(13)
ETS model
To produce a dynamic forecast of \(R_0(t)\), we used an ETS model which underlies an exponential smoothing method^{13}. We use the ETS(A,A,N) model, which has additive errors (A), additive trend (A) and no seasonality (N); We use a \(\log (R_0(t))\) time series of 100 days, i.e. \(\log (R_0(T_S99… T_S))\), as input for the ETS model. The logarithm serves to forecast the relative changes in \(R_0\) and avoids negative \(R_0\) forecasts. Per model iteration, we pick a random quantile for the prediction interval and use this as the trajectory for \(R_0(t>T_S)\). A detailed description of the ETS model is given in the Supplementary Information Section S2.
Incidence model
The incidence forecast model is based on a stochastic implementation of a SusceptibleInfectedRecovered (SIR) model. The model describes the number of new cases as a function of the population still susceptible (S(t)) to infection and of infection pressure (P(t)) exerted by all those currently infectious, and the total population size (N). The size of the components is given as absolute numbers of individuals; thus \(S(0) = N\). The infection process is governed by the forecasted basic reproduction number \(R_0(t\ge T_S)\) computed in the preceding modules of the framework.
If no VoC is defined, the model describes the dynamics of a single variant (the background variant). Conversely, the model converts into a twostrain model if a VoC is defined, keeping track of the infected individuals for either variant as well as the total number of individuals susceptible to either variant.
The size of the susceptible class is determined by both the cumulative number of infected individuals and the population level protection through vaccination, as defined before,
$$\begin{aligned} S(t) = N (1G_P(t))\left( 1 \sum _{i=0}^{t}\frac{ I(i)}{N}\right) \end{aligned}$$
(14)
The infection pressure exerted on S(t) is then given as the weighted number of infected individuals in the preceding serial interval,
$$\begin{aligned} P(t) = \sum _{i=0}^t H(i)I(ti) \end{aligned}$$
(15)
where H(i) is the serial interval distribution, defined as the probability that the proportion of cases causing new cases i days after they were reported themselves. These are thus all past cases that are able to cause new cases on the current day t, weighted by the serial interval distribution. Given the current \(R_0(t)\) and P(t), the expected mean total number of newly infected individuals on day t+1 is \(\overline{I}(t+1)=R_0(t) P(t)\).
The transmission rate per infected individual (The probability of infecting each of the other individuals in the population), is given as \(\beta (t) = R_0(t)/N\).
The infection pressure exerted on each individual within the population at time t is then calculated as
$$\begin{aligned} P_I(t)= 1 (1\beta (t))^{P(t)}, \end{aligned}$$
(16)
which for low numbers of infected individuals (and therefore low P(t)) can be simplified to \(P_I(t)= \beta P(t).\) The newly infected individuals (\(I(t+1)\)) are then randomly picked from a binomial distribution \(I(t+1)=Binom(p=P_I(t),N=S(t))\).
Care path (Withinhospital) model
To forecast bed occupancy in the hospital, the main endpoint of this work, we first simulate the path of each single patient admitted to the hospital through the general ward, ICU, and stepdown units (care path model), as described in Donker et al.^{16}. Then we join this model with the predicted incidence to forecast the hospital admission rate and the bed occupancy.
We use the following parameters to model the care path of individual patients through the hospital.

Length of stay (LoS) distribution on general wards (GW),

LoS distribution on intensive care units (ICU),

LoS distribution on stepdown units (SDU),

Proportion of patients being transferred from GW to ICU,

Proportion of patients being transferred from ICU to SDU.
It is possible to add the proportion of patients directly admitted to ICU (\(H_I\)) to this list, but this is usually estimated from the data (see below).
Once all admissions and discharges are determined, we track the total number of patients on each ward type on each day of the forecast, presenting the final result of the forecast. Note that, despite modelling the general ward and stepdown unit separately, we combine both wards into a single number of beds occupied on the general ward in any other step, because the occupied beds in both cases are usually reported as part of the same type (i.e., occupied nonICU beds).
Admission rate
To estimate the proportion of reported cases being admitted to hospital, we need to compare the incidence of reported cases to the total number of patients in hospital at a given moment (prevalence). This can be achieved by taking the length of stay of admitted patients into account. We create a complete Length of Stay distribution \(L(t_a)\), defined as the proportion of patients still in hospital \(t_a\) days after admission by simulating the care paths of a large number of patients (\(N_P=10000\)) with the same admission date using these parameters. For each day after admission, we then track how many patients are still in hospital (\(N_H(t_a)\)), resulting in the Length of Stay distribution,
$$\begin{aligned} L(t_a)=\frac{N_H(t_a)}{N_P} \end{aligned}$$
(17)
The admission rate can then be directly estimated using the current number of occupied beds in general wards \(B_{GW}\) and in ICU \(B_{ICU}\)
$$\begin{aligned} \alpha (t) =\frac{B_{GW}(t)+B_{ICU}(t)}{ \sum _{i=0}^{t} I(i) L(ti) }, \end{aligned}$$
(18)
which we assume to remain stable after the start of the simulation: \(\alpha (t>T_S) = \alpha (T_S)\)).
The same method delivers the distribution of admitted patients on the GW \(L_{GW}(t_a)\) and on the ICU \(L_{ICU}(t_a)\). Note that \(L(t_a) = L_{GW}(t_a)+L_{ICU}(t_a)\), and while \(L(0)=1\), \(L_{GW}(0) \le 1\) and \(L_{ICU}(0) \le 1\). To be exact, \(L_{ICU}(0) = H_I\), reflecting the directly to ICU admitted patients, and \(L_{GW}(0) = 1 H_I\),
The admission rate can also be calculated for the separate wards,
$$\begin{aligned} \alpha _{GW}(t)= & {} \frac{B_{GW}(t)}{ \sum _{i=0}^{t} I(i) L_{GW}(ti) }, \end{aligned}$$
(19)
$$\begin{aligned} \alpha _{ICU}(t)= & {} \frac{B_{ICU}(t)}{ \sum _{i=0}^{t} I(i) L_{ICU}(ti) }. \end{aligned}$$
(20)
We can calculate the expected number of ICU beds under the assumption that no patients enter the ICU directly.
$$\begin{aligned} {B^*_{ICU}}(t)=\left( B_{GW}(t)+B_{ICU}(t)\right) \frac{ \sum _{i=0}^{t} I(i) L_{ICU}(ti)}{\sum _{i=0}^{t} I(i) L(ti) }. \end{aligned}$$
(21)
The surplus of ICU patients (\(B_{ICU}(t){B^*_{ICU}}(t)\)) is then caused by the direct admission of patients to ICU, and calculated as the ICU surplus over the total number of occupied beds.
$$\begin{aligned} H_I(t) = \frac{B_{ICU}(t){B^*_{ICU}}(t)}{B_{GW}(t)+B_{ICU}(t)} \end{aligned}$$
(22)
Similar to \(\alpha (t)\), we assume that \(\alpha _{GW}(t)\), \(\alpha _{ICU}(t)\), and \(H_I(t)\) remain stable after \(T_S\).
This admission rate then determines the number of individuals admitted to the hospital, entering the care path model, pulled from the binomial distribution
$$\begin{aligned} Binom(p=\alpha ^*(t),N=I(t)). \end{aligned}$$
(23)
The care path model then depends on determining lengths of stays on each ward and movements between ward types for each individual admitted patient sampled from their respective distributions.
However, the care path model needs to determine the lengths of stays for the patients already in hospital at \(T_S\), that is \(B_{GW}(T_S)\) and \(B_{ICU}(T_S)\). To simulate their future discharge and transfer events, we create an admission record using a uniform number of patients per day for the 100 days preceding \(T_S\) and simulate their care paths.
We then select all patients present in the GW or ICU at \(T_S\), creating the “Current” patient population in the hospital. From the current population, we randomly select \(N_{GW} = B_{GW}(T_S)\) and \(N_{ICU} = B_{ICU}(T_S)\) patients to recreate the discharges for the current population.
Dashboard server implementation
The main backbone of the model is written in R (version 4.1.2^{17}) as an RShiny^{18} (version 1.6) app running using shinyserver on an 8core, 16GB ram, Ubuntu (version 20.04.3) VM server. An IPhashed load balancer divides traffic over eight separate instances of shinyserver, reducing the number of concurrent users per shinyserver. We measure the performance of the model based on this infrastructure.
The most computationally demanding part of the care path model was written in the Julia programming language^{19} to improve performance. Julia is rapidly gaining momentum as a tool for scientific computing due to higher performance compared to languages like R or Python without compromising ease of use. The care path model written in Julia takes approximately 1 hundredth of the processing time of the R equivalent. We use JuliaConnectoR (v.1.0.0.9009)^{20} to allow communication between the Julia and R code bases. One of the drawbacks of Julia is that the functions require justintime compilation the first time they are used; the overhead related to compilation time exceeds the running time of the entire care path model. To address this issue, we generated a fully compiled version of the model code^{21}, virtually eliminating loading time. Once the main server is started, a concurrent Julia session is created and is shared between all shinyserver sessions. For a detailed dashboard performance benchmarking, see Supplementary Information Section S3.
User interface
The user interface of the dashboard is shown in Fig. 2. It is divided into sections (tabs) following the main modules of the model, in combination with a side panel showing the basic controls: the catchment area selection, the forecast start date, the number of simulation runs, and the simulation length in days. On each tab, parameter choices can be made that affect the current and further tabs, but not any previous tabs. In this way, we prevent unnecessary rerunning of the simulations.
Tabs are ordered as follows: (1) Incidence, showing the daily reported number of cases with and without the nowcast. (2) Vaccination, showing the cumulative number of vaccine doses administered, as well as the forecast of vaccinations. It also includes the option to change the assumptions on the future vaccinations (number of first doses and booster doses per day, and the minimum delay between 2nd dose and booster). (3) Effective R, showing the timevarying R estimation based on the EpiEstim package. This tab also includes the option to use the ETS model and the option to include a variant of concern (VoC), together with the needed VoC parameters. (4) Incidence forecast, showing the results of the incidence model, with the option to include the reporting pattern related to the weekdays. (5) Bed forecast, reporting the results of the withinhospital care path model. The model is run using data on the current number of occupied beds in the hospital of interest, inputted either manually or by uploading a specifically structured file. (6) Parameter choices. This tab includes all parameters included as default values in the other tabs, which should only be changed by users with more advanced knowledge of the underlying model.
Forecast validation
Figure 3 shows example forecasts of \(R_e(t)\), incidence, and bed occupancy. The performance of the model was assessed based on the accuracy and precision of the forecasts from 470 different starting days. The accuracy is defined by determining the number of observed values at day d (with \(d=1\) being the start of the simulation) that fall within the interquartile range (IQR) or within the 95% range of the forecast simulation distribution divided by \(N_d\). \(N_d\) is defined as the total number of observed values at day d in all 470 forecasts, or in other words, the number of starting dates for which we have data d days into the future:
$$\begin{aligned} accuracy(d) = \frac{\sum _{i}^{N=470} {obsw_{id}}}{N_d} \end{aligned}$$
(24)
where \(obsw = 1\) if the observed value at day d lies within the range (if not, \(obsw = 0\)). The IQR and 95% range of each forecast are defined by their 100 independent model runs. The day d goes from 1…30 with \(d = 1\) being the first forecasted day. Similarly, the precision was defined as the size of the IQR divided by the maximum forecast value at day d averaged over all 470 forecasts with max being the maximum forecasted value from all simulation runs for forecast i at a given day d:
$$\begin{aligned} precision(d) = \frac{\sum _{i}^{N=470} (IQR/max)_{i,d}}{N_d} \end{aligned}$$
(25)
To evaluate the time dependency of the performance of the forecasts we analysed the mean absolute scaled error (MASE)^{22} (see Supplementary Material section S6). Further, we calculated the bias of the forecast to determine if our method results in a systematic over or underestimation (see Supplementary Material Section S6).
Hospitalspecific forecasts
We forecasted the bed demand for four university hospitals in BadenWürttemberg, Germany, located in Freiburg, Mannheim, Heidelberg, and Tübingen. Participants from each hospital provided a list of counties they perceived as their main catchment area, as well as a list of the number of occupied beds in both ICUs and General wards between 14 October 2020 and 27 January 2022. Supplementary Table S3 lists the counties in the specified local catchment areas and their total population. In addition to those hospitalspecific catchments, we produced forecasts based on every German state as an individual catchment, plus the whole of Germany (all states combined). We created bed demand forecasts for each hospital starting at every day between 14102020 and 27012022, resulting in a total of 470 forecasts. For each of the 470 forecasts we performed 100 independent model runs, with a forecasting length of 30 days. The generated forecasts were compared to the actual observed bed occupancy data. We tested the effect of forecasting \(R_e(t)\) on the bed forecast using either the exponential smoothing (ETS) or the naive \(R_e(t)\) forecast, as well as including or excluding the added fitness advantage of a given VoC to the prediction. Additionally, a scenario was tested where the selected catchment and the area corresponding to the observed hospital have a similar population size but do not overlap in their possible admitted patients (e.g. a city which is geographically far away from the hospital of interest). To accomplish this, we chose representative catchments around the city of Rostock, Germany and matched their bed occupancy to the catchments of the four investigated hospitals.