Modeling the daily counts is a natural choice and has been considered in many studies. Nonetheless, we noticed that the relationship between cumulative counts is much cleaner. Consider, for example, Fig. 1, which shows scatter plots for Ontario data between 2021-10-31 and 2022-02-10. As shown in the bottom panel of Fig. 1, there is a very neat pattern between the cumulative counts, but no such clean pattern appears in the daily counts (top panel of Fig. 1). The noise level is so high that even a time-varying coefficient model may not yield a good fit. Summing the daily counts appears to strengthen the signal by averaging out the noise. Therefore, we model *cumulative* counts but use daily counts for model assessment and comparison, since the future trend of daily counts is our primary interest. We remark that fitted/predicted cumulative counts can easily be converted to fitted/predicted daily new counts by taking the difference between consecutive cumulative counts.

In the following, we present two modeling strategies to capture the lagged dynamic relationship: local polynomial regression and piecewise linear regression.

### Local polynomial regression with lagged dependence

Unlike classical regression models, the time-varying coefficient regression model allows the regression coefficients \(\beta\) to change with another covariate (called the smoothing variable). Suppose the data are of the form \((Y_i, X_i)_{i=1}^n\). The time-varying coefficient regression model is

$$\begin{aligned} Y_{i+L} = \beta _{i,0} + \beta _{i,1} X_{i} + \epsilon _i, \quad i=1,\ldots , n - L, \end{aligned}$$

(1)

where \(\beta _{i,0}=\beta _0(\dfrac{i}{n})\) and \(\beta _{i, 1}=\beta _1(\dfrac{i}{n})\) for some smooth function \({\varvec{\beta }}=(\beta _0,\beta _1)^\prime : [0, 1]\rightarrow {\mathbb {R}}^2\). We assume \(E(\epsilon _i|X_i)=0\).

To reflect the lagged dependence between the death count time series and other time series (e.g., the case count time series), the outcome variable (*Y*) in the above model is taken at time \(i+L\) conditional on the value of the predictor (*X*) at time *i*. The optimal lag *L* is selected by minimizing the mean squared prediction error of out-of-sample daily count predictions. More details are given in the "Selection of lag" section.
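The lag construction and selection described above can be sketched as follows. This is an illustrative stand-in, not the authors' pipeline: `fit_predict` is a hypothetical placeholder for any regression routine, and a simple train/test split stands in for the out-of-sample scheme detailed in the "Selection of lag" section.

```python
import numpy as np

def make_lagged_pairs(x, y, L):
    """Pair the predictor at time i with the outcome at time i + L,
    yielding the (X_i, Y_{i+L}) pairs of Eq. (1), i = 1, ..., n - L."""
    n = len(x)
    return x[:n - L], y[L:]

def select_lag(x, y, lags, fit_predict):
    """Pick the lag minimizing the mean squared prediction error (MSPE)
    on a held-out tail; fit_predict(x_train, y_train, x_test) is any
    user-supplied fitting routine (a placeholder, not the paper's code)."""
    best_lag, best_mspe = None, np.inf
    for L in lags:
        xs, ys = make_lagged_pairs(x, y, L)
        split = int(0.8 * len(xs))            # simple train/test split
        pred = fit_predict(xs[:split], ys[:split], xs[split:])
        mspe = np.mean((ys[split:] - pred) ** 2)
        if mspe < best_mspe:
            best_lag, best_mspe = L, mspe
    return best_lag
```

Any candidate model can be plugged in through `fit_predict`; the lag that best predicts the held-out outcomes is returned.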

The local constant (also called Nadaraya-Watson) and local linear estimators are the most commonly used local polynomial methods for time-varying coefficient regression models. The local constant estimates are obtained by minimizing the objective function

$$\begin{aligned} {\hat{{\varvec{\beta }}}}(t) = \text{ argmin}_{{\varvec{\theta }}}\sum _{i=1}^n\left( y_i-\textbf{x}^\prime _i{\varvec{\theta }}\right) ^2K_b(i/n-t), \quad t\in [0,1], \end{aligned}$$

where \(K_b(\cdot )={\displaystyle {1\over b}}K(\cdot /b)\) is the kernel, *b* is a bandwidth, and \(\textbf{x}_i=(1, x_{i})^\prime\). The resulting estimator clearly depends on the choice of bandwidth *b*. As stated in^{17}, the bandwidth is selected by cross-validation (leave-one-out cross-validation by default in tvReg). The triweight kernel is the default choice in tvLM, an R function in the R package tvReg^{17}.

The local constant estimator can be written in the following matrix form that resembles the weighted least squares estimator. Let

$$\begin{aligned} \textbf{y} & =(y_1,\ldots ,y_n)^\prime ,\\ \textbf{X} & = (\textbf{x}_1, \textbf{x}_2,\ldots , \textbf{x}_n)^\prime ,\\ W_t & = \text{ diag }\left( K_b(1/n-t), K_b(2/n-t), \ldots , K_b(i/n-t), \ldots , K_b(1-t)\right) . \end{aligned}$$

Then we have the local constant estimator, denoted by \({\hat{{\varvec{\beta }}}}\),

$$\begin{aligned} {\hat{{\varvec{\beta }}}}(t) = (\textbf{X}^\prime W_t \textbf{X})^{-1}\textbf{X}^\prime W_t \textbf{y}. \end{aligned}$$

(2)
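The paper itself uses the R package tvReg for estimation; purely as an illustrative sketch, Eq. (2) can be computed directly with the triweight kernel (the function names below are our own):

```python
import numpy as np

def triweight(u):
    """Triweight kernel K(u) = (35/32)(1 - u^2)^3 for |u| <= 1, else 0."""
    return np.where(np.abs(u) <= 1, 35.0 / 32.0 * (1.0 - u ** 2) ** 3, 0.0)

def local_constant(t, x, y, b):
    """Local constant estimate of Eq. (2):
    beta_hat(t) = (X' W_t X)^{-1} X' W_t y,
    with design rows x_i = (1, x_i)' and weights K_b(i/n - t)."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    w = triweight((np.arange(1, n + 1) / n - t) / b) / b   # K_b(i/n - t)
    W = np.diag(w)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
```

Because the kernel weights vanish outside a window of width \(2b\) around *t*, only observations near *t* influence \({\hat{{\varvec{\beta }}}}(t)\).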

Similarly, local linear estimators can be obtained by minimizing

$$\begin{aligned} ({\hat{{\varvec{\beta }}}}(t),{\hat{{\varvec{\beta }}}}^{(1)}(t)) = \text{ argmin}_{{\varvec{\theta }}_0,{\varvec{\theta }}_1}\sum _{i=1}^n\left( y_i-\textbf{x}^\prime _i{\varvec{\theta }}_0 - \textbf{x}^\prime _i{\varvec{\theta }}_1(i/n-t)\right) ^2K_b(i/n-t), \end{aligned}$$

where \({\varvec{\beta }}^{(1)}(t)\) is the first order derivative of \({\varvec{\beta }}(t)\).

Let \(U_t=\hbox {diag}\{1/n-t,\ldots , i/n-t,\ldots , 1-t\}\) and \(\Gamma _t=(\textbf{X}, U_t\textbf{X})\). Then the local linear estimator can be expressed in the following matrix form

$$\begin{aligned} ({{\hat{{\varvec{\beta }}}}^\prime (t)}, {{\hat{{\varvec{\beta }}}}^{(1)\prime }(t)})^\prime = (\Gamma _t^\prime W_t \Gamma _t)^{-1}\Gamma _t^\prime W_t \textbf{y}. \end{aligned}$$

(3)
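As with the local constant case, the local linear estimator admits a direct sketch: \(\Gamma _t=(\textbf{X}, U_t\textbf{X})\) is built by appending \((i/n-t)\)-scaled copies of the design columns, and any kernel (e.g., the triweight) can be passed in. This is again illustrative only, not the tvReg implementation:

```python
import numpy as np

def local_linear(t, x, y, b, kernel):
    """Local linear estimate: solve
    (Gamma_t' W_t Gamma_t) theta = Gamma_t' W_t y,
    where Gamma_t = (X, U_t X); returns (beta_hat(t), beta_hat^(1)(t))."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    u = np.arange(1, n + 1) / n - t                 # entries i/n - t of U_t
    Gamma = np.column_stack([X, u[:, None] * X])    # (X, U_t X)
    w = kernel(u / b) / b                           # K_b(i/n - t)
    W = np.diag(w)
    theta = np.linalg.solve(Gamma.T @ W @ Gamma, Gamma.T @ W @ y)
    return theta[:2], theta[2:]   # beta_hat(t) and its first derivative
```

When the true coefficients are constant in *t*, the estimated derivative \({\hat{{\varvec{\beta }}}}^{(1)}(t)\) is (numerically) zero, as expected.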

### Piecewise linear regression model with lagged dependence

As explained in the "Introduction" section, the smoothness imposed on the time-varying coefficients may prevent the model from capturing abrupt changes. Therefore, we consider piecewise linear regression models as an alternative.

$$\begin{aligned} Y_{i+L} = \beta _0 + \beta _1x_i + \beta _2 (x_i - k_1)_{+} + \cdots + \beta _{p+1}(x_i-k_p)_{+} + \epsilon _i,\quad i=1,\ldots , n - L, \end{aligned}$$

(4)

where

$$\begin{aligned} (x_i-k)_{+}=\left\{ \begin{array}{ll} x_i-k, &\;\text{ if } x_i-k\ge 0,\\ 0, &\;\text{ if } x_i-k<0, \end{array} \right. \end{aligned}$$

where each \(k_j\) is a breakpoint. We also assume \(E(\epsilon _i|X_i=x_i)=0\).
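For fixed breakpoints, Eq. (4) is an ordinary linear model in the truncated basis \((x-k_j)_{+}\), which the following sketch makes explicit (the paper instead uses the R package segmented, which additionally estimates the breakpoint locations):

```python
import numpy as np

def piecewise_design(x, knots):
    """Design matrix for Eq. (4): columns 1, x, (x - k_1)_+, ..., (x - k_p)_+."""
    cols = [np.ones_like(x), x] + [np.maximum(x - k, 0.0) for k in knots]
    return np.column_stack(cols)

def fit_piecewise(x, y, knots):
    """Ordinary least squares fit of Eq. (4) with the breakpoints held fixed."""
    X = piecewise_design(x, knots)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta
```

The coefficient \(\beta _{j+1}\) is the *change* in slope at breakpoint \(k_j\), so the fitted line remains continuous while its slope jumps.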

In our data analysis, the R package "segmented"^{18} is used to fit the piecewise linear regression models. The restarting bootstrap method^{19} implemented in "segmented" handles spurious local minima (e.g., flat segments). The package also provides an automatic option^{20} for determining the number and locations of the breakpoints. We found that the automatic option overestimated the number of breakpoints, which led to worse out-of-sample prediction performance. Since out-of-sample prediction is our primary interest, we do not recommend the automatic option. In the following data analysis with the piecewise linear regression method, we selected starting values for the breakpoints by examining the scatter plot.