This is part of series about distributed lag non-linear models. Please read the first post for an introduction and a disclaimer.
Choosing marginal function to construct a crossbasis
According to Gasparrini et al. (2017), a crossbasis function is a “bi-dimensional dose-lag-response function \(f \cdot w(x,l)\) is composed of two marginal functions: the standard dose-response function \(f(x)\), and the additional lag-response function \(w(l)\) that models the lag structure…” Each dimension can be described by a different type of function. The default for the dlnm package is a type of smoother called a P-spline, but it can be changed to other types of splines or even something like step function. The marginal functions can also be mixed and matched, e.g., a P-spline for the lag dimension and a step function for the dose-response dimension.
I’d like to use penalized splines for both bases since they are flexible—that is, they can take nearly any functional shape, including a perfectly straight line.
So far I’ve been using penalized cubic regression splines for both the lag and dose-response dimensions of my DLNMs, but to be perfectly honest, I think I’m only doing this because Teller et al. (2016) use a similar spline basis, However, they aren’t even using DLNMs! I should at least be able to justify my choice of basis function.
library(mgcv) #for gam()
Loading required package: nlme
Attaching package: 'nlme'
The following object is masked from 'package:dplyr':
collapse
This is mgcv 1.8-42. For overview type 'help("mgcv-package")'.
library(dlnm) #for the "cb" basis
This is dlnm 2.4.7. For details: help(dlnm) and vignette('dlnmOverview').
#with cubic regression splines for both dimensionsgrowth_cr <-gam(log_size_next ~ log_size +s(spei_history, L, # <- the two dimensionsbs ="cb", # <- fit as crossbasisk =c(4, 24), # <- knots for each dimensionxt =list(bs ="cr")), # <- what basis to use for each dimensionfamily =gaussian(link ="identity"),method ="REML",data = ha)
Note: for P-splines, the number of knots, k, must be 2 greater than order of the basis (default 2, i.e. cubic), so I’m using the minimum (4) for the dose-response dimension.
#with default P-splines for both dimensionsgrowth_ps <-gam(log_size_next ~ log_size +s(spei_history, L, # <- the two dimensionsbs ="cb", # <- fit as crossbasisk =c(4, 24)), # <- knots for each dimensionfamily =gaussian(link ="identity"),method ="REML",data = ha)
growth_cr
Family: gaussian
Link function: identity
Formula:
log_size_next ~ log_size + s(spei_history, L, bs = "cb", k = c(4,
24), xt = list(bs = "cr"))
Estimated degrees of freedom:
8.37 total = 10.37
REML score: 675.5565
growth_ps
Family: gaussian
Link function: identity
Formula:
log_size_next ~ log_size + s(spei_history, L, bs = "cb", k = c(4,
24))
Estimated degrees of freedom:
7.63 total = 9.63
REML score: 673.1247
The REML score is slightly higher for the "cr" basis, which I think means a better fit to data (I think this score is what is being maximized by the model fitting algorithm).
The minima and maxima are in the same places, which is very reassuring. The wiggliness is different, which is also indicated by the estimated degrees of freedom (8.37 for the “cs” model and 7.63 for the “ps” model).
Final Decision
I’m going to stick with the cubic regression spline basis (bs = "cr") because it seems to result in a slightly better fit to data than the P-spline smoothers. In addition, Simon Wood says “However, in regular use, splines with derivative based penalties (e.g.”tp” or “cr” bases) tend to result in slightly better MSE performance” (see ?smooth.construct.ps.smooth.spec).