Fitting a DLNM is simple without the {dlnm} package

Wait, then what does {dlnm} do?

Categories: DLNMs, GAMs, r

Published: February 8, 2021

Note

This is part of a series about distributed lag non-linear models. Please read the first post for an introduction and a disclaimer.

Since I’ve been working so much with GAMs for this project, I decided to read sections of Simon Wood’s book, Generalized additive models: an introduction with R, more thoroughly. In Chapter 5: Smoothers, there is an example of a tensor product smooth (more on what that means later) being used to fit a distributed lag model. When I saw this, I started to question everything. What was the dlnm package even doing if I could fit a DLNM with just mgcv? Was it doing something wrong? Was I interpreting it wrong? Am I going to have to change EVERYTHING?

I also found this wonderful paper by Nothdurft and Vospernik that fits a DLNM for yearly tree-ring data explained by lagged, non-linear effects of monthly weather data. They, too, used only the mgcv package and tensor product smooths to fit this model. So what is a tensor product, and what is the dlnm package doing differently?

Tensor Products

A tensor product smooth is a two (or more) dimensional smooth in which the shape of the smooth in one dimension varies smoothly along the other dimension(s). Tensor products are constructed from two (or more) so-called marginal smooths. Imagine a basket weave of wood strips. The strips can be different widths and be more or less flexible in each dimension. The flexibility of the strips roughly corresponds to the smoothing penalty (stiffer = smoother), and the number and width of the strips roughly correspond to the number of knots. You can bend the first strip in one dimension, but you can’t then bend the adjacent strip in the completely opposite direction. The shape of the strips in one dimension is forced to vary smoothly across the other dimension.

[Image: basket weave pattern]

The tensor product for a DLNM has the environmental predictor on one dimension (SPEI, in our example) and lag time on the other dimension. So SPEI can have a non-linear effect on plant growth, but the shape of that relationship at lag = 0 is constrained to be similar to the shape at lag = 1 (the adjacent strip of wood). In other words, the shape of the SPEI effect changes smoothly with lag time. Here’s a pure mgcv implementation of a DLNM.

library(mgcv)
Loading required package: nlme

Attaching package: 'nlme'
The following object is masked from 'package:dplyr':

    collapse
This is mgcv 1.8-42. For overview type 'help("mgcv-package")'.
growth_te <-
  gam(log_size_next ~ 
        s(log_size) +            # smooth effect of log_size
        te(spei_history, L,      # 2D tensor product: SPEI history x lag time
          bs = "cr",             # cubic regression spline marginal bases
          k = c(5, 15)),         # basis dimension for each marginal
      family = gaussian(link = "identity"),
      method = "REML",
      select = TRUE,
      data = ha)
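One detail that’s easy to miss is that spei_history and L are matrix covariates, not vectors. mgcv’s “summation convention” (see ?linear.functional.terms) means that when s() or te() gets matrix covariates, the smooth is evaluated at every cell and summed across the columns of each row, which is exactly the distributed-lag sum. Here’s a rough sketch of how those matrices might be built; spei_wide, the column names, and the 36-month lag window are all made up for illustration, since the real data prep isn’t shown in this post.

# Hypothetical construction of the lag matrices (not the actual data prep).
n_lag <- 36  # assumed number of monthly lags

# spei_wide: hypothetical data frame with one row per observation and
# columns spei_lag0, spei_lag1, ..., spei_lag35
spei_history <- as.matrix(spei_wide[, paste0("spei_lag", 0:(n_lag - 1))])

# L repeats the lag index that matches each column of spei_history
L <- matrix(0:(n_lag - 1), nrow = nrow(spei_history), ncol = n_lag,
            byrow = TRUE)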

So what the heck is {dlnm} doing that’s different?

After re-reading Gasparrini’s papers for the billionth time and reading more of Simon Wood’s book, I realized the difference between the pure mgcv approach and the dlnm approach had to do with “identifiability constraints”. Basically, because a GAM is built from smooth functions of covariates, each smooth term is usually constrained to sum to zero over the observed data so that it is identifiable alongside the model intercept. For example, the smooth term for s(log_size) looks essentially centered around 0 on the y-axis when plotted.

library(gratia)
draw(growth_te, select = 1)

Tensor product smooths in mgcv have this constraint as well, but not for each marginal function. The entire surface sums to zero.

draw(growth_te, select = 2, dist = Inf)
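A quick numeric check of this (my addition, not part of the original post): predict(..., type = "terms") evaluates each smooth term at the observed data, and the column sums should come out at roughly zero for both smooths.

# Each column is one smooth term evaluated at the observed data; with
# mgcv's default identifiability constraints each column should sum to
# ~0 (up to numerical error).
colSums(predict(growth_te, type = "terms"))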

The dlnm package constructs a “crossbasis” function, but it does this using tensor products from mgcv under the hood. So what is the difference? The major difference is in how the identifiability constraints are applied. For te(...), the entire surface must sum to zero. For s(..., bs = "cb"), the predictor-response dimension is constrained to sum to zero, which means every slice of the surface at any value of lag must sum to zero. It also removes the intercept from that dimension, so the resulting smooth ends up having fewer knots and fewer maximum degrees of freedom.

Here’s the dlnm version:

library(dlnm)
This is dlnm 2.4.7. For details: help(dlnm) and vignette('dlnmOverview').
growth_cb <-
  gam(log_size_next ~ 
        s(log_size) +
        s(spei_history, L,         # crossbasis function: SPEI history x lag time
          bs = "cb",               # crossbasis smooth constructor from {dlnm}
          k = c(5, 15), 
          xt = list(bs = "cr")),   # cubic regression splines for the marginals
      family = gaussian(link = "identity"),
      method = "REML",
      select = TRUE,
      data = ha)
draw(growth_cb, select = 2, dist = Inf)

Notice how it looks much more symmetrical left to right. This is more clear if we plot all the slices through lag time on top of each other, kind of like holding the surface up with the x-axis at eye level and sighting down the y-axis.

eval_cb <- evaluate_smooth(growth_cb, "spei_history", dist = Inf)
Warning: `evaluate_smooth()` was deprecated in gratia 0.7.0.
ℹ Please use `smooth_estimates()` instead.
eval_te <- evaluate_smooth(growth_te, "spei_history", dist = Inf)
eval_cb %>% 
  #just take beginning and end
  # filter(L == min(L) | L == max(L)) %>% 
  mutate(L = as.factor(L)) %>% 
  ggplot(aes(x = spei_history, y = est, group = L)) + 
  geom_line(alpha = 0.1) +
  labs(title = "crossbasis from {dlnm}") +
  coord_cartesian(xlim = c(-0.5, 0.5), ylim = c(-0.02, 0.02))

eval_te %>% 
  #just take beginning and end
  # filter(L == min(L) | L == max(L)) %>% 
  mutate(L = as.factor(L)) %>% 
  ggplot(aes(x = spei_history, y = est, group = L)) +
  geom_line(alpha = 0.1) +
  labs(title = "tensor product from {mgcv}") +
  coord_cartesian(xlim = c(-0.5,0.5), ylim = c(-0.02, 0.02))

The crossbasis function intercepts are more tightly aligned, I think because of the sum-to-zero constraint along the spei_history axis.

Oddly, the slices in the dlnm version still don’t all sum to zero, so maybe I’m still not totally explaining this right.
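Here’s a quick check of that (my addition): sum the estimated crossbasis values within each lag slice of the evaluation grid. Keep in mind the grid is evenly spaced rather than being the observed data, so this isn’t exactly the constraint applied during fitting, but it gives a feel for how far from zero each slice is.

# Sum the estimated smooth within each slice of lag time; values near zero
# would be consistent with a per-slice sum-to-zero constraint on this grid.
eval_cb %>% 
  group_by(L) %>% 
  summarise(slice_sum = sum(est))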

What does it all mean?

Ultimately, there is little difference between these approaches for these data. If we use the cb_margeff() function I mentioned earlier in this series to get fitted values of y (for the whole GAM, not just the smooth), then the two models look nearly identical.

pred_cb <- cb_margeff(growth_cb, spei_history, L)
pred_te <- cb_margeff(growth_te, spei_history, L)
ggplot(pred_cb, aes(x = x, y = lag, fill = fitted)) + 
  geom_raster() +
  geom_contour(aes(z = fitted), binwidth = 0.01, color = "black", alpha = 0.3) +
  scale_fill_viridis_c(option = "plasma") +
  labs(title = "{dlnm} crossbasis")

ggplot(pred_te, aes(x = x, y = lag, fill = fitted)) + 
  geom_raster() +
  geom_contour(aes(z = fitted), binwidth = 0.01, color = "black", alpha = 0.3) +
  scale_fill_viridis_c(option = "plasma") +
  labs(title = "{mgcv} tensor product")
Warning: The following aesthetics were dropped during statistical transformation: fill
ℹ This can happen when ggplot fails to infer the correct grouping structure in
  the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
  variable into a factor?
The following aesthetics were dropped during statistical transformation: fill
ℹ This can happen when ggplot fails to infer the correct grouping structure in
  the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
  variable into a factor?

And the summary statistics are very similar as well.

anova(growth_cb)

Family: gaussian 
Link function: identity 

Formula:
log_size_next ~ s(log_size) + s(spei_history, L, bs = "cb", k = c(5, 
    15), xt = list(bs = "cr"))

Approximate significance of smooth terms:
                     edf Ref.df       F  p-value
s(log_size)        1.363  9.000 261.921  < 2e-16
s(spei_history,L)  4.570 23.000   1.077 4.01e-05
anova(growth_te)

Family: gaussian 
Link function: identity 

Formula:
log_size_next ~ s(log_size) + te(spei_history, L, bs = "cr", 
    k = c(5, 15))

Approximate significance of smooth terms:
                      edf Ref.df       F  p-value
s(log_size)         1.364  9.000 261.905  < 2e-16
te(spei_history,L)  4.618 26.000   0.955 4.02e-05

The major difference, you’ll notice, is that the reference degrees of freedom are larger for the tensor product version. Again, that is because the crossbasis function constrains the SPEI dimension to sum to zero and removes the intercept along that dimension.
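If you want to see that in the fitted objects themselves, here’s one way (a sketch I’ve added; first.para and last.para index a smooth’s coefficients within the model’s full coefficient vector):

# Count the coefficients allocated to the two-dimensional smooth in each model
n_coefs <- function(model, i) {
  sm <- model$smooth[[i]]
  sm$last.para - sm$first.para + 1
}
n_coefs(growth_te, 2)  # te() tensor product
n_coefs(growth_cb, 2)  # bs = "cb" crossbasis; expected to be smaller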

Why {dlnm}?

So the advantages of the dlnm package for fitting DLNMs are probably most evident when you consider cases besides GAMs. For example, if I wanted to constrain the lag dimension to be a switchpoint function, or if I wanted the SPEI dimension to be strictly a quadratic function, I could do that with dlnm. If you’re interested in interpreting your results in terms of relative risk ratios, then dlnm offers some OK visualization options for that. When using smooth functions for both marginal bases, the differences between a straight tensor product with te() and a crossbasis function start to fade away. The dlnm version is still a little easier to interpret, I think, because you can more easily compare slices through lag time with the plot of the smooth itself.
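For the curious, here’s a rough sketch (my own example, not code from this analysis; the lag range, breaks, and polynomial degree are invented) of what constraining the marginal bases looks like with dlnm’s crossbasis(): a strictly quadratic SPEI dimension and a piecewise-constant (“strata”) lag dimension. The resulting crossbasis matrix then goes into an ordinary model formula.

# Hypothetical example of constraining the marginal bases with crossbasis().
# ha$spei_history is the same exposure-history matrix used above; everything
# else here is made up for illustration.
cb_quad <- crossbasis(
  ha$spei_history,
  lag = c(0, 35),                                   # assuming 36 monthly lags
  argvar = list(fun = "poly", degree = 2),          # strictly quadratic in SPEI
  arglag = list(fun = "strata", breaks = c(6, 18))  # step changes at lags 6 and 18
)
# e.g., growth_quad <- glm(log_size_next ~ log_size + cb_quad, data = ha)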