Skip to contents

Load the packages we’ll use here:

library(nestedLogit)    # Nested Dichotomy Logistic Regression Models
library(knitr)          # A General-Purpose Package for Dynamic Report Generation in R
library(dplyr)          # A Grammar of Data Manipulation
library(tidyr)          # Tidy Messy Data
library(ggplot2)        # Create Elegant Data Visualisations Using the Grammar of Graphics
library(geomtextpath)   # Curved Text in 'ggplot2'

The main vignette illustrated the basic plot method, plot.nestedLogit() in the package. However, to explain plotting nested-dichotomies models works, and for better control of the details, it is useful to describe how graphs can be constructed directly directly. We’ll use the same example of women’s labor force participation, using the original dichotomies:

data(Womenlf, package = "carData")
Womenlf$partic <- with(Womenlf,
                       factor(partic, levels = c("not.work", "parttime", "fulltime")))

wlf.nested <- nestedLogit(partic ~ hincome + children,
                          dichotomies = logits(work=dichotomy("not.work", working=c("parttime", "fulltime")),
                                               full=dichotomy("parttime", "fulltime")),
                          data=Womenlf)

Fitted probabilities

To draw a plot, it is sufficient to calculate predicted probabilities over a grid of values of the predictor variables. Here, we select a range of 0 - 45 in steps of 5, combined with the two values of children.

new <- expand.grid(hincome=seq(0, 45, by = 5),
                   children=c("absent", "present"))

pred.nested <- predict(wlf.nested, newdata = new)
plotdata <- cbind(new, pred.nested)
head(plotdata)
#>   hincome children not.work parttime fulltime
#> 1       0   absent   0.2082  0.02372   0.7681
#> 2       5   absent   0.2452  0.03785   0.7169
#> 3      10   absent   0.2864  0.05907   0.6545
#> 4      15   absent   0.3315  0.08936   0.5791
#> 5      20   absent   0.3800  0.12944   0.4906
#> 6      25   absent   0.4309  0.17691   0.3922

Using matplot()

Because the fitted values are in multiple columns, it is easiest to plot these using matplot(). We could plot these all in a single (messy) figure, but it is clearer to show separate panels for children absent and present.

To do this, the plotdata data set is subset within a loop over the values of children and each subset is plotted by matplot. It is only necessary to include the legend in one panel. The plots are combined into a single figure using par(mfrow()).

op <- par(mfrow=c(1,2), mar=c(4,4,3,1)+.1)
cols=c("blue", "magenta", "darkgreen")
for ( kids in c("absent", "present") ) {
  data <- subset(plotdata, children==kids)
  matplot(data[, "hincome"], data[, 5:3], 
          type = "l", lwd=3, lty = 1:3, col = cols,
          xlab="Husband's Income", 
          ylab='Fitted Probability',
          main = paste("Children", kids),
          cex.lab = 1.1)
  if (kids=="absent") {
    legend("topright", lty=1:3, lwd=3, col=cols, bty = "n",
           legend=c("fulltime", "parttime", "not working"))
  }
}
**matplot**: Predicted probabilities of working at all or working part time or full time

matplot: Predicted probabilities of working at all or working part time or full time

par(op)

Using ggplot()

More control, and perhaps a more aesthetically pleasing figure can be produced using ggplot (Wickham et al., 2023). However, ggplot wants the data in long format. That makes it easy to plot probability against one predictor and use color to distinguish the levels of partic and facet the plot by children. (The result of tidyr::pivot_longer() doesn’t recognize an ordered nature of "Working", so this is done in a separate step.)

plotlong <- plotdata |>
  tidyr::pivot_longer(fulltime : not.work,
                      names_to = "Working",
                      values_to = "Probability") |>
  mutate(Working = ordered(Working, 
                           levels = c("not.work", "parttime", "fulltime")) )

head(plotlong)
#> # A tibble: 6 × 4
#>   hincome children Working  Probability
#>     <dbl> <fct>    <ord>          <dbl>
#> 1       0 absent   fulltime      0.768 
#> 2       0 absent   parttime      0.0237
#> 3       0 absent   not.work      0.208 
#> 4       5 absent   fulltime      0.717 
#> 5       5 absent   parttime      0.0378
#> 6       5 absent   not.work      0.245

Then, one call to ggplot produces both panels. To sort the levels of Working appropriately, we made this an ordered factor in the step above, but assign the colors with a discrete scale.

gg <- ggplot(plotlong,
             aes(x=hincome, y=Probability, color=Working)) +
  geom_line(linewidth = 2) +
  scale_color_discrete() +
  labs(x="Husband's Income", y= "Probability") +
  facet_wrap(~ children, labeller = label_both) +
  theme_bw(base_size = 14) +
  theme(legend.position = c(.3, .8))
gg
**ggplot**: Predicted probabilities of working at all or working part time or full time

ggplot: Predicted probabilities of working at all or working part time or full time

Direct labels

It’s usually nicer to label the curves directly than to rely on a legend. The geomtextpath (R-geomtextpath?) package offers geom_textline() as an alternative to geom_line() that adds a text label to a curve. hjust and vjust position the labels along the curve.

ggplot(plotlong,
       aes(x=hincome, y=Probability, color=Working)) +
  geom_textline(aes(label = Working),
                linewidth = 2, size = 5, 
                hjust = 0.9, vjust = 0.2) +
  scale_color_discrete() +
  labs(x = "Husband's Income", y = "Probability") +
  facet_wrap(~ children, labeller = label_both) +
  theme_bw(base_size = 14) +
  theme(legend.position = "none")
**geomtextpath**: Predicted probabilities, with labels on the curves

geomtextpath: Predicted probabilities, with labels on the curves

Plotting fitted log odds

It is sometimes easier to interpret logistic regression models by plotting the linear predictors on the log odds scale, because these appear as straight lines. That is, from the estimated coefficients in the model, the predicted log odds of work and full are given by

\[ L_1 =\log\left[ \frac { \widehat{P( \operatorname{work} = \operatorname{1} )} }{ 1 - \widehat{P( \operatorname{work} = \operatorname{1} )} } \right] = 1.34 - 0.04(\operatorname{hincome}) - 1.58(\operatorname{children}_{\operatorname{present}}) \]

\[ L_2 =\log\left[ \frac { \widehat{P( \operatorname{full} = \operatorname{1})} }{ 1 - \widehat{P( \operatorname{full} = \operatorname{1})} } \right] = 3.48 - 0.11(\operatorname{hincome}) - 2.65(\operatorname{children}_{\operatorname{present}}) \] Here, \(\operatorname{children}_{\operatorname{present}}\) is either 0 or 1, so this amounts to a shift in the intercept when children are present.

These values can be calculated by calling predict.glm() directly for the models component of wlf.nested, specifying type = "link":

pred.logits <- sapply(models(wlf.nested), predict, newdata=new, type = "link")
plotdatal <- cbind(new, pred.logits)
head(plotdatal)
#>   hincome children   work   full
#> 1       0   absent 1.3358 3.4778
#> 2       5   absent 1.1243 2.9414
#> 3      10   absent 0.9127 2.4051
#> 4      15   absent 0.7012 1.8688
#> 5      20   absent 0.4897 1.3324
#> 6      25   absent 0.2781 0.7961

Then we plot these more or less as before:

cols <- c("blue", "red")

op <- par(mfrow=c(1,2), mar=c(4,4,3,1)+.1)
for ( kids in c("absent", "present") ) {
  data <- subset(plotdatal, children==kids)
  matplot(data[, "hincome"], data[, 3:4],
          type = "l", lwd=3, lty = 1, col = cols,
          xlab="Husband's Income",
          ylab='Predicted Log Odds',
          main = paste("Children", kids),
          cex.lab = 1.1)
  if (kids=="absent") {
    legend("topright", lty=1, lwd=3, col=cols, bty = "n",
           title = "Dichotomy",
           legend=c("work", "full"))
  }
}
**log odds**: Predicted logits of the  `work` and `full` dichotomies, by `children`

log odds: Predicted logits of the work and full dichotomies, by children

par(op)

Finally, we can use ggplot as before to plot the fitted logits, first transforming the long format.

plotlongl <- plotdatal |>
  tidyr::pivot_longer(work : full,
                      names_to = "Dichotomy",
                      values_to = "logit") |>
  mutate(Dichotomy = ordered(Dichotomy,
                         levels = c("work", "full")) )

Doing this gives the flexibility that we can facet the plot either by children (as before) or Dichotomy. The latter nicely shows that the additive model has equal slopes for husband’s income within both panels: increasing husband’s income decreases the log odds of working, but at a faster rate for the full dichotomy. Having young children decreases the log odds of working either at all or working fulltime as compared to parttime.

ggplot(plotlongl,
       aes(x=hincome, y=logit, color=children)) +
  geom_line(linewidth = 3) +
  geom_point(size = 1.5, shape = 16, color = "black") +
  scale_color_discrete() +
  labs(x="Husband's Income", y= "Log Odds") +
  facet_wrap(~ Dichotomy, labeller = label_both) +
  theme_bw(base_size = 14) +
  theme(legend.position = c(.35, .82))
**log odds**: Predicted logits of the  `work` and `full` dichotomies, by `Dichotomy`

log odds: Predicted logits of the work and full dichotomies, by Dichotomy

References

Wickham, H., Chang, W., Henry, L., Pedersen, T. L., Takahashi, K., Wilke, C., … Dunnington, D. (2023). ggplot2: Create elegant data visualisations using the grammar of graphics. Retrieved from https://CRAN.R-project.org/package=ggplot2