---
title: "Introduction to ggpointless"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to ggpointless}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  dev = "ragg_png" # <3
)
library(ggpointless)
```

`ggpointless` is a small extension of the [`ggplot2`](https://ggplot2.tidyverse.org/) package that provides additional layers:

1. `geom_pointless()` & `stat_pointless()`
2. `geom_lexis()` & `stat_lexis()`
3. `geom_chaikin()` & `stat_chaikin()`
4. `geom_catenary()` & `stat_catenary()`


## geom_pointless 
`geom_pointless()` is a layer to easily add minimal emphasis to your plots. The function takes it's power from `stat_pointless()`, which does all the work, but is not usually in the spotlight.

```{r default_pointless}
library(ggplot2)
library(ggpointless)

x <- seq(-pi, pi, length.out = 100)
y <- outer(x, 1:5, function(x, y) sin(x * y))

df1 <- data.frame(
  var1 = x,
  var2 = rowSums(y)
)

p <- ggplot(df1, aes(x = var1, y = var2))
p + geom_pointless(location = c("first", "last", "minimum", "maximum"))
```

As you see, just adding `geom_pointless()` to `ggplot(...)` is not terribly useful on its own but when it teams up with `geom_line()` and friends, hopefully.


```{r location}
p <- p + geom_line()
p + geom_pointless(location = "all", size = 3)
```


`geom_pointless()` behaves like `geom_point()` does with the addition of a `location` argument. You can set it to `"first"`, `"last"` (the default), `"minimum"`, `"maximum"`, and `"all"`; where `"all"` is just shorthand to select `"first"`, `"last"`, `"minimum"` and `"maximum"`.  

In addition, you can use the computed variable `location` and map it to an aesthetic, e.g. `color`.  

```{r color}
p + geom_pointless(aes(color = after_stat(location)),
  location = "all",
  size = 3
) +
  theme(legend.position = "bottom")
```

### Order and orientation

The locations are determined in the order in which they appear in the data, like `geom_path()` does compared to `geom_line()`. This can be seen in the next example, with sample data kindly taken from the [`geomtextpath`](https://github.com/AllanCameron/geomtextpath/#using-geomtextpath) package:

```{r spiral, fig.show='hold'}
x <- seq(5, -1, length.out = 1000) * pi
spiral <- data.frame(
  var1 = sin(x) * 1:1000,
  var2 = cos(x) * 1:1000
)

p <- ggplot(spiral) +
  geom_path() +
  coord_equal(xlim = c(-1000, 1000), ylim = c(-1000, 1000)) +
  theme(legend.position = "none")

p + aes(x = var1, y = var2) +
  geom_pointless(aes(color = after_stat(location)),
    location = "all",
    size = 3
  ) +
  labs(subtitle = "orientation = 'x'")

p + aes(y = var1, x = var2) +
  geom_pointless(aes(color = after_stat(location)),
    location = "all",
    size = 3
  ) +
  labs(subtitle = "orientation = 'y'")
```

As you see from the first of the last two examples `"first"` and `"minimum"` overlap, and `"first"` wins over `"minimum"`. If `location` is set to `"all"`, then the order in which points are plotted from top to bottom is: `"first"` > `"last"` > `"minimum"` > `"maximum"`. 

Otherwise, the order is determined as specified in the `location` argument, which also applies to the order of the legend key labels.

```{r overplotting}
cols <- c(
  "first" = "#f8766d",
  "last" = "#7cae00",
  "minimum" = "#00bfc4",
  "maximum" = "#c77cff"
)

df2 <- data.frame(
  var1 = 1:2,
  var2 = 1:2
)

p <- ggplot(df2, aes(x = var1, y = var2)) +
  geom_path() +
  coord_equal() +
  scale_color_manual(values = cols)

# same as location = 'all'
p + geom_pointless(aes(color = after_stat(location)),
  location = c("first", "last", "minimum", "maximum"),
  size = 3
) +
  labs(subtitle = "same as location = 'all'")
```

```{r}
# reversed order
p + geom_pointless(aes(color = after_stat(location)),
  location = c("maximum", "minimum", "last", "first"),
  size = 3
) +
  labs(subtitle = "custom order")
```

```{r}
# same as location = 'all' again
p + geom_pointless(aes(color = after_stat(location)),
  location = c("maximum", "minimum", "last", "first", "all"),
  size = 3
) +
  labs(subtitle = "same as location = 'all' again")
```


### Pick a different geom

Just like all `stat_*` functions, `stat_pointless()` has a default geom, which is `"point"`. This means in reverse that you can highlight e.g. minimum and maximum in another way, for example with a horizontal line.

```{r stat}
set.seed(42)
ggplot(data.frame(x = 1:10, y = sample(1:10)), aes(x, y)) +
  geom_line() +
  stat_pointless(
    aes(yintercept = y, color = after_stat(location)),
    location = c("minimum", "maximum"),
    geom = "hline"
  ) +
  guides(color = guide_legend(reverse = TRUE))
```

## Plotting lifelines with geom_lexis

`geom_lexis()` draws a lifeline for an event from it's start to it's end. The required aesthetics are `x` and `xend`. Here is an example:

```{r default_lexis}
df1 <- data.frame(
  key = c("A", "B", "B", "C", "D"),
  x = c(0, 1, 6, 5, 6),
  y = c(5, 4, 10, 8, 10)
)

p <- ggplot(df1, aes(x = x, xend = y, color = key)) +
  coord_equal()
p + geom_lexis()
```

Also, if there is a gap in an event a horizontal line is drawn, which can be hidden setting `gap_filler = FALSE`.

```{r gap_filler, fig.show='hold'}
p + geom_lexis(gap_filler = FALSE)
```

You can further style the appearance of your plot using the additional arguments. If you e.g. want to make a visual distinction between the ascending lines and the connecting lines, use `after_stat()` to map the `type` variable to the linetype aesthetic (or any other aesthetic). The variable `type` is created by `geom_lexis()` and takes two values: "solid" and "dotted"; so you might also want to call `scale_linettype_identity`.


```{r after_stat}
p + geom_lexis(
  aes(linetype = after_stat(type)),
  point_show = FALSE
) +
  scale_linetype_identity()
```

### Date and POSIXct classes
You see the coordinates on the vertical y-axis show the difference between `x` and `xend` aesthetics. The "magic" of `geom_lexis()` happens in `stat_lexis()` when the input data is transformed and the calculations are performed. 

```{r numeric}

df1 <- data.frame(
  start = c(2019, 2021),
  end = c(2022, 2022),
  key = c("A", "B")
)

ggplot(df1, aes(x = start, xend = end, group = key)) +
  geom_lexis() +
  coord_fixed()
```


Keeping in mind that dates are internally represented as the number of days, and the POSIXct class in turn represents seconds since some origin, the y-scale values in the next plots should come as no surprise.


```{r dates-and-times, fig.show='hold'}
# Date
fun <- function(i, class) as.Date(paste0(i, "-01-01"))
df1[, c("start", "end")] <- lapply(df1[, c("start", "end")], fun)
p1 <- ggplot(df1, aes(x = start, xend = end, group = key)) +
  geom_lexis() +
  labs(y = "days") +
  coord_fixed()

# POSIXct
df2 <- df1
df2[, c("start", "end")] <- lapply(df2[, c("start", "end")], as.POSIXct)
p2 <- ggplot(df2, aes(x = start, xend = end, group = key)) +
  geom_lexis() +
  labs(y = "seconds") +
  coord_fixed()

p1
p2
```

In order to change the breaks and labels of the vertical scale to, say, years, we make the assumption that 1 year has 365.25 days, or 365.25 * 86400 seconds.

```{r transformation, fig.show='hold'}
# years, roughly
p1 +
  scale_y_continuous(
    breaks = 0:3 * 365.25, # or for p2: 0:3*365.25*86400
    labels = function(i) floor(i / 365.25) # floor(i / 365.25*86400)
  ) +
  labs(y = "years")
```


## geom_chaikin
The algorithm of the `geom_chaikin()` function iteratively cuts off the ragged corners, as you can see in the example below.

```{r geom-chaikin-intro, fig.show='hold'}
set.seed(42)
dat <- data.frame(
  x = seq.int(10),
  y = sample(15:30, 10)
)

p1 <- ggplot(dat, aes(x, y)) +
  geom_line(linetype = "12")

p1 +
  geom_chaikin()
```

## geom_catenary
A catenary is the curve of an hanging chain or cable under its own weight when supported only at its ends (see [Wikipedia](https://en.wikipedia.org/wiki/Catenary) for more information). 

```{r geom-catenary-intro, fig.show='hold'}
ggplot(data.frame(x = c(0, 1), y = c(1, 1)), aes(x, y)) +
  geom_catenary() +
  ylim(0, 1)
```

`geom_catenary()` calculates a default value for the length of the chain that is set
to 2 times the Euclidean distance of all x/y coordinates. Use the `chainLength
argument to overwrite the default.

```{r geom-catenary-chainLength, fig.show='hold'}
ggplot(data.frame(x = c(0, 1), y = c(1, 1)), aes(x, y)) +
  geom_catenary(chainLength = 1.5)  +
  ylim(0, 1)
```

Multiple x/y coordinates are supported too.

```{r geom-catenary-chainLength-two-obs, fig.show='hold'}
ggplot(data.frame(x = c(0, 1, 4), y = c(1, 1, 1)), aes(x, y)) +
  geom_catenary()
```

If you set `chainLength` to a value that is too short to connect two points, a straight line is drawn and a message is shown.

```{r geom-catenary-chainLength-minimum, fig.show='hold'}
ggplot(data.frame(x = c(0, 1, 4), y = c(1, 1, 1)), aes(x, y)) +
  geom_catenary(chainLength = 4)
```

If you want to draw a chain where each piece has a different `chainLength` 
value, remember you can add a list to a ggplot object:

```{r geom-catenary-chainLength-individual, fig.show='hold'}
ggplot(data.frame(x = c(0, 1), y = 1),
       aes(x, y)) + 
  lapply(2:10, function(chainLength) {
    geom_catenary(chainLength = chainLength)
  }
)
```
 
 
## Data

The following data sets are shipped with the `ggpointless` package:

1. `co2_ml` : [CO~2~ records taken at Mauna Loa](https://gml.noaa.gov/ccgg/trends/data.html)
2. `covid_vac` : [COVID-19 Cases and Deaths by Vaccination Status](https://covid.cdc.gov/covid-data-tracker/#rates-by-vaccine-status)
3. `female_leaders` : [Elected and appointed female heads of state and government](https://en.wikipedia.org/w/index.php?title=List_of_elected_and_appointed_female_heads_of_state_and_government&oldid=1078024588)

See the [`vignette("examples")`](https://flrd.github.io/ggpointless/articles/examples.html) for possible use cases.
