---
title: "Using edge list generating functions and dyad_id"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Using edge list generating functions and dyad_id}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  eval = FALSE,
  echo = TRUE,
  comment = "#>"
)
```

`spatsoc` can be used in social network analysis to generate edge lists from GPS relocation data. 


Edge lists are generated using either the `edge_dist` or the `edge_nn` function. 


**Note**: The grouping functions and their application in social network analysis are further described in the vignette [Using spatsoc in social network analysis - grouping functions](https://docs.ropensci.org/spatsoc/articles/using-in-sna.html). 


## Generate edge lists
spatsoc provides users with one temporal (`group_times`) and two edge list generating functions (`edge_dist`, `edge_nn`) to generate edge lists from GPS relocations. Users can consider edges defined by either the spatial proximity between individuals (with `edge_dist`), by nearest neighbour (with `edge_nn`) or by nearest neighbour with a maximum distance (with `edge_nn`). The edge lists can be used directly by the animal social network package `asnipe` to generate networks. 

### 1. Load packages and prepare data
`spatsoc` expects a `data.table` for all `DT` arguments and date time columns to be formatted `POSIXct`. 

```{r, eval = TRUE}
## Load packages
library(spatsoc)
library(data.table)
```

```{r, echo = FALSE, eval = TRUE}
data.table::setDTthreads(1)
```

```{r, eval = TRUE}
## Read data as a data.table
DT <- fread(system.file("extdata", "DT.csv", package = "spatsoc"))

## Cast datetime column to POSIXct
DT[, datetime := as.POSIXct(datetime)]
```



Next, we will group relocations temporally with `group_times` and generate edges lists with one of `edge_dist`, `edge_dist`. Note: these are mutually exclusive, only select one edge list generating function at a time. 

### 2. a) `edge_dist` 

Distance based edge lists where relocations in each timegroup are considered edges if they are within the spatial distance defined by the user with the `threshold` argument. Depending on species and study system, relevant temporal and spatial distance thresholds are used. In this case, relocations within 5 minutes and 50 meters are considered edges. 

This is the non-chain rule implementation similar to `group_pts`. Edges are defined by the distance threshold and NAs are returned for individuals within each timegroup if they are not within the threshold distance of any other individual (if `fillNA` is TRUE). 

Optionally, `edge_dist` can return the distances between individuals (less than the threshold) in a column named 'distance' with argument `returnDist = TRUE`. 

```{r, eval = TRUE}
# Temporal groups
group_times(DT, datetime = 'datetime', threshold = '5 minutes')

# Edge list generation
edges <- edge_dist(
  DT,
  threshold = 100,
  id = 'ID',
  coords = c('X', 'Y'),
  timegroup = 'timegroup',
  returnDist = TRUE,
  fillNA = TRUE
)
```

### 2. b) `edge_nn`

Nearest neighbour based edge lists where each individual is connected to their nearest neighbour. `edge_nn` can be used to generate edge lists defined either by nearest neighbour or nearest neighbour with a maximum distance. As with grouping functions and `edge_dist`, temporal and spatial threshold depend on  species and study system. 

NAs are returned for nearest neighbour for an individual was alone in a timegroup (and/or splitBy) or if the distance between an individual and its nearest neighbour is greater than the threshold. 

Optionally, `edge_nn` can return the distances between individuals (less than the threshold) in a column named 'distance' with argument `returnDist = TRUE`. 

```{r, eval = FALSE}
# Temporal groups
group_times(DT, datetime = 'datetime', threshold = '5 minutes')

# Edge list generation
edges <- edge_nn(
  DT,
  id = 'ID',
  coords = c('X', 'Y'),
  timegroup = 'timegroup'
)

# Edge list generation using maximum distance threshold
edges <- edge_nn(
  DT, 
  id = 'ID', 
  coords = c('X', 'Y'),
  timegroup = 'timegroup', 
  threshold = 100
)

# Edge list generation using maximum distance threshold, returning distances
edges <- edge_nn(
  DT, 
  id = 'ID', 
  coords = c('X', 'Y'),
  timegroup = 'timegroup', 
  threshold = 100,
  returnDist = TRUE
)

```


## Dyads

### 3. `dyad_id`

The function `dyad_id` can be used to generate a unique, undirected dyad identifier for edge lists. 

```{r, eval = TRUE}
# In this case, using the edges generated in 2. a) edge_dist
dyad_id(edges, id1 = 'ID1', id2 = 'ID2')
```


Once we have generated dyad ids, we can measure consecutive relocations, start and end relocation, etc. **Note:** since the edges are duplicated A-B and B-A, you will need to use the unique timegroup*dyadID or divide counts by 2. 


### 4. Dyad stats

```{r, eval = TRUE}
# Get the unique dyads by timegroup
# NOTE: we are explicitly selecting only where dyadID is not NA
dyads <- unique(edges[!is.na(dyadID)], by = c('timegroup', 'dyadID'))

# NOTE: if we wanted to also include where dyadID is NA, we should do it explicitly
# dyadNN <- unique(DT[!is.na(NN)], by = c('timegroup', 'dyadID'))

# Get where NN was NA
# dyadNA <- DT[is.na(NN)]

# Combine where NN is NA
# dyads <- rbindlist(list(dyadNN, dyadNA))


# Set the order of the rows
setorder(dyads, timegroup)

## Count number of timegroups dyads are observed together
dyads[, nObs := .N, by = .(dyadID)]

## Count consecutive relocations together
# Shift the timegroup within dyadIDs
dyads[, shifttimegrp := shift(timegroup, 1), by =  dyadID]

# Difference between consecutive timegroups for each dyadID
# where difftimegrp == 1, the dyads remained together in consecutive timegroups
dyads[, difftimegrp := timegroup - shifttimegrp]


# Run id of diff timegroups
dyads[, runid := rleid(difftimegrp), by = dyadID]

# N consecutive observations of dyadIDs
dyads[, runCount := fifelse(difftimegrp == 1, .N, NA_integer_), by = .(runid, dyadID)]

## Start and end of consecutive relocations for each dyad
# Dont consider where runs aren't more than one relocation
dyads[runCount > 1, start := fifelse(timegroup == min(timegroup), TRUE, FALSE), by = .(runid, dyadID)]

dyads[runCount > 1, end := fifelse(timegroup == max(timegroup), TRUE, FALSE), by = .(runid, dyadID)]

## Example output
dyads[dyadID == 'B-H', 
      .(timegroup, nObs, shifttimegrp, difftimegrp, runid, runCount, start, end)]
```

<!-- mean xy, todo -->

