---
title: "somspace: Spatial classification with Self-Organizing Maps"
author: "Yannis Markonis"
date: "`r Sys.Date()`"
output: 
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{somspace: Spatial classification with Self-Organizing Maps}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
chunk_output_type: console
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, eval = F)
```

This is is an introductory presentation of the `somspace` package, as implemented in the study of [Markonis and Strnad [2020]](https://doi.org/10.1177/0959683620913924), to detect spatial drought patterns in Europe.

## Data import 

In this study the Old World Drought Atlas [[Cook et al., 2015](https://doi.org/10.1126/sciadv.1500561)] was used, as pre-processed by [Markonis et al. [2018]](https://doi.org/10.1038/s41467-018-04207-7). The file can be downloaded [here](https://www.fzp.czu.cz/en/r-9409-science-research/r-9674-leading-research-groups/r-9669-hydrological-and-climate-variability/r-9713-team-news/webapp-for-hydroclimate-reconstruction.html), and is a `data.table` object saved as an `.rds` file. Alternatively, the `owda` data object that is included in the package can be used, which covers only period 1500-2012.

```{r}
library(somspace)
str(owda)
```

## Data preparation

First of all, we have to transform the raw data from `data.table` to `somin` class. If the original data have different structure/type them it is important to transform them to a 4-column `data.table`, containing time, lat, lon and the variable.

```{r}
inp <- sominp(owda) #depending on data set size may take some time
```

The `somin` class is a list with three elements. The first one, `input_for_som`, is a `matrix` that will be used by `kohonen::som` function to generate the SOM. The second one, `coords`, is a `data.table` with the point ids and their coordinates. The last one, `input_dt` is also a `data.table` with the raw data that were used to generate `input_for_som`.

```{r}
str(inp)
```

## SOM generation

The generation of SOM is straightforward:

```{r}
my_som <- somspa(inp)
```

Which creates a `somsp` object that contains the som, the nodes properties and the original data.

```{r}
str(somsp)
```

Since `somspa()` uses the function `som()` from `kohonen` package it can also take `som()` arguments. For instance, to create a 6 x 6 map after 1000 iterations:

```{r}
my_som <- somspa(inp, grid = somgrid(6, 6), rlen = 1000)
```

## SOM analysis

The `somsp` objects can be easily plotte by `plot()` or summarized by `summary()` functions.

```{r}
plot(my_som)
summary(my_som)
```

Additionaly, the average time series of specific SOMs can be plotted by `plot_ts`:

```{r}
plot_ts(my_som, 12)
plot_ts(my_som, c(1, 12, 21, 39)) 
plot_ts(my_som, 1:max(my_som$summary$node)) #plots all soms
```

## Further classification of SOMs in regions

To further reduce the number of regions with similar characteristics `somregs()` is applied. It uses the `hclust()` function for Hierarchical cluster analysis and its default arguments can be changed accordingly. Please not that the `n` argument refers to the maximum number of regions and not to a single one. `somregs()` will merge SOMs to similar regions starting from 2 and reachinh `n` and will keep them all in the resulting `regs` object.

```{r}
my_regions <- somregs(my_som, 12)
my_regions <- somregs(my_som, 17, method = "ward.D2") 
```

`regs` objects which are two-element lists. The `regions` element contains the information about the regions and the `input_dt` the time series of the original dataset. Similarly to `somsp` objects, `regs` can be plot as maps or time series.

```{r}
plot(my_regions, c(2, 5, 9, 13), 2, 2)
plot_ts(my_regions, 2)
```

## Complex network analysis

Finally, a specific classification of certain number of regions can be analysed through canonical networks and ploted as map with `cnet()`. The threshold, `thres`, refers to the cross-correlation coefficient between the averaged time series of each region; only values above `thres` are considered to be network connections.

```{r}
cnet(my_regions, 12, 0.3)
```