---
title: "Getting Started"
author: Justin Singh-Mohudpur
output:
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{Getting Started}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(hilbert)
```

The core utilities provided by `hilbert` involve converting coordinates to grid positions,
then grid positions to indices on a Hilbert Curve. This conversion is done *without* respect
to coordinate reference systems (CRS), **but**, the CRS of the original coordinates coincides
with the decoded coordinates.

## Example Dataset

Let's traverse through a simple example of using `hilbert`. We'll use a dataset of 1000 seismic events (with some filtered off)
that occurred near Fiji since 1964 (see: [`quakes`](https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/quakes.html))

```{r}
data("quakes")
quakes <- quakes[quakes$long < 179.5, ]
```

```{r, echo = FALSE}
knitr::kable(
  head(quakes),
  format = "html",
  align = "c",
  table.attr = "style='width:100%;'",
  caption = "*Sample of Seismic Events (w/ MB > 4.0) Near Fiji Since 1964*"
)
```

## Encoding Process

### Extent

When working with coordinate data, we first need to consider the bounds of our grid.
Some methods utilize the maximum possible grid (i.e. [Geohash](https://en.wikipedia.org/wiki/Geohash)).

In this case, we will look at a localized approach, using the **extent** of our dataset as the grid bounds.

:::{style="display:flex;align-items:center;"}
::::::{style="margin-right:1rem;"}
```{r, echo = FALSE, out.width="100%", out.extra='style="border:none;pointer-events:none;user-select:none;"'}
knitr::include_graphics("1.png")
```
::::::
::::::{}
When exploring this dataset, we see that it is particularly suitable for indexing
on a space-filling curve, due to the *closeness* of points. To index these points,
we'll perform the following operations:

1. Get the **extent** of the dataset.
2. Find the **position** of each point.
3. Convert the position to the curve **index**.

Getting the **extent** of this dataset is easy, it is simply the
min/max of both the X and Y coordinates.

```{r}
extent <- c(xmin = min(quakes$long), ymin = min(quakes$lat),
            xmax = max(quakes$long), ymax = max(quakes$lat))
```

```{r, echo = FALSE}
knitr::kable(
  t(extent),
  format = "html",
  col.names = c("X-Min", "Y-Min", "X-Max", "Y-Max"),
  align = "c",
  table.attr = "style='width:100%;'",
  caption = "*Quakes Extent*"
)
```
::::::
:::

> **Note:** *closeness* is often subjective. Generally, the real factor for determining how
> suitable coordinates are for indexing to a space-filling curve is the precision requirements.
> In this case, the precision of seismic events need not be highly-precise. However, in the case
> of coordinates for buildings, then high-precision is usually a requirement.

### Position

Now that we have the extent of these coordinates, let's get the **position**.

```{r}
positions <- hilbert::coords_to_position(
    x      = quakes,           # Using the `data.frame` method
    coords = c("long", "lat"), # Either the column names or indices for the coordinates
    n      = 4L,               # Dimensions exponent, i.e. 2^n x 2^n grid
    extent = extent,           # The previously found extent
    attach = FALSE             # When `TRUE`, attaches to the original `data.frame`
)
```

```{r, echo = FALSE}
tmp <- t(head(positions, 21))
tmp[, 21] <- c("...", "...")
rownames(tmp) <- c("**X-Position**", "**Y-Position**")
knitr::kable(tmp, col.names = NULL, row.names = TRUE)
```

Notice here, we choose $n = 4$. In the context of our grid, $n$ is the **dimensions exponent**.
It determines the dimensions of our underlying grid, such that when $n = 4$, we have a
$$2^n \times 2^n = 2^4 \times 2^4 = 16 \times 16 \text{ grid.}$$
A $2^n \times 2^n$ grid is required due to the construction of a Hilbert Curve,
see: [Space-filling Curves](https://en.wikipedia.org/wiki/Space-filling_curve).

:::{style="display:flex;align-items:center;"}
::::::{}
Now that we have our positions, we can visually see the cells that each point falls into.
Since the grid in this case is *coarse*, many points will be indexed to the same cell,
for example, the upper left-most cell.

This can be either unwanted or beneficial. If you're using a space-filling curve for
a **spatial index**, this could be beneficial, since points that are closer together
will be indexed together, and hence, spatial operations can be improved to only work on
points that fall under the same index. For more information,
see: [Spatial Index](https://en.wikipedia.org/wiki/Spatial_database#Spatial_index).

If we want to make our grid *finer*, we can increase the $n$ value used. For example,
setting $n = 6$ will give us a $64 \times 64$ grid, which will segment the points further.
::::::
::::::{style="margin-left:1rem;"}
```{r, echo = FALSE, out.width="100%", out.extra='style="border:none;pointer-events:none;user-select:none;"'}
knitr::include_graphics("2.png")
```
::::::
:::

### Index

At this point, we can now index our positions. To do this, we simply take our `positions`
data frame, and call `hilbert::index`.

The same $n$ **must** be used in both `hilbert::coords_to_position`
and `hilbert::index`, otherwise, the indexing calculations will be incorrect.

```{r}
indices <- hilbert::index(positions, coords = c(1, 2), n = 4L)
```

```{r, echo = FALSE}
tmp <- t(head(indices, 21))
tmp[, 21] <- c("...", "...", "...")
rownames(tmp) <- c("**X-Position**", "**Y-Position**", "**Index**")
knitr::kable(tmp, col.names = NULL, row.names = TRUE)
```

:::{style="display:flex;align-items:center;"}
::::::{style="width:66%;margin-right:1rem;"}
```{r, echo = FALSE, out.width="100%", out.extra='style="border:none;pointer-events:none;user-select:none;"'}
knitr::include_graphics("3.png")
```
::::::
::::::{}
This index corresponds to the cell along the (purple) curve starting from the lower left cell, to the lower right cell.

We see here that space-filling curves provide a ${X, Y} \rightarrow I$ mapping.

Another attribute of these indices, is that they are always non-negative, whole numbers.
::::::
:::

## Summary

In general, users of `hilbert` will be concerned with indexing coordinates,
but we will lay out the general processes here:


#### Encoding
```r
COORDINATES
⟶ hilbert::coords_to_position()
  ⟶ POSITIONS
    ⟶ hilbert::index()
      ⟶ INDICES
```

#### Decoding
```r
INDICES
⟶ hilbert::position()
  ⟶ POSITIONS
    ⟶ hilbert::position_to_coords()
      ⟶ COORDINATES
```