---
title: "Combination features"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Combination features}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

*Combination features* are those which have `transformation_type` equal to `combine_linear`, `combine_min`, and `combine_max`.
These features use the values of two or more subfeatures to create a new feature.
Because the form of the JSON required for combination features differs from those of all other features, they are given special attention in this vignette.

```{r setup}
library(eider)
```

## JSON structure

Of all the top-level JSON keys specified in the [feature overview](features.html), only `output_feature_name` and `transformation_type` are still required for combination features.
As before, `output_feature_name` is the name of the feature that will be created.
The value of `transformation_type` can be:

* `combine_linear`: calculate a linear combination of the features
* `combine_min`: calculate the minimum of the features
* `combine_max`: calculate the maximum of the features

On top of these, combination features require a `subfeature` key, which is itself a JSON object.
Its keys can be any string (though it helps to be descriptive), and its values are the JSON objects which define those subfeatures, _sans_ `output_feature_name` (because those are not required).
Note that each subfeature may have a different `source_table` key, which allows the subfeatures to come from different input tables.

For linear combinations, each subfeature must further contain a `weight` key, which is a number that determines the coefficients of each feature in the linear combination.

## Minimum and maximum combination

As before, let's make up some data to illustrate this.

```{r}
input_table <- data.frame(
  id = c(1, 1, 1, 1, 2, 2, 2, 2),
  diagnosis = c("A", "A", "A", "A", "A", "A", "B", "B")
)

input_table
```

Suppose we want to find the number of times a patient has been diagnosed with "A" or the number of times they have been diagnosed with "B", whichever is *greater*.

The number of "A" diagnoses would ordinarily be specified using the following JSON:

```json
{
    "output_feature_name": "num_A",
    "source_table": "input_table",
    "transformation_type": "count",
    "absent_default_value": 0,
    "grouping_column": "id",
    "filter": {
        "column": "diagnosis",
        "type": "in",
        "value": ["A"]
    }
}
```

and the number of "B" diagnoses would be exactly identical to this, except with "A" replaced with "B".

The combination feature we seek can thus be specified as in [`json_examples/combination1.json`](https://github.com/alan-turing-institute/eider/blob/main/vignettes/json_examples/combination1.json).
Each subfeature is exactly the same as above, except that the `output_feature_name` key is omitted:

```{r comment='', echo=FALSE}
writeLines(readLines("json_examples/combination1.json"))
```

Running this gives us the expected values of 4 and 2 for the two patients respectively:

```{r}
results <- run_pipeline(
  data_sources = list(input_table = input_table),
  feature_filenames = "json_examples/combination1.json"
)

results$features
```

## Linear combination

Linear combinations allow you to calculate, for example, a weighted sum of two features.
Suppose we want to assign a score of 10 for every "A" diagnosis and 20 for every "B" diagnosis.
We can use the same JSON as above, but with two minor modifications:

* the `transformation_type` key set to `combine_linear`
* the `weight` key is added to each subfeature, with an appropriate value

The result is [`json_examples/combination2.json`](https://github.com/alan-turing-institute/eider/blob/main/vignettes/json_examples/combination2.json):

```{r comment='', echo=FALSE}
writeLines(readLines("json_examples/combination2.json"))
```

and running it:

```{r}
results <- run_pipeline(
  data_sources = list(input_table = input_table),
  feature_filenames = "json_examples/combination2.json"
)

results$features
```

Note that for a simple unweighted sum of features, all weights can be set to 1; and to take the difference between two features, one weight can be set to 1 and the other to -1.


## See also

[Feature 3 in the LTC example vignette](examples_ltc.html#feature-3-number-of-conditions) is an example of a combination feature, in this case, a sum of three subfeatures.
