---
title: "A beginner's guide to creating a bulkAnalyseR app from a GEO dataset"
output: 
  rmarkdown::html_vignette:
    vignette: >
      %\VignetteIndexEntry{A beginner's guide to creating a bulkAnalyseR app from a GEO dataset}
      %\VignetteEngine{knitr::rmarkdown}
      \usepackage[utf8]{inputenc}
---
<div style="text-align: justify"> 

```{r options, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "##>"
)
Sys.setenv("VROOM_CONNECTION_SIZE" = 1e6)
```

In this short tutorial we showcase a simple pipeline to create a bulkAnalyseR app using a publicly available dataset from the [Gene Expression Omnibus (GEO)](https://www.ncbi.nlm.nih.gov/geo/). No pre-requisites are required, as the installation of bulkAnalyseR and download of the data are included.

The example app described in this vignette can be found [here](https://bioinf.stemcells.cam.ac.uk/shiny/bulkAnalyseR/GEO/).

## Installation

First, install the latest version of bulkAnalyseR, starting with the CRAN and Bioconductor dependencies:

```{r cran_install, eval = FALSE}
packages.cran <- c(
  "ggplot2", "shiny", "shinythemes", "gprofiler2", "stats", "ggrepel",
  "utils", "RColorBrewer", "circlize", "shinyWidgets", "shinyjqui",
  "dplyr", "magrittr", "ggforce", "rlang", "glue", "matrixStats",
  "noisyr", "tibble", "ggnewscale", "ggrastr", "visNetwork", "shinyLP",
  "grid", "DT", "scales", "shinyjs", "tidyr", "UpSetR", "ggVennDiagram"
)
new.packages.cran <- packages.cran[!(packages.cran %in% installed.packages()[, "Package"])]
if(length(new.packages.cran))
  install.packages(new.packages.cran)

packages.bioc <- c(
  "edgeR", "DESeq2", "preprocessCore", "GENIE3", "ComplexHeatmap"
)
new.packages.bioc <- packages.bioc[!(packages.bioc %in% installed.packages()[,"Package"])]
if(length(new.packages.bioc)){
  if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
  BiocManager::install(new.packages.bioc)
}

install.packages("bulkAnalyseR")
```

## Download data and create app

### Get the expression matrix

We start by downloading and reading in the expression matrix. Rows represent genes/features and columns represent samples (note you need an internet connection to run the code below). The matrix is from [a 2022 study](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE178620) on the Stem Cell transcriptional response to Microglia-Conditioned Media. We only use a few samples in the study for illustrative purposes.

```{r read}
download_path <- paste0(tempdir(), "expression_matrix.csv.gz")
download.file(
  "https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE178620&format=file&file=GSE178620%5Fraw%5Fabundances%2Ecsv%2Egz", 
  download_path
)
exp <- as.matrix(read.csv(download_path, row.names = 1))[, c(1,2,19,20)]
head(exp)
```

```{r clean up, include = FALSE}
file.remove(download_path)
```

### Defining metadata

We use a very simple metadata table with just the main condition in the experiment. Detailed metadata is available for all GEO datasets and can be downloaded and used instead.

```{r meta}
meta <- data.frame(
  name = colnames(exp),
  condition = sapply(colnames(exp), USE.NAMES = FALSE, function(nm){
    strsplit(nm, "_")[[1]][1]
  })
)
meta
```

### Pre-processing

We can now denoise and normalise the data using bulkAnalyseR

```{r preprocess,fig.width=7, fig.height=5}
exp.proc <- bulkAnalyseR::preprocessExpressionMatrix(exp, output.plot = TRUE)
```

### Creating the shiny app

Finally, we can create a shiny app. This example app can be found [here](https://bioinf.stemcells.cam.ac.uk/shiny/bulkAnalyseR/GEO/).

```{r generate app, eval=FALSE}
bulkAnalyseR::generateShinyApp(
  shiny.dir = "shiny_GEO",
  app.title = "Shiny app for visualisation of GEO data",
  modality = "RNA",
  expression.matrix = exp.proc,
  metadata = meta,
  organism = "hsapiens",
  org.db = "org.Hs.eg.db"
)
```

```{r sessionInfo}
sessionInfo()
```


