---
title: "DrugBank Database XML Parser"
author: "Mohammed Ali"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{DrugBank Database XML Parser}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "docs/articles/",
  out.width = "100%"
)
```

## Introduction
The main purpose of the `dbparser` package is to parse the 
[DrugBank](https://go.drugbank.com/) database, which is downloadable in XML format 
from [this link](https://go.drugbank.com/releases/latest). The parsed data can 
then be explored and analyzed as needed. 
In this tutorial, we will see how to use `dbparser` together with `dplyr`, 
`canvasXpress`, and other libraries to do a simple drug analysis.

## Loading and Parsing the Data

Before running the code, we assume the following:

- the user has already downloaded the *DrugBank* XML database file, following the
[Read Me](https://docs.ropensci.org/dbparser/) instructions or the link above,
- the user saved the downloaded database in the working directory, e.g. `C:\`,
- the user named the downloaded XML file **drugbank.xml**.
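
These assumptions can be checked before any parsing is attempted. The following is a minimal sketch in base R; the helper name `check_drugbank_file()` is hypothetical and is not part of `dbparser`.

```{r eval=FALSE}
## hypothetical helper (not part of dbparser):
## returns TRUE when `path` exists and has an .xml extension
check_drugbank_file <- function(path = "drugbank.xml") {
  ok <- file.exists(path) && grepl("\\.xml$", path, ignore.case = TRUE)
  if (!ok) {
    message(
      "DrugBank XML file not found at: ",
      normalizePath(path, mustWork = FALSE)
    )
  }
  ok
}

## relative paths are resolved against the working directory
check_drugbank_file("drugbank.xml")
```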

Now we can load the `drugs` info, `drug groups` info and `drug targets`
actions info. The chunk below reads these tables from RDS files shipped with 
the package.

```{r eval=T}
## load dbparser and the supporting packages
suppressPackageStartupMessages({
  library(tidyr)
  library(dplyr)
  library(canvasXpress)
  library(tibble)
  library(dbparser)
})


## load drugs data
drugs <- readRDS(system.file("drugs.RDS", package = "dbparser"))

## load drug groups data
drug_groups <- readRDS(system.file("drug_groups.RDS", package = "dbparser"))

## load drug targets actions data
drug_targets_actions <- readRDS(system.file("targets_actions.RDS", package = "dbparser"))
```


## Exploring the data

The following examples take a quick look at a few aspects of the parsed 
data. First, we compare the counts of `biotech` and `small molecule` drugs 
in the data.

```{r eval=T}
## count the drugs of each type (biotech vs. small molecule)
type_stat <- drugs %>% 
  count(type, name = "count") %>% 
  column_to_rownames("type")

canvasXpress(
  data             = type_stat,
  graphOrientation = "vertical",
  graphType        = "Bar",
  showSampleNames  = FALSE,
  title            = "Drug Type Distribution",
  xAxisTitle       = "Count"
)
```
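
The chart above plots raw counts. If actual proportions are wanted, each count can be divided by the total. The following is a minimal sketch on a toy table; the `toy_drugs` values are invented for illustration and are not DrugBank figures.

```{r eval=TRUE}
library(dplyr)

## toy stand-in for the `drugs` table (invented values)
toy_drugs <- tibble::tibble(
  type = c("biotech", "small molecule", "small molecule", "small molecule")
)

## count each type, then express the count as a share of all rows
type_share <- toy_drugs %>% 
  count(type, name = "count") %>% 
  mutate(proportion = count / sum(count))

type_share
```

The same `mutate()` step could be applied to the real counts before they are passed to `canvasXpress()`.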


Below, we view the different `drug_groups` in the data and how prevalent they 
are.

```{r eval=T}
## count the drugs of each type within each drug group,
## then reshape to one row per type and one column per group
type_stat <- drugs %>% 
  full_join(drug_groups, by = "drugbank_id") %>% 
  count(type, group, name = "count") %>% 
  pivot_wider(names_from = group, values_from = count) %>% 
  column_to_rownames("type")

canvasXpress(
  data           = type_stat,
  graphType      = "Stacked",
  legendColumns  = 2,
  legendPosition = "bottom",
  title          = "Drug Type Distribution per Drug Group",
  xAxisTitle     = "Quantity",
  xAxis2Show     = TRUE,
  xAxisShow      = FALSE,
  smpTitle       = "Drug Group")
```
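
The stacked chart is fed a table with one row per drug type and one column per drug group. The following is a minimal sketch of that reshaping step on a toy table (counts invented for illustration). `values_fill = 0` is an addition that the chunk above does not use; it assumes tidyr 1.1.0 or later and replaces missing type/group combinations with zero instead of `NA`.

```{r eval=TRUE}
library(dplyr)
library(tidyr)
library(tibble)

## toy long-format counts (invented values)
toy_counts <- tibble(
  type  = c("biotech", "biotech", "small molecule"),
  group = c("approved", "withdrawn", "approved"),
  count = c(3, 1, 5)
)

## one column per group, drug type moved into the row names
toy_wide <- toy_counts %>% 
  pivot_wider(names_from = group, values_from = count, values_fill = 0) %>% 
  column_to_rownames("type")

toy_wide
```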

Finally, we look at `drug_targets_actions` to see which target actions occur 
most often.


```{r eval=T}
## get counts of the 10 most frequent target actions in the data
## (top_n() keeps ties, so more than 10 rows are possible)
targetActionCounts <- 
  drug_targets_actions %>% 
  count(action, name = "count") %>% 
  arrange(desc(count)) %>% 
  top_n(10, count) %>% 
  column_to_rownames("action")

## get bar chart of the 10 most frequent target actions in the data
canvasXpress(
  data            = targetActionCounts,
  graphType       = "Bar",
  legendColumns   = 2,
  legendPosition  = "bottom",
  title           = "Target Actions Distribution",
  showSampleNames = FALSE,
  xAxis2Show      = TRUE,
  xAxisShow       = FALSE)
```
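
`top_n()` is marked as superseded in recent dplyr releases in favour of `slice_max()`. The following is a minimal sketch of the same kind of top-N selection on a toy table (counts invented for illustration), assuming dplyr 1.0.0 or later.

```{r eval=TRUE}
library(dplyr)

## toy action counts (invented values)
toy_actions <- tibble::tibble(
  action = c("inhibitor", "agonist", "antagonist", "binder"),
  count  = c(40, 25, 30, 5)
)

## keep the 2 rows with the largest counts, sorted from largest to smallest
top_actions <- toy_actions %>% 
  slice_max(count, n = 2)

top_actions
```

Like `top_n()`, `slice_max()` keeps ties by default, so it may return more than `n` rows when counts are equal.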