maldipickr

library(maldipickr)

Quickstart

The {maldipickr} package helps microbiologists reduce duplicate/clonal bacteria from their cultures and eventually exclude previously selected bacteria. {maldipickr} achieve this feat by grouping together data from MALDI Biotyper and helps choose representative bacteria from each group using user-relevant metadata – a process known as cherry-picking.

{maldipickr} cherry-picks bacterial isolates with MALDI Biotyper:

Using taxonomic identification report

First make sure {maldipickr} is installed and loaded, alternatively follow the instructions to install the package.

Cherry-picking four isolates based on their taxonomic identification by the MALDI Biotyper is done in a few steps with {maldipickr}.

Get example data

We import an example Biotyper CSV report and glimpse at the table.

report_tbl <- read_biotyper_report(
  system.file("biotyper_unknown.csv", package = "maldipickr")
)
report_tbl %>%
  dplyr::select(name, bruker_species, bruker_log) %>% knitr::kable()
name bruker_species bruker_log
unknown_isolate_1 not reliable identification 1.33
unknown_isolate_2 not reliable identification 1.40
unknown_isolate_3 Faecalibacterium prausnitzii 1.96
unknown_isolate_4 Faecalibacterium prausnitzii 2.07

Delineate clusters and cherry-pick

Delineate clusters from the identifications after filtering the reliable ones and cherry-pick one representative spectra.

Unreliable identifications based on the log-score are replaced by “not reliable identification”, but stay tuned as they do not represent the same isolates!

report_tbl <- report_tbl %>%
  dplyr::mutate(
      bruker_species = dplyr::if_else(bruker_log >= 2, bruker_species,
                                      "not reliable identification")
  )
knitr::kable(report_tbl)
name sample_name hit_rank bruker_quality bruker_species bruker_taxid bruker_hash bruker_log
unknown_isolate_1 NA 1 - not reliable identification NA 3e920566-2734-43dd-85d0-66cf23a2d6ef 1.33
unknown_isolate_2 NA 1 - not reliable identification NA 88a85875-eeb5-4858-966e-98a077325dc3 1.40
unknown_isolate_3 NA 1 + not reliable identification 137408536 2d266f20-5428-428d-96ec-ddd40200794b 1.96
unknown_isolate_4 NA 1 +++ Faecalibacterium prausnitzii 137408536 2d266f20-5428-428d-96ec-ddd40200794b 2.07

The chosen ones are indicated by to_pick column.

report_tbl %>%
  delineate_with_identification() %>%
  pick_spectra(report_tbl, criteria_column = "bruker_log") %>%
  dplyr::relocate(name, to_pick, bruker_species) %>% 
  knitr::kable()
#> Generating clusters from single report
name to_pick bruker_species membership cluster_size sample_name hit_rank bruker_quality bruker_taxid bruker_hash bruker_log
unknown_isolate_1 TRUE not reliable identification 2 1 NA 1 - NA 3e920566-2734-43dd-85d0-66cf23a2d6ef 1.33
unknown_isolate_2 TRUE not reliable identification 3 1 NA 1 - NA 88a85875-eeb5-4858-966e-98a077325dc3 1.40
unknown_isolate_3 TRUE not reliable identification 4 1 NA 1 + 137408536 2d266f20-5428-428d-96ec-ddd40200794b 1.96
unknown_isolate_4 TRUE Faecalibacterium prausnitzii 1 1 NA 1 +++ 137408536 2d266f20-5428-428d-96ec-ddd40200794b 2.07

Using spectra data

In parallel to taxonomic identification reports, {maldipickr} process spectra data. Make sure {maldipickr} is installed and loaded, alternatively follow the instructions to install the package.

Cherry-picking six isolates from three species based on their spectra data obtained from the MALDI Biotyper is done in a few steps with {maldipickr}.

Get example data

We set up the directory location of our example spectra data, but adjust for your requirements. We import and process the spectra which gives us a named list of three objects: spectra, peaks and metadata (more details in Value section of process_spectra()).

spectra_dir <- system.file("toy-species-spectra", package = "maldipickr")

processed <- spectra_dir %>%
  import_biotyper_spectra() %>%
  process_spectra()

Delineate clusters and cherry-pick

Delineate spectra clusters using Cosine similarity and cherry-pick one representative spectra. The chosen ones are indicated by to_pick column.

processed %>%
  list() %>%
  merge_processed_spectra() %>%
  coop::tcosine() %>%
  delineate_with_similarity(threshold = 0.92) %>%
  set_reference_spectra(processed$metadata) %>%
  pick_spectra() %>%
  dplyr::relocate(name, to_pick) %>% 
  knitr::kable()
name to_pick membership cluster_size SNR peaks is_reference
species1_G2 FALSE 1 4 5.089590 21 FALSE
species2_E11 FALSE 2 2 5.543735 22 FALSE
species2_E12 TRUE 2 2 5.633540 23 TRUE
species3_F7 FALSE 1 4 4.889949 26 FALSE
species3_F8 TRUE 1 4 5.558884 25 TRUE
species3_F9 FALSE 1 4 5.398429 25 FALSE

This provides only a brief overview of the features of {maldipickr}, browse the other vignettes to learn more about additional features.

Session information

sessionInfo()
#> R version 4.3.1 (2023-06-16)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.6 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Europe/Berlin
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] maldipickr_1.3.1
#> 
#> loaded via a namespace (and not attached):
#>  [1] vctrs_0.6.4              cli_3.6.1                knitr_1.48              
#>  [4] rlang_1.1.4              xfun_0.44                coop_0.6-3              
#>  [7] purrr_1.0.2              generics_0.1.3           jsonlite_1.8.7          
#> [10] glue_1.6.2               htmltools_0.5.6.1        sass_0.4.7              
#> [13] fansi_1.0.5              rmarkdown_2.28           tibble_3.2.1            
#> [16] evaluate_0.22            jquerylib_0.1.4          fastmap_1.1.1           
#> [19] yaml_2.3.7               lifecycle_1.0.4          compiler_4.3.1          
#> [22] dplyr_1.1.4              pkgconfig_2.0.3          tidyr_1.3.0             
#> [25] readBrukerFlexData_1.9.1 rstudioapi_0.15.0        digest_0.6.33           
#> [28] R6_2.5.1                 tidyselect_1.2.1         utf8_1.2.3              
#> [31] pillar_1.9.0             parallel_4.3.1           magrittr_2.0.3          
#> [34] bslib_0.5.1              withr_2.5.1              tools_4.3.1             
#> [37] MALDIquant_1.22.1        cachem_1.0.8