This vignette describes how the pccc package generates
the Complex Chronic Condition Categories (CCC) from ICD-9 and ICD-10
codes.
A CCC is “any medical condition that can be reasonably expected to last at least 12 months (unless death intervenes) and to involve either several different organ systems or 1 organ system severely enough to require specialty pediatric care and probably some period of hospitalization in a tertiary care center.” The categorization is based on the work of Feudtner et al. (2000 & 2014), as referenced below.
A supplemental reference document showing the lists of codes for each
category was published as a supplement to Feudtner et al. (2014) and we
have made it available as part of the pccc package. After
installing the package, you can locate the file on your system with
system.file and open it with your preferred program for .docx files
(Word, etc.).
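The exact path of the reference document inside the installed package is not reproduced here, so the following is only a minimal sketch: it searches the installed pccc directory for .docx files instead of assuming a file name. It uses base R only, and the object names pkg_dir and docx_files are illustrative.

```r
# Locate the installed package directory; system.file() returns ""
# when the package is not installed.
pkg_dir <- system.file(package = "pccc")

# Search the package directory recursively for Word documents.
docx_files <- if (nzchar(pkg_dir)) {
  list.files(pkg_dir, pattern = "\\.docx$", recursive = TRUE, full.names = TRUE)
} else {
  character(0)
}

docx_files
```

If the vector is empty, either pccc is not installed or the supplement is stored under a different extension.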
To evaluate the code chunks in this example you will need to load the following R packages.
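The examples later in this vignette call library(dplyr) and library(pccc), so those are the two packages assumed here. The sketch below checks that both are installed before attaching them; the variable names needed and not_installed are illustrative.

```r
# Packages used by the code chunks in this vignette.
needed <- c("pccc", "dplyr")

# requireNamespace() returns FALSE (rather than erroring) when a package
# is not installed.
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
not_installed <- needed[!is_installed]

if (length(not_installed) > 0) {
  message("Please install: ", paste(not_installed, collapse = ", "))
} else {
  library(pccc)
  library(dplyr)
}
```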
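There are 12 total categories of CCCs used in this package. The first group of 10 are mutually exclusive, meaning only one of them can be derived from a single ICD code: neuromuscular, cardiovascular, respiratory, renal, gastrointestinal, hematologic or immunologic, metabolic, congenital or genetic, malignancy, and neonatal.
The last 2, technology dependency and transplantation, can be selected in addition to the above categories. For example, one ICD code could generate CCC categorization as both Gastrointestinal and Technology Dependency.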
To see actual specific ICD codes by category, see pccc-icd-codes.
The ccc function is the workhorse here. Simply put, a
user provides ICD codes as strings and ccc returns
CCC categories. In the published definitions, CCC codes for ICD-9-CM
are matched on substrings and ICD-10 codes are matched on full codes.
The ccc function, however, uses the same “starts with substring”
matching logic for both, except in a few cases described in the next
paragraph.
Some datasets may contain different degrees of specificity of ICD-9-CM codes, which can lead to issues with substring matching for certain codes. For example, consider a patient with Congenital hereditary muscular dystrophy. The least specific ICD-9-CM code for Muscular dystrophy is 359, which is a CCC code. The exact ICD-9-CM code specifying Congenital hereditary muscular dystrophy is 3590. Even when describing the same patient, one dataset may contain the 359 code while another dataset may contain the 3590 code.

If we applied the substring matching logic described above and matched on 359, we would capture the patient in both datasets. However, we would also capture non-CCC diagnoses such as 3594, Toxic myopathy. If we matched on 3590 instead, we would only capture the patient in the dataset with more specific ICD-9-CM codes. We address this problem by exact matching for less specific codes (e.g., the code 359 will match only if the dataset contains the 3-digit code 359) and substring matching for more specific codes (e.g., code 3590 will match any code beginning with 3590). This approach improves the sensitivity of detecting CCCs in datasets with less specific codes (e.g., 359) and also reduces misclassification errors in datasets with more specific codes (e.g., 3590).
We have listed these exact match exceptions under their corresponding CCC category in the pccc-icd-codes description.
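The combination of exact and "starts with" matching described above can be sketched in base R as follows. This is an illustration of the idea only, not the implementation used inside ccc; the helper matches_ccc and its arguments prefix_codes and exact_codes are hypothetical names.

```r
# Hypothetical helper, not part of pccc: a code is flagged when it starts
# with one of `prefix_codes` or is identical to one of `exact_codes`.
matches_ccc <- function(codes, prefix_codes, exact_codes) {
  prefix_hit <- vapply(
    codes,
    function(code) any(startsWith(code, prefix_codes)),
    logical(1),
    USE.NAMES = FALSE
  )
  exact_hit <- codes %in% exact_codes
  prefix_hit | exact_hit
}

# Codes taken from the muscular dystrophy example above.
codes <- c("359", "3590", "3594")
hits <- matches_ccc(codes, prefix_codes = "3590", exact_codes = "359")
names(hits) <- codes
hits  # 359 and 3590 are flagged; 3594 (Toxic myopathy) is not
```

Matching on the prefix 359 alone would also have flagged 3594, which is the misclassification the exact match exception avoids.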
Users of the pccc package will need to
pre-process the ICD-9 and ICD-10 codes in their data so that the strings
are formatted in the way that the pccc package will
recognize them.
Specific rules to format ICD Codes correctly:
Potential issues with improperly formatted ICD codes:
Users of pccc may find the R package icd useful.
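As one illustration of this pre-processing, the sketch below removes decimal points from ICD code strings, so that a code written as 425.1 is sent as 4251. The helper name strip_icd_dots is hypothetical and not part of pccc; it handles decimal points only and does not cover any other formatting rule.

```r
# Hypothetical helper: drop the decimal point from ICD code strings.
# fixed = TRUE treats "." as a literal character, not a regex wildcard.
strip_icd_dots <- function(codes) {
  gsub(".", "", as.character(codes), fixed = TRUE)
}

raw_codes <- c("425.1", "37.51", "4251")
clean_codes <- strip_icd_dots(raw_codes)
clean_codes
```

Codes that are already free of decimal points, such as 4251, pass through unchanged.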
To illustrate how the input formatting impacts the identification
of a CCC, consider the data.frame named
dat below. These data have information about three patients
(A-C). Each subject has the same ICD-9-CM diagnosis code
(Hypertrophic obstructive cardiomyopathy, ICD-9-CM 425.11,
which should be sent as 4251) and the same ICD-9-CM procedure code
(Heart transplantation, ICD-9-CM 37.51, which should be
sent as 3751), but each input is formatted differently. Based on the
ICD-9-CM diagnosis code, the ccc function will only
identify subject A as having a CCC. Based on the ICD-9-CM
procedure code, the ccc function will only identify subject
B as having a CCC and will also flag the Transplantation
category. Subject C, whose codes both contain decimal points, is not
identified at all.
dat <- data.frame(ids = c("A", "B", "C"), 
                  dxs = c("4251", "425.1", "425.1"), 
                  procs = c("37.51", "3751", "37.51"))
dat
#>   ids   dxs procs
#> 1   A  4251 37.51
#> 2   B 425.1  3751
#> 3   C 425.1 37.51
ccc(dat, 
    id = ids, 
    dx_cols = dxs, 
    pc_cols = procs, 
    icdv = 9)
#>   ids neuromusc cvd respiratory renal gi hemato_immu metabolic congeni_genetic
#> 1   A         0   1           0     0  0           0         0               0
#> 2   B         0   1           0     0  0           0         0               0
#> 3   C         0   0           0     0  0           0         0               0
#>   malignancy neonatal tech_dep transplant ccc_flag
#> 1          0        0        0          0        1
#> 2          0        0        0          1        1
#> 3          0        0        0          0        0

This example uses a tool developed by Seth Russell (available at icd_file_generator) to create sample data files for ICD-9-CM and ICD-10-CM. Each generated data file contains randomly generated ICD codes for 1,000 patients and comprises 10 columns of diagnosis codes (d_cols), 10 columns of procedure codes (p_cols), and 10 columns of other data (g_cols).
Sample of how ICD-9-CM test file was generated:
pccc_icd9_dataset <- generate_sample(
  v = 9,
  n_rows = 1000,
  d_cols = 10,
  p_cols = 10,
  g_cols = 10
)
save(pccc_icd9_dataset, file = "pccc_icd9_dataset.rda")

Example using the sample patient data set:
library(dplyr)
library(pccc)
ccc_result <-
    ccc(pccc::pccc_icd9_dataset[, 1:21], # get id, dx, and pc columns
        id      = id,
        dx_cols = dplyr::starts_with("dx"),
        pc_cols = dplyr::starts_with("pc"),
        icdv    = 9)
# review results
head(ccc_result)
#>   id neuromusc cvd respiratory renal gi hemato_immu metabolic congeni_genetic
#> 1  1         0   0           0     0  1           0         0               0
#> 2  2         0   0           0     0  0           0         0               0
#> 3  3         1   0           0     0  1           0         0               0
#> 4  4         0   0           0     0  0           0         0               0
#> 5  5         0   0           0     0  0           0         0               0
#> 6  6         0   1           0     0  0           0         0               0
#>   malignancy neonatal tech_dep transplant ccc_flag
#> 1          0        0        0          0        1
#> 2          0        0        0          0        0
#> 3          0        0        1          0        1
#> 4          0        0        0          0        0
#> 5          0        0        0          0        0
#> 6          0        1        1          0        1
# view number of patients with each CCC
sum_results <- dplyr::summarize_at(ccc_result, vars(-id), sum) %>% print.data.frame
#>   neuromusc cvd respiratory renal  gi hemato_immu metabolic congeni_genetic
#> 1       102 151          64   119 106          61        80              25
#>   malignancy neonatal tech_dep transplant ccc_flag
#> 1        400       20      287         61      741
# view percent of total population with each CCC
dplyr::summarize_at(ccc_result, vars(-id), mean) %>% print.data.frame
#>   neuromusc   cvd respiratory renal    gi hemato_immu metabolic congeni_genetic
#> 1     0.102 0.151       0.064 0.119 0.106       0.061      0.08           0.025
#>   malignancy neonatal tech_dep transplant ccc_flag
#> 1        0.4     0.02    0.287      0.061    0.741
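The same per-category counts and proportions can also be computed with base R. The sketch below is self-contained: instead of the full ccc_result, it uses a small data frame named first_six (an illustrative name) holding the first six patients printed by head(ccc_result) above, restricted to a subset of the columns.

```r
# First six patients from head(ccc_result) above; subset of columns only.
first_six <- data.frame(
  id        = 1:6,
  neuromusc = c(0, 0, 1, 0, 0, 0),
  cvd       = c(0, 0, 0, 0, 0, 1),
  gi        = c(1, 0, 1, 0, 0, 0),
  neonatal  = c(0, 0, 0, 0, 0, 1),
  tech_dep  = c(0, 0, 1, 0, 0, 1),
  ccc_flag  = c(1, 0, 1, 0, 0, 1)
)

# Every column except the identifier is a 0/1 indicator.
flag_cols <- setdiff(names(first_six), "id")

counts <- colSums(first_six[, flag_cols])   # patients per category
shares <- colMeans(first_six[, flag_cols])  # proportion per category

counts
shares
```

Because the indicators are coded 0/1, the column sum is the number of patients in a category and the column mean is the proportion of patients in it.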