This example demonstrates how SEM forests can be grown. SEM forests are ensembles of typically hundreds to thousands of SEM trees. Permutation-based variable importance then summarizes, across all trees, how much each predictor contributes to improving model fit.

Here, we use the affect dataset and a simple SEM with only a single observed variable and no latent variables.

Load data

Load affect dataset from the psychTools package. These are data from two studies conducted in the Personality, Motivation and Cognition Laboratory at Northwestern University to study affect dimensionality and the relationship to various personality dimensions.

library(psychTools)
data(affect)

knitr::kable(head(affect))
Study  Film  ext  neur  imp  soc  lie  traitanx  state1  EA1  TA1  PA1  NA1  EA2  TA2  PA2  NA2  state2  MEQ        BDI
maps      3   18     9    7   10    3        24      22   24   14   26    2    6    5    7    4      NA   NA  0.0476190
maps      3   16    12    5    8    1        41      40    9   13   10    4    4   14    5    5      NA   NA  0.3333333
maps      3    6     5    3    1    2        37      44    1   14    4    2    2   15    3    1      NA   NA  0.1904762
maps      3   12    15    4    6    3        54      40    5   15    1    0    4   15    0    2      NA   NA  0.3846154
maps      3   14     2    5    6    3        39      67   12   20    7   13   14   15   16   13      NA   NA  0.3809524
maps      1    6    15    2    4    5        51      38    9   14    5    1    7   12    2    2      NA   NA  0.2380952

affect$Film <- as.factor(affect$Film)
affect$lie <- as.ordered(affect$lie)
affect$imp <- as.ordered(affect$imp)

Create simple model of state anxiety

The following code implements a simple SEM with only a single manifest variable and two parameters: the mean of state anxiety after having watched a movie (state2), \(\mu\), and the variance of state anxiety, \(\sigma^2\).

library(OpenMx)

manifests <- c("state2")  # single observed variable
latents <- c()            # no latent variables

model <- mxModel(
  "Univariate Normal Model",
  type = "RAM",
  manifestVars = manifests,
  latentVars = latents,
  # mean of state2: freely estimated, starting value 50
  mxPath(from = "one", to = manifests, free = TRUE,
         values = 50, arrows = 1, labels = "mu"),
  # variance of state2: freely estimated, starting value 100
  mxPath(from = manifests, to = manifests, free = TRUE,
         values = 100, arrows = 2, labels = "sigma2"),
  mxData(affect, type = "raw")
)

result <- mxRun(model)
#> Running Univariate Normal Model with 2 parameters

These are the estimates of the model when run on the entire sample:

summary(result)
#> Summary of Univariate Normal Model 
#>  
#> free parameters:
#>     name matrix    row    col  Estimate  Std.Error A
#> 1 sigma2      S state2 state2 115.05414 12.4793862  
#> 2     mu      M      1 state2  42.45118  0.8226717  
#> 
#> Model Statistics: 
#>                |  Parameters  |  Degrees of Freedom  |  Fit (-2lnL units)
#>        Model:              2                    168              1289.158
#>    Saturated:              2                    168                    NA
#> Independence:              2                    168                    NA
#> Number of observations/statistics: 330/170
#> 
#> Information Criteria: 
#>       |  df Penalty  |  Parameters Penalty  |  Sample-Size Adjusted
#> AIC:       953.1576               1293.158                 1293.194
#> BIC:       314.9100               1300.756                 1294.412
#> CFI: NA 
#> TLI: 1   (also known as NNFI) 
#> RMSEA:  0  [95% CI (NA, NA)]
#> Prob(RMSEA <= 0.05): NA
#> To get additional fit indices, see help(mxRefModels)
#> timestamp: 2023-11-24 11:19:14 
#> Wall clock time: 0.103179 secs 
#> optimizer:  SLSQP 
#> OpenMx version number: 2.21.1 
#> Need help?  See help(mxSummary)
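
As a cross-check, the two parameters of a univariate normal model have closed-form maximum likelihood estimates: \(\hat\mu\) is the sample mean and \(\hat\sigma^2\) is the mean squared deviation from it (dividing by \(n\), not \(n-1\)), both computed over the non-missing cases. The following minimal sketch uses simulated stand-in data; the vector `x` and its values are hypothetical. Replacing `x` with `affect$state2` should approximately reproduce the estimates and the -2lnL reported above, up to optimizer tolerance.

```r
# Simulated stand-in for affect$state2 (hypothetical values, with missing cases)
set.seed(1)
x <- c(rnorm(170, mean = 42, sd = 10), rep(NA, 160))

# Cases with a missing value do not contribute to the likelihood
x_obs <- x[!is.na(x)]
n <- length(x_obs)

mu_hat <- mean(x_obs)                    # ML estimate of the mean
sigma2_hat <- mean((x_obs - mu_hat)^2)   # ML estimate of the variance (divides by n)

# -2 log-likelihood evaluated at the ML estimates
minus2lnL <- -2 * sum(dnorm(x_obs, mean = mu_hat, sd = sqrt(sigma2_hat), log = TRUE))

round(c(mu = mu_hat, sigma2 = sigma2_hat, minus2lnL = minus2lnL), 3)
```

At the ML estimates, the -2 log-likelihood simplifies to \(n\,(\ln(2\pi\hat\sigma^2) + 1)\), which gives a quick way to verify the fit value by hand.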

Forest

Create a forest control object that stores all tuning parameters of the forest. Note that we use only 5 trees for illustration; in real applications, increase this number to several hundred. To reduce computation time, consider using score-based tests for variable selection in the trees.

library(semtree)
control <- semforest.control(num.trees = 5)
print(control)
#> SEM-Forest control:
#> -----------------
#> Number of Trees:  5 
#> Sampling:  subsample 
#> Comparisons per Node: 2 
#> 
#>  SEM-Tree control:
#>  ▔▔▔▔▔▔▔▔▔▔ 
#> ● Splitting Method: fair
#> ● Alpha Level: 1
#> ● Bonferroni Correction:FALSE
#> ● Minimum Number of Cases: 20
#> ● Maximum Tree Depth: NA
#> ● Number of CV Folds: 5
#> ● Exclude Heywood Cases: FALSE
#> ● Test Invariance Alpha Level: NA
#> ● Use all Cases: FALSE
#> ● Verbosity: FALSE
#> ● Progress Bar: TRUE
#> ● Seed: NA
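
For a real application, the two recommendations above (more trees, score-based tests) might be combined as in the following sketch. It assumes that `semtree.control()` accepts `method = "score"` and that `semforest.control()` takes a tree-level control object through its `control` argument; the object names `tree_control` and `control_large` and the value of 500 trees are illustrative choices, not settings used in this example.

```r
library(semtree)

# Assumption: method = "score" selects score-based tests for split selection
tree_control <- semtree.control(method = "score")

# Hypothetical forest settings for a real analysis: several hundred trees
control_large <- semforest.control(
  num.trees = 500,
  control = tree_control
)

print(control_large)
```

The remainder of this example continues with the 5-tree `control` object created above to keep the run time short.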

Now, run the forest using the control object:

forest <- semforest( model=model,
                     data = affect, 
                     control = control,
                     covariates = c("Study","Film", "state1",
                                    "PA2","NA2","TA2"))
#> ✔ Tree construction finished [took 24s].
#> ✔ Tree construction finished [took 19s].
#> ✔ Tree construction finished [took 23s].
#> ✔ Tree construction finished [took 20s].
#> ✔ Tree construction finished [took 23s].
#> ✔ Forest completed [took ~2min]

Variable importance

Next, we compute permutation-based variable importance. This may take some time.

vim <- varimp(forest)
print(vim, sort.values=TRUE)
#> Variable Importance
#>     Study       PA2      Film       TA2    state1       NA2 
#>        NA  17.15543  24.47543  30.22088  59.92358 120.82924
plot(vim)

From this, we learn that NA2 (negative affect after the movie), state1 (state anxiety before the movie), and TA2 (tense arousal after the movie) are the strongest predictors of differences in the distribution of state anxiety after the movie (in the mean, the variance, or both).
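
The idea behind permutation-based importance can be illustrated with a deliberately simplified base R sketch. This is not the algorithm as implemented in semtree: a single split on one binary covariate stands in for a full tree, a held-out half of the data stands in for the out-of-bag cases, and all names and simulated values are hypothetical. Importance is measured as the increase in misfit (-2 log-likelihood) on the held-out cases after the covariate has been randomly permuted.

```r
set.seed(42)
n <- 400
informative <- rbinom(n, 1, 0.5)   # covariate that shifts the mean of y
irrelevant  <- rbinom(n, 1, 0.5)   # covariate unrelated to y
y <- rnorm(n, mean = 40 + 10 * informative, sd = 8)

# Split into a fitting part and a held-out part (stand-in for out-of-bag cases)
in_bag <- sample(n, n / 2)
oob <- setdiff(seq_len(n), in_bag)

# "Stump": estimate a separate mean and variance of y for each covariate level
fit_stump <- function(y, g) {
  lapply(split(y, g), function(v) c(mu = mean(v), s2 = mean((v - mean(v))^2)))
}

# -2 log-likelihood of y under the level-specific parameters
misfit <- function(y, g, pars) {
  p <- pars[as.character(g)]
  mu <- vapply(p, function(q) q[["mu"]], numeric(1))
  s2 <- vapply(p, function(q) q[["s2"]], numeric(1))
  -2 * sum(dnorm(y, mean = mu, sd = sqrt(s2), log = TRUE))
}

# Importance: increase in held-out misfit after permuting the covariate
perm_importance <- function(y, g) {
  pars <- fit_stump(y[in_bag], g[in_bag])
  misfit(y[oob], sample(g[oob]), pars) - misfit(y[oob], g[oob], pars)
}

imp <- c(informative = perm_importance(y, informative),
         irrelevant  = perm_importance(y, irrelevant))
print(imp)
```

Permuting the informative covariate assigns many held-out cases to the wrong group-specific parameters, so the misfit rises sharply. Permuting the irrelevant covariate changes little, so its importance stays near zero. A forest averages such differences over many trees.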