R crashes when accessing an object created in parallel with foreach()


I am moving to a new Azure VM and am suddenly getting crashes and errors in odd places where I never had them before. (The new VM runs Windows Server 2019 instead of 2016, but that may be a complete red herring.) I've tracked down one spot where I can reproduce the problem with the following code:

# load packages
library(foreach)
library(randomForest)
library(iterators)
library(parallel)
library(doParallel)

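# one worker per core, leaving one core free
numCores <- detectCores() - 1
ntrees <- 8000
# number of trees grown by each worker
treeSubs <- ntrees/numCores
# initialize the cluster and register it as the foreach backend
cl <- makeCluster(numCores)
registerDoParallel(cl)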
# dummy datasets
x <- as.data.frame(matrix(runif(100000), 20000))
y <- gl(2, 10000)

parRf <- foreach(ntree = rep(treeSubs, numCores),
                 .combine = randomForest::combine,
                 .packages = 'randomForest',
                 .multicombine = TRUE) %dopar%
  randomForest(x = x, y = y,
               importance = TRUE, mtry = 2, ntree = ntree,
               replace = TRUE)

z <- matrix(runif(1000), 200)

pred <- predict(parRf, z, type = "prob")

Note that it is the predict step that fails. If I fit the randomForest serially instead of in parallel, predict works fine, and it also works if I make the datasets smaller. In RStudio I get the grey "bomb", and in RGui the window simply disappears.
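One thing that may be worth ruling out, offered as an unconfirmed assumption rather than a diagnosis: `ntrees/numCores` is a whole number only when `numCores` divides 8000. On a machine where `detectCores() - 1` is 7, each worker is asked for 8000/7 ≈ 1142.86 trees. If a fractional `ntree` were truncated when the trees are grown but kept unrounded in the combined object, `predict()` could be told to read more trees than actually exist. The sketch below gives each worker a whole number of trees and compares the recorded tree count with the stored forest before predicting. `splitTrees` is a hypothetical helper, and reading `parRf$forest$ndbigtree` relies on randomForest's internal object layout.

```r
# Sketch only: the fractional-ntree explanation is an unconfirmed assumption.
library(parallel)
library(foreach)
library(randomForest)
library(doParallel)

# Hypothetical helper: split `total` trees into `parts` whole-number chunks
# that differ by at most one tree and sum exactly to `total`.
splitTrees <- function(total, parts) {
  base <- total %/% parts
  extra <- total %% parts
  base + c(rep(1L, extra), rep(0L, parts - extra))
}

numCores <- max(1L, detectCores() - 1L, na.rm = TRUE)
ntrees <- 8000
treeSubs <- splitTrees(ntrees, numCores)  # e.g. six 1143s and one 1142 for 7 workers
stopifnot(sum(treeSubs) == ntrees)

cl <- makeCluster(numCores)
registerDoParallel(cl)

# same dummy data as in the question
x <- as.data.frame(matrix(runif(100000), 20000))
y <- gl(2, 10000)

parRf <- foreach(ntree = treeSubs,
                 .combine = randomForest::combine,
                 .packages = "randomForest",
                 .multicombine = TRUE) %dopar%
  randomForest(x = x, y = y, importance = TRUE, mtry = 2,
               ntree = ntree, replace = TRUE)

stopCluster(cl)

# Before predicting, compare the recorded tree count with the number of trees
# actually stored; forest$ndbigtree holds one entry per grown tree (this relies
# on randomForest's internal object layout, an assumption).
stopifnot(parRf$ntree == length(parRf$forest$ndbigtree))

z <- matrix(runif(1000), 200)
pred <- predict(parRf, z, type = "prob")
```

If the tree-count check fails with the original `rep(treeSubs, numCores)` split but passes with whole-number chunks, that would point at the split rather than the VM. If it passes either way, the cause probably lies elsewhere.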

Here are some details of the crash report from the Windows Event Log:

Faulting application name: rsession.exe, version: 1.1.463.0, time stamp: 0x5bd11fb5
Faulting module name: randomForest.dll, version: 0.0.0.0, time stamp: 0x609f54bd
Exception code: 0xc0000005
Fault offset: 0x0000000000001b42
Faulting process id: 0x1e48
Faulting application start time: 0x01d752f21b6d7a79
Faulting application path: C:\Program Files\RStudio\bin\x64\rsession.exe

I wonder whether this is related to the question "R Crashes when training using caret and method = gamLoess", but I don't see a solution there.

Here's session info:

> sessionInfo()
R version 4.0.5 (2021-03-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server >= 2012 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] doParallel_1.0.16   iterators_1.0.13    randomForest_4.6-14 foreach_1.5.1      

loaded via a namespace (and not attached):
[1] compiler_4.0.5   tools_4.0.5      codetools_0.2-18
> 

Thanks in advance for any tips.

Tags: r, foreach, crash, doparallel
asked on Stack Overflow May 27, 2021 by Tim

1 Answer


The code works for me in parallel. Try running it within an RStudio project (create a new project and run the code inside it) and check again. I have received this error with other memory-intensive code when it was run outside a project.

> head(pred)
         1        2
1 0.553750 0.446250
2 0.533750 0.466250
3 0.367750 0.632250
4 0.578625 0.421375
5 0.487125 0.512875
6 0.423375 0.576625

answered on Stack Overflow May 27, 2021 by Veneet Bhardwaj

User contributions licensed under CC BY-SA 3.0