Wednesday, 14 April 2010

R: parallel processing using multicore package

I have been meaning to look at adding some parallel processing to R as I have some scripts that are painfully slow and embarrassingly parallel. There seem to be a lot of packages around for doing parallel computing, listed here.

I decided to look at multicore as it seemed easy to implement. The core of the package is the mclapply function, which is the multicore version of lapply. Basically you install the package,

install.packages("multicore")

load the library,

library(multicore)

then replace any instances of lapply in your code with mclapply, and it will speed up your code! Easy.

Obviously there are more complications than this, and there are various options you can set, such as the number of cores to use (the mc.cores argument).
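As a rough illustration of the cores option, here is a minimal sketch. It assumes R's bundled parallel package, which later took over mclapply from multicore, so that it runs on a current R install; the input data and the choice of two cores are made up for the example. Forking is not available on Windows, so the sketch falls back to one core there.

```r
library(parallel)  # assumption: the bundled successor to multicore

# Ten vectors of random numbers to work on (illustrative data)
inputs <- lapply(1:10, function(i) rnorm(1000))

# Forking is not supported on Windows, so use a single core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# mc.cores caps how many worker processes run at the same time
serial <- lapply(inputs, mean)
forked <- mclapply(inputs, mean, mc.cores = n_cores)

# Results come back in input order, so the two lists should match
print(identical(serial, forked))
```

How many cores is sensible depends on the machine; detectCores() from the same package reports how many are available.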

To give a quick test:


test <- lapply(1:10,function(x) rnorm(10000))
system.time(x <- lapply(test,function(x) loess.smooth(x,x)))
#   user  system elapsed
#  0.954   0.246   2.795
system.time(x <- mclapply(test,function(x) loess.smooth(x,x)))
#   user  system elapsed
#  0.896   0.898   0.914

So the elapsed time went down from 2.795 to 0.914, which is about three times faster. Not bad.

The package also contains parallel and collect functions: parallel starts any expression running in a separate process, and collect recovers the results once the jobs have finished.
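A small sketch of that pattern follows. It assumes the bundled parallel package, where the same two functions are named mcparallel and mccollect, and the two jobs are toy calculations invented for the example. It relies on forking, so it will only run on Unix-like systems (Linux, macOS).

```r
library(parallel)  # assumption: mcparallel/mccollect stand in for parallel/collect

# Start two independent jobs; each call returns straight away with a job handle
job_a <- mcparallel(sum(1:100))
job_b <- mcparallel(mean(c(2, 4, 6)))

# Wait for both jobs to finish and gather their results into a list,
# in the same order as the jobs were passed in
results <- mccollect(list(job_a, job_b))

# The list is named by process ID; drop the names to get plain values
values <- unname(unlist(results))
print(values)
```

This is handy when the tasks are different pieces of work rather than the same function applied over a list, which is the case mclapply covers.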

I have only just started using it, but first impressions are good. 
