MPI Tutorial for R (Rmpi)
In the previous two posts, I introduced what MPI is and how to install MPI for the R programming language. Rmpi provides the interface needed to use MPI for parallel computing in R. Rmpi is maintained by Hao Yu at the University of Western Ontario and has been around for about a decade now. Although it doesn’t have every command found in the original MPI for C/Fortran, quite a few functions have been added over time and it covers most of the basic functions needed for normal operations. The manual for Rmpi is provided here.
In this post, I am going to cover a few basic commands/functions for MPI using R.
Spawning Slave CPUs
In MPI terms, the master is the main CPU that sends messages to dependent CPUs, called slaves, to complete some tasks. When you spawn slaves using mpi.spawn.Rslaves(), it first gets the number of available CPUs from the default setting (which depends on your system). You can use the nslaves argument to set the specific number of CPUs you want to use for MPI. You can request more slaves than the number of CPUs actually available in your system, but you will not get any benefit from doing so. Rmpi behaves as if it had that many CPUs, but the actual computation is still done by the CPUs that are available.
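As a minimal sketch of the spawning step, assuming Rmpi and a working MPI installation are already set up: the script below requests a fixed number of slaves, checks how many processes the default communicator then contains, and shuts the slaves down again. The value 2 and the variable names n_slaves and n_procs are arbitrary choices for this illustration.

```r
library(Rmpi)

# Number of slaves to request; 2 is an arbitrary value chosen for illustration.
n_slaves <- 2

# Spawn the slaves. If nslaves is omitted, Rmpi uses its system-dependent default.
mpi.spawn.Rslaves(nslaves = n_slaves)

# Size of the default communicator: the spawned slaves plus the master.
n_procs <- mpi.comm.size()
print(n_procs)

# Shut the slaves down again once they are no longer needed.
mpi.close.Rslaves()
```

mpi.close.Rslaves() only stops the slaves; the master R session keeps running, so you can spawn a different number of slaves afterwards.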
Let’s Execute A Command Using Slaves
There are several commands for executing code on the slaves; mpi.remote.exec() and mpi.bcast.cmd() are two examples. The syntax for mpi.remote.exec() is
> mpi.remote.exec(cmd, ..., simplify = TRUE, comm = 1, ret = TRUE)
where cmd is the command to be executed on the slaves, ... holds the arguments passed to cmd, simplify is a logical flag indicating whether the results should be returned as a data frame when possible, comm is the communicator number (usually 1), and ret is a logical flag indicating whether the slaves should return the results of the executed code. If you run the following code with mpi.bcast.cmd() instead, the slaves will execute the command but no values will be returned from them.
Let’s ask each slave to give back the slave number.
> mpi.remote.exec(paste("I am", mpi.comm.rank(), "of", mpi.comm.size()))
$slave1
[1] "I am 1 of 11"
$slave2
[1] "I am 2 of 11"
........
$slave10
[1] "I am 10 of 11"
As you can see, mpi.comm.rank() gives the number (rank) of each slave CPU, and mpi.comm.size() gives the total number of processes in the communicator: the 10 spawned slaves plus the master, hence 11. The diagram below shows how this command is executed.
> mpi.remote.exec(sum(1:mpi.comm.rank()))
  X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1  1  3  6 10 15 21 28 36 45  55
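The difference between the two commands can be sketched as follows, assuming Rmpi is installed and two slaves can be spawned. The variable names id and doubled are made up for this example. mpi.bcast.cmd() is used to set a value on every slave without returning anything, and mpi.remote.exec() is then used to compute with that value and send the results back to the master.

```r
library(Rmpi)
mpi.spawn.Rslaves(nslaves = 2)

# mpi.bcast.cmd() runs the command on every slave but returns nothing to the master.
# Each slave stores its own rank in a variable named id (a name chosen for this example).
mpi.bcast.cmd(id <- mpi.comm.rank())

# mpi.remote.exec() runs the command on every slave and collects the results.
doubled <- mpi.remote.exec(id * 2)
print(doubled)

mpi.close.Rslaves()
```

A common pattern is to use mpi.bcast.cmd() for setup steps such as loading libraries or defining variables, and mpi.remote.exec() only when the master actually needs the values back.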
Measure Time to Compute
To see whether your code would benefit from being parallelized, you can measure how long the task takes to compute. In R, the proc.time() command returns three values, and you can use them to determine the computing time.
1) user time: the CPU time charged for the execution of the user instructions of the calling process
2) system time: the CPU time charged for execution by the system on behalf of the calling process
3) elapsed time: the real (wall-clock) time elapsed since the R process was started
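A minimal sketch of this timing pattern in plain R (no MPI needed): record proc.time() before the work, subtract it afterwards, and read the components by name. The variable names ptm, timing, cpu_seconds and wall_seconds are just illustrative choices.

```r
# Record the start time, do some work, then subtract.
ptm <- proc.time()
x <- mean(rnorm(1000000))   # the work being timed
timing <- proc.time() - ptm

print(timing)

# Individual components can be pulled out by name.
cpu_seconds  <- timing[["user.self"]] + timing[["sys.self"]]
wall_seconds <- timing[["elapsed"]]
```

For parallel code the elapsed component is usually the one to watch, since it is the time you actually wait for the result.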
Scalability Is Important
Increasing the number of CPUs doesn’t necessarily increase performance. The overhead, the extra time needed to communicate with the CPUs, grows as more CPUs are used for parallel computing. Here is an example of the performance of a simple script that computes the mean of 1 million random numbers 400 times. The performance increases dramatically from 2 to 4 CPUs (I), then increases more slowly from 4 to 15 CPUs (II). Using more than 16 CPUs (III) takes longer than using 15 CPUs and brings no benefit. Note: this code was run over a sub-optimal interconnect network to show the effect of overhead. Results may vary depending on your system. Under optimal conditions, the time to compute should be halved if you double the number of CPUs.
library('Rmpi')
mpi.spawn.Rslaves(nslaves=4)
ptm <- proc.time()
mpi.iparReplicate(400, mean(rnorm(1000000)))
print(proc.time() - ptm)
If you use a large number of CPUs for computation, the overhead may significantly affect the overall performance, so it is important to test your scripts on different numbers of CPUs to find the optimal performance. The figure below shows the actual performance obtained by reserving 24 CPUs from a large computer cluster. All 24 CPUs are connected by a high-speed interconnect network, so performance doubles when the number of CPUs is doubled (e.g. 3->6 or 4->8). However, using more than 10 CPUs brings no further benefit.
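One way to compare runs on different numbers of CPUs is to convert the measured times into speedup and efficiency. The helper below is a sketch in plain R; the function name scaling_table and the timings passed to it are made up for illustration and are not measurements from this post.

```r
# Speedup and parallel efficiency relative to the smallest CPU count measured.
# cpus:    vector of CPU counts that were tested
# seconds: elapsed time measured for each count
scaling_table <- function(cpus, seconds) {
  base <- which.min(cpus)
  speedup <- seconds[base] / seconds
  # Ideal speedup is proportional to the CPU count, so efficiency 1 means perfect scaling.
  efficiency <- speedup / (cpus / cpus[base])
  data.frame(cpus = cpus, seconds = seconds,
             speedup = speedup, efficiency = efficiency)
}

# Made-up timings, for illustration only.
example <- scaling_table(cpus = c(2, 4, 8), seconds = c(100, 50, 40))
print(example)
```

An efficiency close to 1 means that doubling the CPUs roughly halves the time; once it drops well below 1, the overhead is eating most of the gain from the extra CPUs.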
The commands I covered in this post are all collective calls, which means that all slaves in a communicator are called for execution. I would like to cover more MPI commands for controlling individual slaves in another post.
There are three more commands to mention before finishing today’s post: mpi.finalize(), mpi.exit() and mpi.quit(). mpi.finalize() should be called at the end of the script to clean up all MPI states. mpi.exit() not only calls mpi.finalize() but also detaches the Rmpi library. If mpi.exit() is called, you need to relaunch R before you can load the Rmpi library again. mpi.quit() will quit MPI and leave R altogether.