
Rmpi Tutorial 4: Getting Data Back From Slaves

This is going to be the last tutorial for Rmpi. In this post I am going to cover how to receive data from slaves in Rmpi. Think of the situation in the picture below: you want to gather data from the slaves and combine it with the data on the master.


mpi.gather.Robj() will retrieve data from each slave and put it together as in the diagram above. Let's try some examples.

Example 1: Getting slave number information from each slave

mpi.bcast.cmd(id <- mpi.comm.rank())
mpi.bcast.cmd(x <- paste("I am slave no.", id))
x <- "I am a master"

Here is the output (showing only the last part)

> mpi.gather.Robj(x)
[1] "I am a master"    "I am slave no. 1" 
"I am slave no. 2" "I am slave no. 3"
> mpi.remote.exec(x)
[1] "I am slave no. 1"
[1] "I am slave no. 2"
[1] "I am slave no. 3"

If you want to retrieve data from each slave and give the combined data back to all slaves, use mpi.allgather.Robj().

Example 2: Send the string "fruits" to the master and "apple", "banana" and "orange" to slaves 1 to 3. Then retrieve the data from each process and send the combined data to the master and all slaves.


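The code for this example was lost with the original image; here is a minimal sketch, assuming 3 slaves are already spawned and that mpi.allgather.Robj() is called collectively on the master and every slave:

```r
mpi.bcast.cmd(id <- mpi.comm.rank())                    # give each slave its rank
mpi.bcast.cmd(x <- c("apple", "banana", "orange")[id])  # one fruit per slave
x <- "fruits"                                           # the master's string
mpi.bcast.cmd(x <- mpi.allgather.Robj(x))               # slaves join the collective gather
x <- mpi.allgather.Robj(x)                              # master joins; every process now holds all four strings
```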
Here is the output

> mpi.remote.exec(x) 
[1] "apple" 
[1] "banana" 
[1] "orange" 
[1] "fruits" "apple" "banana" "orange" 
[1] "fruits" "apple" "banana" "orange" 
[1] "fruits" "apple" "banana" "orange" 
[1] "fruits" "apple" "banana" "orange"

mpi.reduce and mpi.allreduce Will Reduce Data By Simple Operation


The mpi.reduce command examines a variable on the slaves and the master, performs a simple operation such as finding the minimum or maximum, and returns the result as a single value. The variable needs to exist on every process, including the master. For it to work, you need to call this command from all slaves and the master; otherwise it will hang indefinitely.

Example 3: Set x to 1 on the master and to 2, 3, and 4 on slaves 1, 2, and 3. Then use mpi.reduce to return the sum of all x.

# Define a function for reduction
red <- function(op) mpi.reduce(x, 2, op)
# Send the function to all slaves
mpi.bcast.Robj2slave(red)
# Set object x on the master and on the slaves
x <- 1
mpi.bcast.cmd(x <- mpi.comm.rank() + 1)
# Call the function on the slaves
mpi.remote.exec(red("sum"))
# Call the same function on the master to complete the reduction
mpi.reduce(x, 2, "sum")

Here is the output

> mpi.remote.exec(red("sum"))
  X1 X2 X3
1  2  3  4
> mpi.reduce(x,2,"sum")
[1] 10

If you use mpi.allreduce, the final value is sent to all slaves. There are two more options in mpi.reduce: maxloc and minloc. With these options, the command returns two values: the result of the operation (the maximum or minimum) and the rank of the process holding that value. This is useful for finding which slave provided the value.

> mpi.reduce(x,2,"maxloc")
[1] 4 3
> mpi.reduce(x,2,"minloc")
[1] 1 0

Note: the rank for the master is 0

Rmpi Tutorial 2: Sending Data

In MPI, there are a number of commands for sending data. It is important to meet each command's requirements when you send data to the slaves; otherwise R will crash. I am going to cover a few important ones in this post.

Sending Identical Data to Slave CPUs

For large-scale data analysis, you may want to divide the work among slave CPUs. These CPUs may need to receive some constant values that are necessary for downstream computation. To send an object to every slave, you can use the mpi.bcast.Robj() command.

Example: Send a character string to 3 slaves

1. Spawn 3 slaves and create a variable x on the master.

>x<-c("This is a test.")

2. Send x to each slave


3. Print x in each slave

[1] "This is a test."
[1] "This is a test."
[1] "This is a test."

4. Close mpi
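Putting the four steps together (the original code was shown as screenshots), here is a minimal sketch, assuming the one-sided convenience call mpi.bcast.Robj2slave() for step 2:

```r
library(Rmpi)
mpi.spawn.Rslaves(nslaves = 3)   # step 1: spawn 3 slaves
x <- c("This is a test.")        # create x on the master
mpi.bcast.Robj2slave(x)          # step 2: send x to each slave
mpi.remote.exec(print(x))        # step 3: print x on each slave
mpi.close.Rslaves()              # step 4: close MPI
mpi.quit()
```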


Sending Non-identical Data to Slave CPUs

When you have a large data set, you first divide the data and put the pieces in a list, then send the list to the slaves. There are two commands, mpi.scatter.Robj() and mpi.scatter.Robj2slave(), for sending a list to slave CPUs.

When you use these commands, you need exactly as many object portions as receiving CPUs. If not, you will get an error message. For example, if you spawn 3 slave CPUs and your object is a list of 4, mpi.scatter.Robj2slave() will not work because the numbers are not equal. However, mpi.scatter.Robj() will work because the master plus the slaves equals the number of elements in the list.

Example 1: Split a list of 4 and send it to the master and slaves


This code is very similar to the one above except that x is a list of character strings.

[1] "This"
[1] "is"
[1] "an"
[1] "example"

Note that the master also receives a portion of the object, but it does not overwrite the existing x; the master's share is returned by the function instead.
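A sketch of this example, assuming mpi.scatter.Robj() is called collectively (the slaves post their receive first, then the master scatters):

```r
x <- list("This", "is", "an", "example")
mpi.bcast.cmd(x <- mpi.scatter.Robj())   # each slave receives one element into x
y <- mpi.scatter.Robj(x)                 # master scatters; its own share is returned, x is untouched
mpi.remote.exec(print(x))
```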

Example 2: Divide 8×4 matrix into 4 blocks and send to slaves

1. Spawn 4 slave CPUs, and create an 8×4 matrix with random numbers

           [,1]       [,2]       [,3]       [,4]
[1,] -0.3718508  0.8075626 -0.1145767  1.2152244
[2,]  1.2414776 -1.7983161  0.8113792 -0.0577753
[3,]  0.2291987 -1.8194346  0.5902288 -0.5519079
[4,] -0.6056088  0.7028118 -1.0299552  1.4069104
[5,]  0.5006542  1.2469203 -1.5266182 -0.2712369
[6,]  0.9899981  1.0211666 -1.0916166  0.9721620
[7,] -1.6689545  0.2618148 -1.0774920 -0.4962599
[8,] -0.3919911  0.1678641  0.5198690  0.7932334

2. Split the matrix into 4 of 2×4 matrices

           [,1]       [,2]       [,3]       [,4]
[1,] -0.3718508  0.8075626 -0.1145767  1.2152244
[2,]  1.2414776 -1.7983161  0.8113792 -0.0577753

           [,1]       [,2]       [,3]       [,4]
[1,]  0.2291987 -1.8194346  0.5902288 -0.5519079
[2,] -0.6056088  0.7028118 -1.0299552  1.4069104

          [,1]     [,2]      [,3]       [,4]
[1,] 0.5006542 1.246920 -1.526618 -0.2712369
[2,] 0.9899981 1.021167 -1.091617  0.9721620

           [,1]      [,2]      [,3]       [,4]
[1,] -1.6689545 0.2618148 -1.077492 -0.4962599
[2,] -0.3919911 0.1678641  0.519869  0.7932334
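The matrix in step 1 and the blocks in step 2 can be produced along these lines (the variable name mat is illustrative; smat is chosen to match the scatter call in step 3, and no seed is set, consistent with the note at the end):

```r
mpi.spawn.Rslaves(nslaves = 4)                   # step 1: spawn 4 slaves
mat <- matrix(rnorm(8 * 4), nrow = 8, ncol = 4)  # 8x4 matrix of random numbers
# step 2: split into four 2x4 blocks, stored in a list so it can be scattered
smat <- lapply(1:4, function(i) mat[(2 * i - 1):(2 * i), ])
smat
```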

3. Send each matrix to slave CPUs

> mpi.scatter.Robj2slave(smat)
> mpi.remote.exec(smat)
           [,1]       [,2]       [,3]       [,4]
[1,] -0.3718508  0.8075626 -0.1145767  1.2152244
[2,]  1.2414776 -1.7983161  0.8113792 -0.0577753

           [,1]       [,2]       [,3]       [,4]
[1,]  0.2291987 -1.8194346  0.5902288 -0.5519079
[2,] -0.6056088  0.7028118 -1.0299552  1.4069104

          [,1]     [,2]      [,3]       [,4]
[1,] 0.5006542 1.246920 -1.526618 -0.2712369
[2,] 0.9899981 1.021167 -1.091617  0.9721620

           [,1]      [,2]      [,3]       [,4]
[1,] -1.6689545 0.2618148 -1.077492 -0.4962599
[2,] -0.3919911 0.1678641  0.519869  0.7932334

4. Close mpi


Note: The results may vary as the seed was not set for this code.

MPI Tutorial for R (Rmpi)

In the previous two posts, I introduced what MPI is and how to install MPI for the R programming language. Rmpi provides the interface necessary to use MPI for parallel computing in R. Rmpi is maintained by Hao Yu at the University of Western Ontario and has been around for about a decade now. Although it doesn't have all of the commands found in the original MPI for C/Fortran, quite a few functions have been added, and it has most of the basic functions for normal operations. The manual for Rmpi is provided here.

In this post, I am going to cover a few basic commands/functions for MPI using R.

Spawning Slave CPUs

In MPI terminology, the master is the main CPU that sends messages to dependent CPUs, called slaves, to complete tasks. When you spawn slaves using mpi.spawn.Rslaves(), by default it uses the number of available CPUs (depending on your system). You can use the nslaves option to specify the number of CPUs you want to use for MPI. You can use a higher number than the CPUs actually available in your system, but you will not get any benefit from doing so: it behaves as if it has that many CPUs, but the actual computation is done by the available ones.

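For example, to spawn a fixed number of slaves rather than the default:

```r
library(Rmpi)
mpi.spawn.Rslaves(nslaves = 4)   # spawn exactly 4 slave processes
```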

Let's Execute A Command Using Slaves

There are several commands to execute codes in slaves. mpi.remote.exec() and mpi.bcast.cmd() are examples. The syntax for mpi.remote.exec() is

> mpi.remote.exec(cmd, ..., simplify = TRUE, comm = 1, ret = TRUE)

where cmd is the command to be executed on the slaves, ... holds arguments passed to cmd, simplify is a logical indicating whether the results should be simplified into a data frame if possible, comm is the communicator number (usually 1), and ret is a logical indicating whether results from the slaves should be returned. If you use mpi.bcast.cmd() to execute the same code, the slaves will execute the command but no values will be returned from them.

Let’s ask each slave to give back the slave number.

> mpi.remote.exec(paste("I am", mpi.comm.rank(), "of", mpi.comm.size()))

[1] "I am 1 of 11"
[1] "I am 2 of 11"
...
[1] "I am 10 of 11"

As you can see, mpi.comm.rank() and mpi.comm.size() give the slave's rank and the total size of the communicator (the master plus the spawned slaves). The diagram below shows how this command is executed.

Another example: ask each slave CPU to compute the sum of 1 through its rank.

> mpi.remote.exec(sum(1:mpi.comm.rank()))
  X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1  1  3  6 10 15 21 28 36 45  55

Measure Time to Compute

To see whether your code needs to be parallelized, you can measure the time it takes to compute the task. In R, the proc.time() command returns three values, and you can use this function to time the computation.
1) user time: the CPU time charged for the execution of the user instructions of the calling process
2) system time: the CPU time charged for execution by the system on behalf of the calling process
3) elapsed time: the real (wall-clock) time elapsed since the process was started
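The usual pattern is to snapshot proc.time() before the task and subtract afterwards:

```r
ptm <- proc.time()            # snapshot before the task
invisible(mean(rnorm(1e7)))   # the task to measure
print(proc.time() - ptm)      # user, system, and elapsed times
```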

Scalability Is Important

Increasing the number of CPUs doesn't necessarily increase performance. The overhead, the extra time needed to access the CPUs, grows as more CPUs join the parallel computation. Here is an example of the performance of a simple code that computes the mean of 1 million random numbers, 400 times. The performance increases dramatically from 2 to 4 CPUs (I), then more slowly from 4 to 15 CPUs (II). Using more than 15 CPUs (III) takes more time to compute than 15 CPUs, so there is no benefit in doing so. Note: this code was run under a sub-optimal interconnect network to show the effect of overhead; results may vary depending on your system. Under optimal conditions, the time to compute should be halved when you double the number of CPUs.

ptm <- proc.time()
mpi.iparReplicate(400, mean(rnorm(1000000)))
print(proc.time() - ptm)


If you use a large number of CPUs for computation, the overhead may significantly affect the overall performance. So it is important to test your scripts on different numbers of CPUs to find the optimal performance. The figure below shows the actual performance with 24 CPUs reserved from a large computer cluster. All 24 CPUs have a high-speed interconnect network, so performance doubles when the number of CPUs is doubled (e.g. 3→6 or 4→8). However, using more than 10 CPUs brings no further benefit.


The commands I covered in this post are all collective calls, which means that all slaves in a communicator are called for execution. I would like to cover more MPI commands that control individual slaves in another post.

There are three more commands before finishing today's post: mpi.finalize(), mpi.exit() and mpi.quit(). mpi.finalize() should be called to clean up all MPI states at the end of the script. mpi.exit() not only calls mpi.finalize() but also detaches the Rmpi library; if mpi.exit() is called, you need to relaunch R to load the Rmpi library again. mpi.quit() quits MPI and leaves R altogether.
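A typical end of an Rmpi script therefore looks like:

```r
mpi.close.Rslaves()   # shut down the spawned slaves
mpi.quit()            # finalize MPI and quit R
```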

Installing Rmpi (MPI for R) on Mac and Windows

MPI is the message passing interface for parallel computing that I described in the previous post. MPI implementations are usually written in C or Fortran. Fortunately, you don't need to know these programming languages to use MPI. In this post, I am going to show, step by step, how to install MPI for use in R.

Installing Rmpi on Mac

Rmpi is a package for running MPI in R. Assuming you have already installed R, this site tells you how to install Rmpi for Mac OS X. There are several steps.

1) Get Xcode, the Command Line Tools, and Homebrew. Homebrew lets you install non-Apple programs. If your system doesn't have these, you will need to spend some time installing them. Make sure you see the message below before the next step.

Your system is ready to brew.

If you are running OS X 10.6.8 or earlier, the system may not be compatible with the most recent Xcode. In this case, I recommend upgrading your system to Mavericks, which is free (right now, at least). Then you can install Xcode followed by the Command Line Tools for Mavericks.


Once the Command Line Tools are installed, install Homebrew the same way as described there (steps 2 and 3).

2) install gfortran and open-mpi

This part is pretty simple. Open a terminal window, then type the commands to install gfortran and Open MPI.

>brew install gfortran
>brew install open-mpi

3) Install the Rmpi package in R
First download Rmpi. Then install it in 64-bit R (use the local package directory for installation).

> install.packages("Rmpi",type="source")
Installing package(s) into ‘/Library/Frameworks

If you have trouble compiling, your compiler may be obsolete. When I installed Mavericks on my old MacBook, my compiler couldn't compile the Rmpi source package. I needed to install MacPorts for Mavericks and then re-install Open MPI.

If it still doesn't work, try an earlier version of Rmpi (e.g. 0.6-3) from here. Place the tgz file on your desktop, then in the command line type "R". Then type

>install.packages(file.choose(), repos = NULL, 
type = "source")

4) Load Rmpi in R

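Loading the package is the usual call:

```r
library(Rmpi)
```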

Installing Rmpi on Windows

Installation of Rmpi for Windows is straightforward once all requirements are met. It is necessary to install the MPICH2 program to host parallel computing.

1) Install MPICH2 from here (select unofficial binary package for windows)
2) Change the PATH
3) Start hosting MPI
4) Download Rmpi and install on R
5) Start using Rmpi

What I am showing here is very similar to this website, but I am adding a little more information for each step.
If installation of MPICH2 is successful, you should see an MPICH2 folder in Program Files.


2) To change the PATH for MPICH2, right-click "My Computer", then select "Properties". Click "Advanced system settings" on the left. Click "Environment Variables", then highlight "PATH" and click Edit. In the small window, add ";C:\Program Files\MPICH2\bin" at the end of the character string that is already there. Hit OK.

3) To start hosting, you need to run the following commands as an administrator. To do this in Windows 7 or Vista, go to Run on the menu and type cmd, then press CTRL + SHIFT + ENTER. You will see a little window like this.


Then type the following command.

>smpd -install
>mpiexec -remove

This removes any existing MPICH2 service (if present) and installs the program; the -remove option removes existing accounts and passwords.
Now register an account and a password with the -register option:

>mpiexec -register

It will ask you to type an account name and password. You need a valid user name and password for Windows; otherwise it will fail.
Then use the -validate option to test whether this process was successful. You should see SUCCESS.

4) Download the Rmpi package here. Select the most recent version at the bottom. Then go to R and install the package from the package archive (using the downloaded zip file). Click Browse and select the file you downloaded.


5) Load Rmpi package. Now it is ready to run MPI.


Testing If Rmpi Is Running Successfully

Let's see if you set up Rmpi correctly. Try typing the following:

> mpi.spawn.Rslaves()

It may take a few moments before it returns; if it takes too long, something went wrong in the installation process. On a desktop/laptop computer, the number of slaves spawned is the number of threads available on your machine.

In the next post, I will discuss more about how to use the MPI installed on R.

Disclaimer: Installation process described here was tested on several computers both windows and macs, however, I cannot guarantee that the processes described above works for all computers. I am not responsible for anything that happens to your computer, please backup all your data before you try.

Parallel Computing: Introduction to MPI

What is MPI?

MPI stands for message passing interface, which enables parallel computing by passing messages between multiple processors. Basically, MPI is a library, usually written in C or Fortran, that makes it possible to run a program on multiple processors. There are, however, several infrastructures for memory and multiple CPUs. Most desktop/laptop computers these days are multi-core (meaning multiple CPUs) with shared memory.


In this model, each CPU has access to the shared memory, so you can place a data set in the shared memory and divide the work among multiple CPUs. To run a program this way, using the shared memory model, you can use OpenMP (different from OpenMPI). I am not going to discuss OpenMP here, but maybe in a future post.

Another type of CPU-memory infrastructure is the distributed-memory model. In this model, each CPU has its own memory, which other CPUs cannot access directly.


Advantages of the distributed memory model are:

1) CPUs don't race for memory; no waiting or synchronization is necessary.
2) Memory addresses can be unified, making it easier to keep track of the address space.
3) The machine is easier to design.

However, cluster computers are designed more as a hybrid structure: each node has a shared memory structure, but memory is not shared or accessible between nodes.


Since I am more interested in high-speed computing using cluster computers, MPI is the way to go for implementing parallel computing.

What is OpenMPI?

MPI was originally developed by researchers from both academia and industry to standardize a portable message passing system. The OpenMPI project is an open-source implementation for the distributed memory model, and the software is completely free to use (unless you are trying to sell programs that use OpenMPI)!

Other MPIs

There were three MPIs developed by different groups: FT-MPI by the University of Tennessee, LA-MPI by Los Alamos National Laboratory, and LAM/MPI by Indiana University. Each MPI had its own unique features, and OpenMPI evolved by taking the best of each; it is now updated much more frequently than those three and has become the standard implementation of MPI.

How can I use it?

MPI is written in C or Fortran, and its library is made up of roughly 200 routines. Fortunately, the library can be used from many languages such as C/C++, Fortran, Java, Python, MATLAB and R. The details of the implementation are described on the OpenMPI website.

Simply download MPI for your programming language and install it on your computer. You need compilers for C/C++ or Fortran. If you are using Mac or Linux, simply configure, make, and make install. For Windows, use Cygwin and install as in a Linux environment.

It is best to install on your local computer (desktop/laptop) and test your code there before moving to a cluster, because you can execute MPI code with multiple threads even on a single-core computer, and debugging is more straightforward.

In the next post, I will demonstrate installation and running MPI using R.
