Parallel Computing: Introduction to MPI
What is MPI?
MPI stands for Message Passing Interface, which enables parallel computing by passing messages between processes running on multiple processors. In essence, MPI is a library, usually written in C or Fortran, that makes it possible to run a program on multiple processors. There are, however, several infrastructures for memory and multiple CPUs. Most desktop/laptop computers these days are multi-core (i.e., they have multiple CPUs) with shared memory.
In this model, each CPU has access to the shared memory, so you can place a data set in shared memory and divide the work among multiple CPUs. To parallelize tasks under this shared-memory model, you can use OpenMP (not to be confused with OpenMPI). I am not going to discuss OpenMP here, but perhaps in a future post.
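As a rough sketch of the shared-memory idea, here is a toy example using Python threads rather than OpenMP (the worker count and chunk size are just illustrative choices): several workers read one data set that lives in a single address space and each computes a piece of the result.

```python
# Toy sketch of the shared-memory model: threads share one address
# space, the way CPUs in a multi-core machine share memory.
from concurrent.futures import ThreadPoolExecutor

data = list(range(1000))            # one data set placed in shared memory

def partial_sum(lo, hi):
    return sum(data[lo:hi])         # every thread reads the same shared list

with ThreadPoolExecutor(max_workers=4) as pool:
    chunks = [(i, i + 250) for i in range(0, 1000, 250)]
    results = pool.map(lambda c: partial_sum(*c), chunks)

total = sum(results)                # same answer as sum(data)
```

No data is copied between workers here; each thread simply reads its slice of the one shared list, which is exactly what the shared-memory model allows and the distributed-memory model below does not.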
Another type of CPU-memory infrastructure is the distributed-memory model. In this model, each CPU has its own memory, and other CPUs cannot access it directly.
The advantages of the distributed-memory model are:
1) CPUs do not race for memory, so no waiting or synchronization over shared memory is necessary.
2) Each CPU has its own address space, which makes the address space easier to keep track of.
3) The machine is easier to design.
However, cluster computers are designed with more of a hybrid structure: each node has a shared-memory structure internally, but between nodes, memory is neither shared nor directly accessible.
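The distributed-memory picture above is exactly what message passing addresses: since processes cannot read each other's memory, data moves only through explicit send and receive operations. Here is a toy sketch of that idea using Python's standard multiprocessing pipes rather than MPI itself:

```python
# Toy sketch of message passing: two processes with separate memory
# exchange data only through explicit send/receive over a pipe.
# (An analogy for the distributed-memory model, not MPI itself.)
from multiprocessing import Process, Pipe

def worker(conn):
    data = conn.recv()              # receive a message from the parent process
    conn.send(sum(data))            # send a result back
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    p = Process(target=worker, args=(child_conn,))
    p.start()
    parent_conn.send([1, 2, 3, 4])  # explicit send: no shared memory involved
    print(parent_conn.recv())       # prints 10
    p.join()
```

The child never touches the parent's variables; every value it sees arrived as a message, which is the core discipline MPI imposes between cluster nodes.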
Since I am more interested in high-speed computing on cluster computers, MPI is the way to go for implementing parallel computing.
What is OpenMPI?
MPI was originally developed by researchers from both academia and industry to standardize a portable message-passing system. The OpenMPI project is an open-source, freely available implementation for the distributed-memory model, and its software is completely free to use, even in commercial programs, under its permissive open-source license!
OpenMPI grew out of three MPI implementations developed by different groups: FT-MPI from the University of Tennessee, LA-MPI from Los Alamos National Laboratory, and LAM/MPI from Indiana University. Each had its own unique features, and OpenMPI evolved by taking the best of each; it is now updated much more frequently than those three and has become a standard MPI implementation.
How can I use it?
MPI implementations are written in C or Fortran, and the library is made up of roughly 200 routines. Fortunately, the library can be used from many languages, including C/C++, Fortran, Java, Python, MATLAB, and R. The details of implementing MPI are documented on the OpenMPI website @ http://www.open-mpi.org/
Simply download the MPI bindings for your programming language and install them on your computer. You will need a compiler for C/C++ or Fortran. If you are using Mac or Linux, simply run configure, make, and make install. For Windows, use Cygwin and install as in a Linux environment.
It is best to install MPI on your local computer (desktop/laptop) and test your code there first before using a cluster, because you can execute MPI code with multiple processes even on a single-core computer. Debugging is also more straightforward this way.
In the next post, I will demonstrate installation and running MPI using R.