In this section I will lay the foundation for an MPI version of a
diffusion code based on Jacobi iterations. The idea is that a flat
rectangular region, whose dimensions in this token example are going
to be
,
will be divided amongst 12 processes, each handling a small
square
.
But because it is a differential problem, apart from the
data squares, the processes will also have to maintain additional data
that is going to mirror data that corresponds to whatever the
neighbouring processes have in their
squares.
Diffusion is a very universal phenomenon that is encountered almost everywhere, beginning with microscopic world of quantum systems, and going all the way up to atmospheric circulation, pollution, and hurricanes. In the middle you encounter diffusion of nutrients through various membranes in cells or diffusion of oil through geological strata, or diffusion of heat through a combustion engine.Mathematics of diffusion in all these systems is very similar and, at its simplest, it is described by the following partial differential equation:
![]()
where T(x, y, z) represents the diffusing quantity (e.g., it may stand for temperature), t stands for time, D is the diffusion coefficient andis the Laplace operator, which looks like this in Cartesian coordinates:
![]()
Assuming that the problem is static, we get thatand then
too. This is called the Laplace equation.We can simplify the problem further, by assuming that it is only 2-dimensional, so that
![]()
This equation has the following discretized equivalent on a regular rectangular lattice with spacing h for any lattice node (x0, y0):
In other words, the value of T at (x0, y0) is the arithmetic average of T over the neighbours nearest of (x0, y0) (we don't take the diagonal neighbours into account, because they are not the nearest).This provides us with a simple method of solving equation (5.2) for a situation with fixed boundary conditions, i.e., we know T everywhere on the boundary of some 2-dimensional region and we want to find T everywhere within the region. The solution is to begin with an initial guess. Then use equation (5.3) to replace each T(x0, y0)with the average over the neighbours of (x0, y0). We may have to repeat this process over and over, until we'll notice that T no longer changes from iteration to iteration. This is going to be our numerical solution of equation (5.2). This very simple method is due to a German mathematician Karl Gustav Jacob Jacobi, who was born on the 10th of December 1804 in Potsdam near Berlin, and who died on the 18th of February 1851 in Berlin.
In order to carry out Jacobi iterations on a parallel system, you need to divide the two-dimensional region, in the simplest case a rectangle, into, say, squares, allocating each square to a separate process. For nodes on the boundaries of the squares, we will need to know values that function T assumes on nodes that belong to other processes, so that we can perform the averaging.
This is where MPI communication comes in. We will have to use MPI in
order to transfer these values. We only need to transfer values
associated with the boundaries of the squares. If each square
comprises
nodes, then a process responsible for a given
square will have to send
data items to other nodes for
each iteration. The amount of data that needs to be communicated
scales like N, whereas the amount of data that stays within the
square and can be computed on without communication scales like
N2. Assuming that computation is a thousand times faster than
communication, we get a ``break-even'' for
,
which yields N=1000. For N much larger than 1000 the cost of
communication will be negligible compared to the cost of computation.
This means that the squares need to be very large. The faster the CPUs
compared to the network, the more nodes you have to pack into each square
to make the parallelization worthwhile.
The communication in this case can be facilitated further, if the processes responsible for adjacent squares can be made to run on ``adjacent'' processors. There are machines, such as, e.g., Cray X1 , whose internal network topology lets programmers do just this. Such machines are exceptionally well suited to solving field problems, of which diffusion is the simplest.
In this section you will learn how MPI provides various constructs that are very helpful in capturing this type of problem. But we are not going to solve the problem itself. Instead we will focus on the MPI mechanisms themselves and we will analyze a simple MPI demonstration program that explores these mechanisms.
In our simple example each process looks after an small
integer matrix,
which represents the ``square'' and the integers represent function T.
We are going to surround each
square with additional rows and columns
around its boundary, into which we are going to transfer data from the adjacent
squares. There will be a column on the left and on the right of the
region to represent values obtained from adjacent squares on the left and on the right
hand side, and rows above and below the
region, to represent values obtained
from adjacent squares above and below ``our'' square. So, in effect, each process
will really have to work with a
matrix, but only
interior
of this matrix is its own, and the periphery of the matrix will be populated with
data obtained from other processes.
Within the example code we are going to:
At every stage the master process is going to collect data from all other processes and print it on standard output so as to show the state of the system at one glance.
In order to help you understand the whole procedure, I'm going to show you some of the output of the program first, then I'll list the program for you, and then discuss it in more detail.
When the program starts, right after the processes have formed the new Cartesian topology, oriented themselves within it, but before they exchanged any data, their state looks as follows:
9 9 9 9 10 10 10 10 11 11 11 11 9 9 9 9 10 10 10 10 11 11 11 11 9 9 9 9 10 10 10 10 11 11 11 11 9 9 9 9 10 10 10 10 11 11 11 11 6 6 6 6 7 7 7 7 8 8 8 8 6 6 6 6 7 7 7 7 8 8 8 8 6 6 6 6 7 7 7 7 8 8 8 8 6 6 6 6 7 7 7 7 8 8 8 8 3 3 3 3 4 4 4 4 5 5 5 5 3 3 3 3 4 4 4 4 5 5 5 5 3 3 3 3 4 4 4 4 5 5 5 5 3 3 3 3 4 4 4 4 5 5 5 5 0 0 0 0 1 1 1 1 2 2 2 2 0 0 0 0 1 1 1 1 2 2 2 2 0 0 0 0 1 1 1 1 2 2 2 2 0 0 0 0 1 1 1 1 2 2 2 2Every process initializes its own
Now, the processes exchange the content of their inner rows with their neighbours, i.e., process rank 4 sends the content of its row 3 (counted from the bottom) to process number 7, which puts it in its own row 1 (counted from the bottom), and, at the same time receives the content of row 2 from process rank 7, and places it in its own row 4.
So that after this exchange operation has taken place, the matrices look as follows:
9 9 9 9 10 10 10 10 11 11 11 11 9 9 9 9 10 10 10 10 11 11 11 11 9 9 9 9 10 10 10 10 11 11 11 11 6 6 6 6 7 7 7 7 8 8 8 8 9 9 9 9 10 10 10 10 11 11 11 11 6 6 6 6 7 7 7 7 8 8 8 8 6 6 6 6 7 7 7 7 8 8 8 8 3 3 3 3 4 4 4 4 5 5 5 5 6 6 6 6 7 7 7 7 8 8 8 8 3 3 3 3 4 4 4 4 5 5 5 5 3 3 3 3 4 4 4 4 5 5 5 5 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 5 0 0 0 0 1 1 1 1 2 2 2 2 0 0 0 0 1 1 1 1 2 2 2 2 0 0 0 0 1 1 1 1 2 2 2 2Now we have to repeat this operation, but this time exchanging columns between neighbours. And so process rank 4 will send its column 2 (counted from the left) to process rank 3, which is going to place it in its own column 4 (counted from the left), and, at the same time process rank 3 is going to send its own column 3 to process rank 4, which is going to place it in its own column 1.
And after all this is over, the matrices are going to look as follows:
9 9 9 10 9 10 10 11 10 11 11 11 9 9 9 10 9 10 10 11 10 11 11 11 9 9 9 10 9 10 10 11 10 11 11 11 6 6 6 7 6 7 7 8 7 8 8 8 9 9 9 10 9 10 10 11 10 11 11 11 6 6 6 7 6 7 7 8 7 8 8 8 6 6 6 7 6 7 7 8 7 8 8 8 3 3 3 4 3 4 4 5 4 5 5 5 6 6 6 7 6 7 7 8 7 8 8 8 3 3 3 4 3 4 4 5 4 5 5 5 3 3 3 4 3 4 4 5 4 5 5 5 0 0 0 1 0 1 1 2 1 2 2 2 3 3 3 4 3 4 4 5 4 5 5 5 0 0 0 1 0 1 1 2 1 2 2 2 0 0 0 1 0 1 1 2 1 2 2 2 0 0 0 1 0 1 1 2 1 2 2 2Observe that now not only does every process know what data is harboured by its neighbours on the left, on the right, above, and below, but they even know the data that somehow made it diagonally, i.e., from the upper left, upper right, lower left, and lower right directions - but the latter is not going to last, because that data came from the peripheral rows and columns in the adjacent squares, so it is not truly representative of the processes' internal states.
Now every process can perform its own Jacobi iteration within its own little patch for a diffusion problem, and set new values to its own data, while keeping the mirrored data unchanged.
Before the next iteration can commence, the data has to be exchanged between the neighbours again, in order to refresh the peripheries of the squares.
So now let us have a look at the code: