Diffusion

In this section I will lay the foundation for an MPI version of a diffusion code based on Jacobi iterations. The idea is that a flat rectangular region, whose dimensions in this token example are going to be 8 × 6 lattice nodes, will be divided amongst 12 processes, each handling a small 2 × 2 square. But because it is a differential problem, apart from the data squares, the processes will also have to maintain additional data that mirrors whatever the neighbouring processes hold in their squares.

Diffusion is a very universal phenomenon that is encountered almost everywhere, beginning with the microscopic world of quantum systems, and going all the way up to atmospheric circulation, pollution, and hurricanes. In the middle you encounter diffusion of nutrients through various membranes in cells, or diffusion of oil through geological strata, or diffusion of heat through a combustion engine. The mathematics of diffusion in all these systems is very similar and, at its simplest, it is described by the following partial differential equation:

$$\frac{\partial T}{\partial t} = D\,\Delta T$$

where *T*(x, y, z) represents the diffusing quantity (e.g., it may stand for temperature), *t* stands for time, *D* is the diffusion coefficient, and $\Delta$ is the Laplace operator, which looks like this in Cartesian coordinates:

$$\Delta = \frac{\partial^{2}}{\partial x^{2}} + \frac{\partial^{2}}{\partial y^{2}} + \frac{\partial^{2}}{\partial z^{2}}$$

Assuming that the problem is static, we get

$$\frac{\partial T}{\partial t} = 0$$

and then

$$\Delta T = 0$$

too. This is called the Laplace equation. We can simplify the problem further, by assuming that it is only 2-dimensional, so that

$$\frac{\partial^{2} T}{\partial x^{2}} + \frac{\partial^{2} T}{\partial y^{2}} = 0 \tag{5.2}$$

This equation has the following discretized equivalent on a regular rectangular lattice with spacing *h*, for any lattice node (x_{0}, y_{0}):

$$T(x_{0}, y_{0}) = \frac{1}{4}\left[\,T(x_{0}-h, y_{0}) + T(x_{0}+h, y_{0}) + T(x_{0}, y_{0}-h) + T(x_{0}, y_{0}+h)\,\right] \tag{5.3}$$

In other words, the value of *T* at (x_{0}, y_{0}) is the arithmetic average of *T* over the nearest neighbours of (x_{0}, y_{0}) (we don't take the diagonal neighbours into account, because they are not the nearest). This provides us with a simple method of solving equation (5.2) for a situation with fixed boundary conditions, i.e., we know *T* everywhere on the boundary of some 2-dimensional region and we want to find *T* everywhere *within* the region. The solution is to begin with an initial guess. Then use equation (5.3) to replace each T(x_{0}, y_{0}) with the average over the neighbours of (x_{0}, y_{0}). We may have to repeat this process over and over, until we notice that *T* no longer changes from iteration to iteration. This is going to be our numerical solution of equation (5.2). This very simple method is due to the German mathematician Carl Gustav Jacob Jacobi, who was born on the 10th of December 1804 in Potsdam near Berlin, and who died on the 18th of February 1851 in Berlin.

In order to carry out Jacobi iterations on a parallel system, you need to divide the two-dimensional region, in the simplest case a rectangle, into, say, squares, allocating each square to a separate process. For nodes on the boundaries of the squares, we will need to know the values that the function *T* assumes on nodes that belong to other processes, so that we can perform the averaging.

This is where MPI communication comes in. We will have to use MPI in order to transfer these values. We only need to transfer values associated with the boundaries of the squares. If each square comprises *N* × *N* nodes, then a process responsible for a given square will have to send about 4*N* data items to other processes for each iteration. The amount of data that needs to be communicated scales like *N*, whereas the amount of data that stays within the square and can be computed on without communication scales like *N*^{2}. Assuming that computation is a thousand times faster than communication, we get a ``break-even'' for

$$N^{2} = 1000\,N,$$

which yields *N* = 1000. For *N* much larger than 1000 the cost of communication will be negligible compared to the cost of computation. This means that the squares need to be very large. The faster the CPUs compared to the network, the more nodes you have to pack into each square to make the parallelization worthwhile.

The communication in this case can be helped further, if the processes responsible for adjacent squares can be made to run on ``adjacent'' processors. There are machines, such as, e.g., the Cray X1, whose internal network topology lets programmers do just this. Such machines are exceptionally well suited to solving field problems, of which diffusion is the simplest.

In this section you will learn how MPI provides various constructs that are very helpful in capturing this type of problem. But we are not going to solve the problem itself. Instead we will focus on the MPI mechanisms themselves and we will analyze a simple MPI demonstration program that explores these mechanisms.

In our simple example each process looks after a small 2 × 2 integer matrix, which represents the ``square'', and the integers represent function *T*. We are going to surround each 2 × 2 square with additional rows and columns around its boundary, into which we are going to transfer data from the adjacent squares. There will be a column on the left and on the right of the region, to represent values obtained from adjacent squares on the left and on the right hand side, and rows above and below the region, to represent values obtained from adjacent squares above and below ``our'' square. So, in effect, each process will really have to work with a 4 × 4 matrix, but only the 2 × 2 interior of this matrix is its own, and the periphery of the matrix will be populated with data obtained from other processes.

Within the example code we are going to:

1. arrange processes into a *virtual* 2-dimensional Cartesian process topology,
2. orient processes within this new topology (so that they know who their neighbours are),
3. exchange data with neighbouring processes within the Cartesian topology so as to populate the peripheral rows and columns in the matrices.

At every stage the master process is going to collect data from all other processes and print it on standard output so as to show the state of the system at a glance.

In order to help you understand the whole procedure, I'm going to show you some of the output of the program first, then I'll list the program for you, and then discuss it in more detail.

When the program starts, right after the processes have formed the new Cartesian topology and oriented themselves within it, but before they have exchanged any data, their state looks as follows:

  9 9 9 9 10 10 10 10 11 11 11 11
  9 9 9 9 10 10 10 10 11 11 11 11
  9 9 9 9 10 10 10 10 11 11 11 11
  9 9 9 9 10 10 10 10 11 11 11 11
  6 6 6 6 7 7 7 7 8 8 8 8
  6 6 6 6 7 7 7 7 8 8 8 8
  6 6 6 6 7 7 7 7 8 8 8 8
  6 6 6 6 7 7 7 7 8 8 8 8
  3 3 3 3 4 4 4 4 5 5 5 5
  3 3 3 3 4 4 4 4 5 5 5 5
  3 3 3 3 4 4 4 4 5 5 5 5
  3 3 3 3 4 4 4 4 5 5 5 5
  0 0 0 0 1 1 1 1 2 2 2 2
  0 0 0 0 1 1 1 1 2 2 2 2
  0 0 0 0 1 1 1 1 2 2 2 2
  0 0 0 0 1 1 1 1 2 2 2 2

Every process initializes its own matrix to its own rank number. The matrices are displayed by the master process in a way that illustrates the topology of the whole system, i.e., we have a rectangular 4 × 3 topology, process rank 0 is in the lower left corner, and its neighbours are process rank 1 on the right and process rank 3 above. Process rank 4 sits roughly in the middle and its neighbours are process rank 1 below, process rank 7 above, process rank 3 on the left, and process rank 5 on the right.

Now, the processes exchange the content of their inner rows with their neighbours, i.e., process rank 4 sends the content of its row 3 (counted from the bottom) to process rank 7, which puts it in its own row 1 (counted from the bottom), and, at the same time, receives the content of row 2 from process rank 7, and places it in its own row 4.

So that after this exchange operation has taken place, the matrices look as follows:

  9 9 9 9 10 10 10 10 11 11 11 11
  9 9 9 9 10 10 10 10 11 11 11 11
  9 9 9 9 10 10 10 10 11 11 11 11
  6 6 6 6 7 7 7 7 8 8 8 8
  9 9 9 9 10 10 10 10 11 11 11 11
  6 6 6 6 7 7 7 7 8 8 8 8
  6 6 6 6 7 7 7 7 8 8 8 8
  3 3 3 3 4 4 4 4 5 5 5 5
  6 6 6 6 7 7 7 7 8 8 8 8
  3 3 3 3 4 4 4 4 5 5 5 5
  3 3 3 3 4 4 4 4 5 5 5 5
  0 0 0 0 1 1 1 1 2 2 2 2
  3 3 3 3 4 4 4 4 5 5 5 5
  0 0 0 0 1 1 1 1 2 2 2 2
  0 0 0 0 1 1 1 1 2 2 2 2
  0 0 0 0 1 1 1 1 2 2 2 2

Now we have to repeat this operation, but this time exchanging columns between neighbours. And so process rank 4 will send its column 2 (counted from the left) to process rank 3, which is going to place it in its own column 4 (counted from the left), and, at the same time, process rank 3 is going to send its own column 3 to process rank 4, which is going to place it in its own column 1.

And after all this is over, the matrices are going to look as follows:

  9 9 9 10 9 10 10 11 10 11 11 11
  9 9 9 10 9 10 10 11 10 11 11 11
  9 9 9 10 9 10 10 11 10 11 11 11
  6 6 6 7 6 7 7 8 7 8 8 8
  9 9 9 10 9 10 10 11 10 11 11 11
  6 6 6 7 6 7 7 8 7 8 8 8
  6 6 6 7 6 7 7 8 7 8 8 8
  3 3 3 4 3 4 4 5 4 5 5 5
  6 6 6 7 6 7 7 8 7 8 8 8
  3 3 3 4 3 4 4 5 4 5 5 5
  3 3 3 4 3 4 4 5 4 5 5 5
  0 0 0 1 0 1 1 2 1 2 2 2
  3 3 3 4 3 4 4 5 4 5 5 5
  0 0 0 1 0 1 1 2 1 2 2 2
  0 0 0 1 0 1 1 2 1 2 2 2
  0 0 0 1 0 1 1 2 1 2 2 2

Observe that now not only does every process know what data is harboured by its neighbours on the left, on the right, above, and below, but they even know the data that somehow made it over diagonally, i.e., from the upper left, upper right, lower left, and lower right directions. But the latter is not going to last, because that data came from the peripheral rows and columns of the adjacent squares, so it is not truly representative of the processes' internal states.

Now every process can perform its own Jacobi iteration for the diffusion problem within its own little patch, and set new values for its own data, while keeping the mirrored data unchanged.

Before the next iteration can commence, the data has to be exchanged between the neighbours again, in order to refresh the peripheries of the squares.

So now let us have a look at the code: