Some Additional Comments about Algorithms

In this section we have had a close look at two broad classes of algorithms.

The first class is represented by the algorithms we have implemented in order to evaluate $a \pm \sigma_a$, $b \pm \sigma_b$, $\chi^2$, and $y' = a + bx$. These computations are outlined in Figure 2.10. Their characteristic feature is the use of array operations and a total lack of explicit iteration: there is no DO loop anywhere, even though in other languages, such as C or Lisp, we would have to use many iterative constructs to perform the same computations. These operations are explicitly parallel (a short sketch follows the list below). This means that if we were to run them on a parallel computer, then

1. a suitably clever Fortran compiler should be able to distribute them automatically over multiple processors; this, in fact, is what a High Performance Fortran compiler does;
2. the program could run in parallel on such a machine, speeding up the computations many times on a sufficiently large problem, i.e., a problem with very large arrays.
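
By way of illustration, here is a minimal, self-contained sketch of this style of programming. It is not the code of Figure 2.10; the sample data and variable names are made up for the example. It evaluates the least squares coefficients for $y' = a + bx$ using array reductions alone:

   program lsq_sketch
     implicit none
     integer, parameter :: n = 5
     real :: x(n) = (/ 1.0, 2.0, 3.0, 4.0, 5.0 /)
     real :: y(n) = (/ 2.1, 3.9, 6.2, 7.8, 10.1 /)
     real :: sx, sy, sxx, sxy, delta, a, b

     ! Every sum below is an array reduction: no explicit DO loop,
     ! so a parallelising compiler is free to distribute the work
     ! over multiple processors.
     sx  = sum(x)
     sy  = sum(y)
     sxx = sum(x * x)
     sxy = sum(x * y)
     delta = n * sxx - sx**2
     a = (sxx * sy - sx * sxy) / delta
     b = (n * sxy - sx * sy) / delta
     print *, 'a = ', a, '   b = ', b
   end program lsq_sketch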
The cost of exchanging data between the processors of a parallel computer is usually quite high, and it can be offset only if the portions of work assigned to the individual processors are sufficiently large. There is no point whatsoever in parallelising small programs, or programs that work on small data sets: the effort involved in parallelising them would never pay off.
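
To get a feel for the numbers, suppose, purely for illustration, that exchanging a single data item between two processors costs as much time as a thousand arithmetic operations; the exact ratio varies from machine to machine, but it is rarely small. Each processor must then be handed thousands of operations' worth of work for every item exchanged before parallel execution begins to pay.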

The second class is represented by the algorithms we have used to calculate incomplete gamma functions. You can see these algorithms implemented in Figures 2.12 and 2.13. In both cases the algorithms are recursive, i.e., we have to enter a DO loop within which we compute successive terms from terms evaluated in previous iterations. Such algorithms are basically sequential. They cannot be parallelised, although small optimisations may sometimes be possible. The only way to make them run fast is to have a very fast sequential processor running at a very high clock rate.
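
The fragment below, a sketch modelled on the standard series expansion for the incomplete gamma function (it illustrates the dependence pattern only and is not the exact code of Figure 2.12), shows why such loops resist parallelisation: every statement in the loop body needs a value produced on the previous pass.

   program gser_sketch
     implicit none
     real, parameter :: eps = 3.0e-7
     integer, parameter :: nmax = 100
     real :: a = 2.5, x = 1.0
     real :: ap, del, total
     integer :: n

     ap = a
     total = 1.0 / a
     del = total
     do n = 1, nmax
        ap  = ap + 1.0                ! needs the previous ap
        del = del * x / ap            ! needs the previous del
        total = total + del           ! needs the running total
        if (abs(del) < abs(total) * eps) exit
     end do
     ! total approximates the sum in the series; the prefactor
     ! exp(-x) * x**a / Gamma(a) is omitted in this sketch.
     print *, 'sum of series terms = ', total
   end program gser_sketch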

In the case of the algorithm shown in Figure 2.13, a superscalar processor, or a CPU with an explicitly parallel instruction set such as Merced, can speed things up just a little. For example, a0 in the first line of the DO loop can be evaluated at the same time as b0 in the second line. Similarly, lines 3 and 4 of the loop can be evaluated simultaneously. Lines 5 through 7 must then be evaluated in sequence, after b1 has been calculated in line 4; but the last, 8th, line, which increments n, can be evaluated at the same time as lines 5 through 7. In summary, instead of 8 sequential operations per pass through the loop, we can do it in 5 parallel steps. The speedup would be of the order of 1.5 to 2 against a purely sequential processor.
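
Spelled out, one schedule consistent with the dependencies just described (the line numbers refer to the body of the DO loop in Figure 2.13) is:

   step 1:  lines 1 and 2   (a0 and b0 are independent of each other)
   step 2:  lines 3 and 4   (b1 becomes available here)
   step 3:  lines 5 and 8   (the increment of n rides along for free)
   step 4:  line 6
   step 5:  line 7

Five parallel steps in place of eight sequential ones, hence the estimated speedup of 8/5, i.e., about 1.6.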

There are other things you can optimise when developing algorithms of this kind. In the first place, you should try to minimise data movement. For example, if you have an auxiliary variable whose job is to remember the old value of some quantity for use in the next iteration, make that variable reside on the CPU itself, in a register. A good compiler should make such optimisations automatically; sometimes you may have to use special compiler directives.
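
As a minimal illustration (the recurrence and the variable names are made up for the example and do not come from any of the figures), here the value carried from one iteration to the next lives in a scalar, which a good compiler will keep in a register:

   program register_sketch
     implicit none
     integer, parameter :: nmax = 20
     real :: terms(nmax), previous, current, x
     integer :: n

     x = 2.0
     terms(1) = 1.0

     ! The carried value lives in the scalar 'previous', so it
     ! can stay in a CPU register across iterations.
     previous = terms(1)
     do n = 2, nmax
        current  = 0.5 * (previous + x / previous)   ! Newton step for sqrt(x)
        terms(n) = current
        previous = current                           ! never leaves the CPU
     end do
     print *, 'sqrt(2.0) is about ', terms(nmax)
   end program register_sketch

Writing the loop body as terms(n) = 0.5 * (terms(n-1) + x / terms(n-1)) instead would, at least in principle, fetch terms(n-1) from memory or cache on every pass, which is exactly the data movement we want to avoid; although, as noted above, a good optimising compiler may perform this transformation for you.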

But the more important optimisation lies in the choice of the algorithm itself: use an algorithm that converges quickly, so that you don't have to run many iterations. The continued fraction expansions discussed in Section 2.2.3 are often very effective in this respect. But you must watch their regions of validity: make sure that you never divide by zero, or even come close to dividing by zero. In such a region you can usually switch to a safer procedure, as we have done in function goodness (cf. Figure 2.11).
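
By way of illustration, here is the standard guard used in modified-Lentz-style evaluation of continued fractions. This is a generic sketch: the toy continued fraction and the variable names are conventional and not taken from Figure 2.13 or from function goodness. A vanishing denominator is replaced by a tiny but representable number before it is inverted:

   program lentz_sketch
     implicit none
     ! fpmin is far below any denominator that can occur
     ! legitimately, yet still a normal floating point number.
     real, parameter :: fpmin = tiny(1.0) / epsilon(1.0)
     real, parameter :: eps = 1.0e-7
     real :: c, d, delta, f
     integer :: j

     ! Evaluate the continued fraction 1 + 1/(1 + 1/(1 + ...)),
     ! whose value is the golden ratio, by the modified Lentz
     ! method.
     f = 1.0
     c = f
     d = 0.0
     do j = 1, 100
        d = 1.0 + d
        if (abs(d) < fpmin) d = fpmin    ! guard against (near) zero
        c = 1.0 + 1.0 / c
        if (abs(c) < fpmin) c = fpmin    ! guard the other recurrence
        d = 1.0 / d                      ! now safe to invert
        delta = c * d
        f = f * delta
        if (abs(delta - 1.0) < eps) exit
     end do
     print *, 'golden ratio is about ', f
   end program lentz_sketch

In this benign example the guards never fire, but they cost next to nothing and keep the recurrence finite when a denominator does come close to zero.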

