next up previous index
Next: More About the Goodness Up: The Goodness of Fit Previous: Definition of

A Yet Another Shot at $\Gamma (a)$

So far we have used the implementation of the Euler gamma function, $\Gamma (a)$, provided with the Solaris version of Fortran (actually, its logarithm). This makes the code non-portable. For example, we couldn't find an equivalent of Euler gamma function in IRIX-6.3 Fortran. Even if we could, the name of the subroutine could well be different.

But the world doesn't end on Solaris. Today, as well as in future, you may have to work with a variety of operating systems and hardware architectures: VMS is still around, so is MVS, and the latter is likely to go on for many years to come.

NT  for all practical purposes is restricted to just one hardware architecture and aimed specially at small offices and enterprises. This makes it an unattractive system to work with, if you are a scientist or an engineer, because neither Intel hardware nor Microsoft software are particularly sympathetic towards the needs of scientists and engineers, who represent a niche market. But for one reason or another you may come across those systems in your professional career too. In may institutions and companies clerks and bureaucrats decide about institutional purchases, and, like everybody else, they are likely to be swayed by salesman pitch, propaganda, and whatever box they happen to have sitting on their own desk. Once upon a time Intel boxes used to be markedly cheaper than UNIX workstations. Today this is no longer the case, but many people haven't woken up to this fact yet.

Macs are coming back, and many scientists and engineers like them, because they are easy to use, even though they are just as clumsy and inflexible when it comes to doing science as NT.

Finally, we can't exclude that new architectures and operating systems may appear in future. None of the stuff that we see on the market today is ideal, and there is no reason why people shouldn't work on better hardware architectures and better operating systems.

Anyhow, if you want your Fortran program to compile and work seamlessly across all platforms, you should provide your own definitions for all functions and subroutines called by the program that are not a part of ANSI Fortran.

But there is also another reason to do so. The Euler gamma function is a very complicated function with poles at 0 and all negative integers, defined in the whole complex plane (with the exception of the poles, that is). There are many expressions that approximate the gamma function in various regimes quite well, or sometimes extraordinarily well. A general purpose gamma function provided, for example, with NAG or IMSL libraries, may have numerous branches and invoke various evaluation techniques depending on the argument. If you have to call such a gamma function very frequently, the cost of doing so may be quite high. So, if you know that your arguments are going to be restricted to a certain narrow range, you may be better off writing your own lightweight implementation.

In the case of our program, we really want an inverse of the gamma function and not the gamma function itself. The inverse is a lot easier to calculate, because it is a very regular function without any poles and with only a very moderate ``pulsing'' growth for x < 0.

There are many ways to calculate $1/\Gamma(z)$. For example we could use Euler's infinite product

\begin{displaymath}\frac{1}{\Gamma(z)} = z e^{\gamma z} \prod_{n-1}^\infty
\left( 1 + \frac{z}{n} \right)
e^{-z/n},
\end{displaymath} (2.35)

where

\begin{displaymath}\gamma = \lim_{m\to\infty}
\left( 1 + \frac{1}{2} + \frac{1...
... \cdots + \frac{1}{m} - \ln{m}
\right) = 0.57721\,56649\ldots
\end{displaymath} (2.36)

A Taylor expansion of this formula around z = 0 can be found in ``Tables of Higher Mathematical Functions'', by H. T. Davis (later corrected by H. E. Salzer), published here in Bloomington, Indiana, in 1933 and in 1935. The expansion looks as follows:

 \begin{displaymath}
\frac{1}{\Gamma(z)} = \sum_{k=1}^{\infty} c_k z^k
\end{displaymath} (2.37)

where the first 26 coefficients ck are given by:
c1 = 1 . $00000\,00000\,000000$
c2 = 0 . $57721\,56649\,015329$
c3 = -0 . $65587\,80715\,202538$
c4 = -0 . $04200\,26350\,340952$
c5 = 0 . $16653\,86113\,822915$
c6 = -0 . $04219\,77345\,555443$
c7 = -0 . $00962\,19715\,278770$
c8 = 0 . $00721\,89432\,466630$
c9 = -0 . $00116\,51675\,918591$
c10 = -0 . $00021\,52416\,741149$
c11 = 0 . $00012\,80502\,823882$
c12 = -0 . $00002\,01348\,547807$
c13 = -0 . $00000\,12504\,934821$
c14 = 0 . $00000\,11330\,272320$
c15 = -0 . $00000\,02056\,338417$
c16 = 0 . $00000\,00061\,160950$
c17 = 0 . $00000\,00050\,020075$
c18 = -0 . $00000\,00011\,812746$
c19 = 0 . $00000\,00001\,043427$
c20 = 0 . $00000\,00000\,077823$
c21 = -0 . $00000\,00000\,036968$
c22 = 0 . $00000\,00000\,005100$
c23 = -0 . $00000\,00000\,000206$
c24 = -0 . $00000\,00000\,000054$
c25 = 0 . $00000\,00000\,000014$
c26 = 0 . $00000\,00000\,000001$
This expansion with 26 terms yields 7 digits accuracy for $z \in [0, 2.5]$, which, since in our case z = (N-2)/2, would cover us up to N = 2.5 * 2 + 2 = 7. But for z > 3.3 the expansion 2.37 with 26 terms only begins to diverge quite rapidly, and using more ck terms, would make it increasingly costly. We could use the recurrence formula

\begin{displaymath}\frac{1}{\Gamma(z+1)} = \frac{1}{z\Gamma(z)}
\end{displaymath} (2.38)

to move beyond, say, z = 2 though, and it is left to you as an exercise to implement this approach, perhaps making use of the code shown in Figure 2.17.

The Bloomington expansion may well have been someone's master's dissertation. Recall that in those days all computations would have to be done by hand, or, at best, by using a mechanical calculator that could do additions and subtractions quite well, but was rather cumbersome when it came to multiplications and divisions. For example, to multiply a number by 3, you would have to turn its knob three times, and to divide by 3, you would have to turn the knob three times in the other direction.

Figure 2.17 shows an implementation of this function.


  
Figure 2.17: Function one_by_gamma, which implements the Bloomington approximation by H. T. Davis
\begin{figure}\begin{footnotesize}
\begin{tex2html_preform}\begin{verbatim}real(...
...n one_by_gamma\end{verbatim}\end{tex2html_preform}\end{footnotesize}\end{figure}

We will not use the Bloomington expansion in our computations, but the program shown in Figure 2.17 illustrates the use of array constructors  in Fortran. You can either have a verbatim constructor, which, for an integer array, k, of dimension 5, say, could look as follows:

k = (/ 1, 2, 3, 4, 5 /)
This statement assigns 1 to k(1), 2 to k(2), and so on until 5, which is assigned to k(5).

The same effect can be accomplished  by using the so called ``implied DO'' array constructor:

k = (/ (i, i=1, 5) /)

Lisp programmers should recognise Fortran array constructors as related to array constructors in Common Lisp , although markedly less verbose. In Common Lisp we could say, for example

>(make-array 5 :initial-contents '(1 2 3 4 5))
#(1 2 3 4 5)

>(make-array 5 :initial-contents 
   (do ((mylist nil)
        (i 1 (1+ i)))
       ((> i 5) mylist)
       (setq mylist (append mylist (list i)))))
#(1 2 3 4 5)

>
You should appreciate how much faster and cleaner k = (/ (i, i=1, 5) /) is compared to the Lisp (do ...) form. On the other hand the Lisp way has a great flexibility about it: you can insert various Lisp forms including (do ...) basically anywhere within a function call.

Press, Flannery, Teukolsky, and Vetterling recommend a formula by Lanczos , which is a lot cheaper to compute than the Bloomington expansion and which is a lot more accurate too. The formula looks as follows:

$\displaystyle \Gamma(z + 1)$ = $\displaystyle \left(z + \gamma + \frac{1}{2} \right)^{z + 1/2}
e^{- (z + \gamma + 1/2)}$  
  $\textstyle \times$ $\displaystyle \sqrt{2\pi}
\left(
c_0
+ \frac{c_1}{z + 1}
+ \frac{c_2}{z + 2}
+ \cdots
+ \frac{c_n}{z + n}
+ \epsilon
\right)$ (2.39)

The Lanczos formula is valid for z > 0 only, which suits us just fine, because this implies $N_{\textit{\scriptsize min}} = 2 \cdot 1 + 2
= 4$ measurements in the $\chi ^2$ fit. You would usually have at least 6, and preferably much more than that.

For $\gamma = 5$, n = 6 (in the Lanczos formula), and for a specific choice of ck (see the code in Figure 2.18), we get that $\vert \epsilon \vert < 2 \times 10^{-10}$ everywhere in the half-complex plane Re(z) > 0.

Figure 2.18 shows the implementation of this formula as a Fortran function my_gamma.


  
Figure 2.18: Function my_gamma, which implements Lanczos formula for the Euler gamma function in Fortran
\begin{figure}\begin{footnotesize}
\begin{tex2html_preform}\begin{verbatim}real(...
...ction my_gamma\end{verbatim}\end{tex2html_preform}\end{footnotesize}\end{figure}

Although it is common to calculate $\pi$ on entry to a subroutine by calling $2\arccos{0.0}$ (which in Fortran  would be 2*acos(0.0)), this is costly, especially if the function has to be invoked many times. It is easy enough to evaluate $\sqrt{2\pi}$ with desired accuracy using, for example, Maxima:

(C2) ev(sqrt(2 * %pi), numer);
(D2)                          2.5066282746310002
(C3)
You can also invoke Octave , which we haven't talked about yet, but which, together with Matlab, we'll begin using quite soon. Octave is a program somewhat similar to Maxima, but specialising in numerical mathematics rather than in symbolic mathematics. In Octave you would calculate $\sqrt{2\pi}$ as follows:
octave:4> printf("%20.16f\n", sqrt(2*pi));
  2.5066282746310002
octave:5>

We can test our implementation of $\Gamma (a)$ against the Solaris implementation by running the following simple Fortran program:

program try

  use euler

  integer, parameter :: n = 25
  real (kind=long), parameter :: x_zero = 1.0_long
  real (kind=long) :: x
  integer :: i

  do i = 1, n
     x = x_zero + 0.4_long * i
     write(*, '(1f6.1, 2e20.10)') x, gamma(x), my_gamma(x)
  end do

end program try

Figure 2.19 shows the result of the run.


  
Figure: The output generated by program try, which compares the Lanczos version of the Euler gamma function against the Solaris version. The first column corresponds to x, the second column corresponds to $\Gamma (x)$ as calculated by Solaris, the third column corresponds to the Lanczos version of $\Gamma (x)$.
\begin{figure}\begin{footnotesize}
\begin{tex2html_preform}\begin{verbatim}gusta...
...6:54:24 !524 $\end{verbatim}\end{tex2html_preform}\end{footnotesize}\end{figure}

As you see the resulting agreement is very good down to 10 significant digits.

Function my_gamma illustrated in Figure 2.18 is not optimal, because it involves two exponentiations and three multiplications. If we were to evaluate $\log{\Gamma(a)}$ instead of $\Gamma (a)$itself, we could save some time by replacing logarithms of multiplications with additions of logarithms and exponentiations with multiplications.

Figure 2.20 shows the implementation of a function that calculates $\log{\Gamma(a)}$.


  
Figure: Implementation of $\log{\Gamma(z)}$ in Fortran using Lanczos' formula
\begin{figure}\begin{footnotesize}
\begin{tex2html_preform}\begin{verbatim}real(...
...tion my_lgamma\end{verbatim}\end{tex2html_preform}\end{footnotesize}\end{figure}

The following simple test program can be used to time calls to my_gamma versus calls to my_lgamma:

program try

  use euler

  integer, parameter :: n = 1000000
  real (kind=long), parameter :: x_zero = 1.0_long
  real (kind=long) :: x, throw_away
  integer :: i

  do i = 1, n
     x = x_zero + 0.00001_long * i
!     throw_away = exp(my_lgamma(x))
     throw_away = my_gamma(x)
  end do

end program try
This program calls either my_gamma or my_lgamma million times. The $\log{\Gamma(z)}$ version takes 8.7s CPU time on an Ultra-1 workstation, whereas the $\Gamma(z)$ version takes 10.5s. Alas! If you use this same program to test Solaris' d_lgamma $\ldots$ it takes only 1.4s, sic!

This level of optimisation can be achieved only by coding the function directly in the assembler or by utilising its hardware implementation. On many UNIX systems, especially the ones aimed directly at scientific and engineering markets, this may frequently be the case.

Similarly, many window operations are assembler-level or even hardware-level optimised under NT and MacOS.


next up previous index
Next: More About the Goodness Up: The Goodness of Fit Previous: Definition of
Zdzislaw Meglicki
2001-02-26