The most direct way of generating random samples from a Poisson distribution is efficient for some parameters and inefficient for others. Wikipedia attributes the following algorithm to Donald Knuth:

init: Let L ← exp(−λ), k ← 0 and p ← 1. do: k ← k + 1. Generate uniform random number u in [0,1] and let p ← p × u. while p > L. return k − 1.

Here’s the idea behind the algorithm. The time between arrivals in a Poisson process is exponentially distributed. Count how many arrivals there are in an interval by simulating the times between arrivals and adding them up until the time sum spills over the interval.

The problem with this approach is that the expected run time of the loop is proportional to the parameter λ. This algorithm is commonly used despite its terrible performance for large arguments.

Below is an algorithm that has expected run time independent of the argument λ. The algorithm is fairly simple, though it takes a moderate amount of theoretical justification to explain. It goes back to 1979 and so may not the most efficient algorithm available. It is definitely not the most efficient algorithm if you need to generate a large number of samples using the same parameter λ. The paper describing the algorithm recommends using Knuth’s algorithm for values of λ less than 30 and this algorithm for larger values of λ.

c = 0.767 - 3.36/lambda beta = PI/sqrt(3.0*lambda) alpha = beta*lambda k = log(c) - lambda - log(beta) forever { u = random() x = (alpha - log((1.0 - u)/u))/beta n = floor(x + 0.5) if (n < 0) continue v = random() y = alpha - beta*x lhs = y + log(v/(1.0 + exp(y))^2) rhs = k + n*log(lambda) - log(n!) if (lhs <= rhs) return n }

This is an accept-reject method based on a normal approximation. The performance improves as λ increases because the quality of the normal approximation improves. (Actually, it uses a logistic approximation to a normal approximation!) Theoretically it could be caught in an infinite loop, as with all accept-reject methods. However, the expected number of iterations is bounded. The `continue`

statement means that if n is negative, the algorithm goes back to the top of the loop. The `random()`

function is assumed to return a uniform random variable between 0 and 1.

A possible obstacle to turning the pseudocode above into actual code is computing log(n!). Naively computing n! and taking the log of the result will overflow for moderately large values of n. If you have a function available that computes the log of the gamma function, such as `lgamma`

in C’s math.h, then evaluate that function at n+1. Otherwise, see How to compute log factorial.

Update: See Stand-alone code for numerical computing for an implementation of the pseudocode here.

The algorithm above is “method PA” from “The Computer Generation of Poisson Random Variables” by A. C. Atkinson, Journal of the Royal Statistical Society Series C (Applied Statistics) Vol. 28, No. 1. (1979), pages 29-35. Method PA is on page 32.

It’s important to test your implementation by drawing a lot (say, 10^6 or 10^8) of variates and checking the resulting moments and histogram. You can get error bars for both — a quantitative comparison is better than “by eye” because you can be fooled into thinking small deviations are OK if you don’t have the error bars. The histogram variances will be p * (1-p) /n where n is the number of samples.

I transcribed some methods for similar discrete PMF’s from Devroye’s book, and found some errors of both types — transcription errors on my part (easily fixed) and problems in the underlying algorithm as printed in the book. Some of these (e.g. for his gamma generator, which he calls XG) are covered in Devroye’s online errata, which is a great service to the community. Some not, e.g. his recommended Poisson generator. I ended up using algorithm PTRS from:

http://statmath.wu.ac.at/staff/hoermann/publications.html

combined, at low lambda, with the trivial generator you mentioned.

Actually:

:)

I didn’t think about the possibility that u could be zero because I’m accustomed to using generators that generate values from the

openinterval (0, 1).But some generators will possibly return the boundary points 0 or 1 and in that case you do indeed need to check u != o. You also have to check u != 1 since otherwise the argument to

`log`

could be 0.I don’t understand why generators include one or both boundary points. These points are rare and so leaving them out causes no harm. Leaving them in means extra logic in much of the code that uses the generator.

I agree that Knuths algorithm should not be used for large values of Lambda, partially because of issues with exp(-Lambda) being excessively small.

What about over-dispersed Poisson?

Really useful post, I tried to list all the different Poisson simulation algorithms used by various programs and code libraries here http://www.evolvedmicrobe.com/Blog.html if anyone wants to know who uses which algorithms.

I just use the cumulative distribution function as a ‘look up table’ from a uniform random variable. Perhaps I’m missing something!

I agree that Knuths algorithm should not be used for large values of Lambda, partially because of issues with exp(-Lambda) being excessively small.

Just a thought: you could apply a log() to variables L and p in the initial pseudocode, i.e.:

– initialize L to −λ (instead of exp(−λ)) and p to 0 (instead of 1)

– update p by adding the log(u), instead of multiplying by u

The cost is taking many logs of your uniform-random variables (u), but it should fix many accuracy problems..

Thanks for the post!

Actually, I’m doing an implementation of both methods and I found slight differences between the algorithm mentioned foregoing and the one proposed in my assignment. Both of them come from the same work so I really don’t understand why one works, but not the other one.

Anyway, now it works – thanks!