Last update: August 12, 2011 03:54:07 PM



Dr P. G. Marais

Grootfontein Agricultural College, Middelburg, 5900 Cape Province, Republic of South Africa


This section reviews a number of probability distributions that are of particular value in animal product import risk assessments. They are equally applicable to plant and plant product import problems and to food safety issues. The section starts by looking at two random processes, the binomial and Poisson processes, from which the binomial, Poisson and several other distributions are derived. The probability distributions that describe these processes make up the majority of the technically based distributions needed in animal import risk analysis. After reviewing these two processes, we will look at the normal and hypergeometric distributions, which are also very useful in particular circumstances. This tutorial is, by necessity, a rather brief introduction, but more detailed descriptions and examples can be found in Vose (1996).


Poisson and Binomial processes

Two stochastic processes, binomial and Poisson, form a large part of the structure of nearly all animal product import risk analyses. The binomial process describes a system where there is a definable number of trials (n), a probability of success (p) for each trial, and a consequent number of successful trials (s). The assumption of a binomial process is that all trials are independent, i.e. that each trial has the same probability of 'success' as the trial before it, no matter what the outcomes of previous trials have been. Examples of binomial processes are: tossing a coin ten times and seeing how many 'heads' come up; testing a group of infected animals and seeing how many show up positive using the test (assuming a test sensitivity of neither zero nor one); or randomly selecting a certain number of chicken wings from the supermarkets of England and seeing how many have salmonella.

The Poisson process describes a system where there is a continuous opportunity for an event to occur (as opposed to the n distinct trials of the binomial process). The Poisson process has one descriptive parameter, β, the mean interval of exposure between occurrences of the event; its reciprocal, 1/β, is the mean number of occurrences per unit of exposure. The rate 1/β of a Poisson process is analogous to the probability p of the binomial process, and the period of exposure t of the Poisson process is analogous to the number of trials n of the binomial process. Examples of Poisson processes are: the number of giardia cysts a city's population consumes from its water supply in a year; the number of fish one catches in a day's fly-fishing; and the number of times a person is mugged in one year on the streets of a particular city. However, these would only remain Poisson processes if the city did not improve its water treatment after any occurrence of giardia, the fisherman got no better (or worse) at catching fish during the day, and the mugged person kept following the same habits no matter how many times he or she was mugged.


The distributions of the binomial process

The binomial process is characterised by the probability p of an event occurring at each trial. Once we have estimated p, it is straightforward to calculate other variables associated with the binomial process:


The distribution of the number of events that occur in n trials = Binomial (n, p).

The number of trials needed for the event to occur for the first time = 1 + Geometric (p).

The number of trials needed for s events to have occurred = s + Negative Binomial (s, p).
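As an illustrative check of the three relationships above, the following Python sketch (standard library only) simulates a binomial process one trial at a time. The function names and the values n = 20, p = 0.3 and s = 4 are hypothetical choices for the example, not taken from any analysis in this paper; the simulated means should come close to the theoretical means n·p, 1/p and s/p.

```python
import random

def events_in_n_trials(n, p, rng):
    """Successes in n independent trials, i.e. one Binomial(n, p) draw."""
    return sum(rng.random() < p for _ in range(n))

def trials_until_s_events(s, p, rng):
    """Total trials needed to see s successes, i.e. one s + Negative Binomial(s, p) draw.
    With s = 1 this is one 1 + Geometric(p) draw."""
    trials = successes = 0
    while successes < s:
        trials += 1
        if rng.random() < p:
            successes += 1
    return trials

def mean(values):
    return sum(values) / len(values)

rng = random.Random(1)              # fixed seed so the run is repeatable
n, p, s, runs = 20, 0.3, 4, 20000   # illustrative values only

binomial_mean = mean([events_in_n_trials(n, p, rng) for _ in range(runs)])
first_event_mean = mean([trials_until_s_events(1, p, rng) for _ in range(runs)])
s_events_mean = mean([trials_until_s_events(s, p, rng) for _ in range(runs)])

print(f"Binomial(n, p):          mean {binomial_mean:.2f}  (theory n*p = {n * p:.2f})")
print(f"1 + Geometric(p):        mean {first_event_mean:.2f}  (theory 1/p = {1 / p:.2f})")
print(f"s + Neg. Binomial(s, p): mean {s_events_mean:.2f}  (theory s/p = {s / p:.2f})")
```

Simulating trial by trial is slow for large n, but it follows the definitions of the process directly, which makes the "1 +" and "s +" offsets easy to see.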


Estimating probability p from an observed number of events in a specific number of trials

Suppose we wish to determine the probability of occurrence of a specific event, and we have observed that it has actually happened r times out of a possible n. Its true probability of occurrence p is modelled using the Beta distribution as:


p = Beta(α1, α2), where α1 = r + 1 and α2 = n - r + 1





This use of the Beta distribution is, in fact, an application of Bayes' Theorem where no prior knowledge of p is assumed, i.e. the prior distribution of p is Uniform(0,1) - meaning p is equally likely to be any value between 0 and 1.



One hundred (n) birds were randomly selected from a very large flock of turkeys and 17 (r) were found to be infected with salmonella. Let us assume for now that the sensitivity and specificity of the test are 100%. The true prevalence p of salmonella within the flock can be estimated as:


p = Beta(17+1, 100-17+1) = Beta (18,84)


Figure 2a illustrates how, although in theory p ranges between 0 and 1, in practical terms it is tightly contained between 0.08 and 0.30, with a strong peak at 0.17. The more tests that are performed, the narrower the distribution becomes, i.e. the more accurately p is determined. Figure 2b shows how the distribution of p narrows with an increasing number of trials (using the same percentage of occurrences). Understanding this behaviour can be very useful, for example in planning future tests: the predictable reduction in uncertainty can be balanced against the extra cost and time required to complete any additional tests.
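The narrowing described above can be sketched numerically. The Python code below is an illustration using the standard library's random.betavariate and statistics.quantiles: it samples p = Beta(r + 1, n - r + 1) for the turkey example (r = 17, n = 100) and for hypothetical larger surveys with the same 17% rate of positives, and reports the range containing the central 95% of the samples. The helper name and sample sizes are assumptions made for the example.

```python
import random
import statistics

def beta_interval(r, n, draws=50000, seed=2):
    """Approximate 2.5% and 97.5% points of p = Beta(r + 1, n - r + 1) by random sampling."""
    rng = random.Random(seed)
    samples = [rng.betavariate(r + 1, n - r + 1) for _ in range(draws)]
    cuts = statistics.quantiles(samples, n=40)   # 39 cut points in 2.5% steps
    return cuts[0], cuts[-1]

widths = {}
for n in (100, 1000, 10000):
    r = round(0.17 * n)                          # same 17% rate of positives
    low, high = beta_interval(r, n)
    widths[n] = high - low
    print(f"n = {n:>5}, r = {r:>4}: 95% of p lies between {low:.3f} and {high:.3f}")
```

The width of the interval shrinks roughly in proportion to 1/√n, which is the "predictable reduction of uncertainty" that can be weighed against the cost of further testing.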


The meaning of a distribution of probability

This use of the Beta distribution introduces the concept of a probability distribution of a probability, which is perhaps not immediately intuitive and deserves further explanation. Consider the following example:

I am going to toss a coin. What is the probability of it showing 'heads'? Assuming a fair coin, the answer would be exactly 50%. However, I should also tell you that the coin has the same face on both sides. Since I have not told you whether both sides are 'heads' or 'tails', you might quite reasonably assign equal probability to each, and you could still say that the probability of 'heads' is 50%. Alternatively, you might state that the probability has equal chances of being 0% or 100%, which is the distribution shown in Figure 3: a probability distribution of a probability. The mean (average) of this distribution is 50%, the first figure quoted, although the true probability in this case can never actually take that value. A probability can be described by a distribution, rather than a single value, only when we lack knowledge of what that true probability actually is; the uncertainty described by a distribution of a probability never comes from the variability of the stochastic process itself. Failing to appreciate the difference between uncertainty arising from lack of knowledge and uncertainty arising from randomness has many implications in risk analysis modelling.

There are thus two forms of uncertainty in a risk analysis: the inherent randomness of the stochastic process being modelled, often described as variability; and our lack of exact knowledge of the problem, often described as uncertainty. So, for example, when tossing a fair coin we have precise knowledge of the system (the probability of 'heads' is exactly 50%), so there is no uncertainty, but the inherent variability in the outcome of each toss remains.
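A small simulation can make the distinction between uncertainty and variability concrete. The sketch below, with hypothetical function names and an arbitrary ten tosses per coin, compares a fair coin (p known exactly, outcomes variable) with the two-faced coin of unknown type (p uncertain, outcomes completely determined once the coin is chosen). Both give 'heads' half the time on average, but the two-faced coin never produces anything other than zero or ten heads.

```python
import random

def heads_in_ten_tosses(p_heads, rng):
    """Number of heads in ten independent tosses of a coin with P(heads) = p_heads."""
    return sum(rng.random() < p_heads for _ in range(10))

rng = random.Random(3)

# Fair coin: p is known exactly (0.5); only the outcome of each toss varies.
fair = [heads_in_ten_tosses(0.5, rng) for _ in range(5000)]

# Two-faced coin of unknown type: p is uncertain (0 or 1, equally likely),
# but once the coin is fixed every toss gives the same face.
two_faced = [heads_in_ten_tosses(rng.choice([0.0, 1.0]), rng) for _ in range(5000)]

print("fair coin, head counts observed:      ", sorted(set(fair)))
print("two-faced coin, head counts observed: ", sorted(set(two_faced)))
print("average proportion of heads:", sum(fair) / 50000, sum(two_faced) / 50000)
```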


Estimating probability p when there are no occurrences in a specific number of trials

Imagine that, in the above experiment, none of the 100 tested turkeys tested positive, i.e. r = 0. We could not categorically say that there was no chance of a turkey in the flock being infected with salmonella: in fact, for any p less than 1/101, zero positives would be the most likely outcome. However, we can still define a distribution of the probability p that a turkey is infected, in the same manner as before, using a Beta distribution. This produces a pessimistic estimate, since it assumes that infection is possible and that, before the tests, our prior opinion was that the prevalence was equally likely to be anywhere between zero and one. The Beta distribution is used again with α1 = r + 1 = 1 and α2 = n - r + 1 = n + 1.


Figure 4a shows how the distribution progressively favours a probability near zero with increasing number of tests n.


Exactly the same principle applies if every test is positive, i.e. r = n: α1 = n + 1 and α2 = 1. Figure 4b shows how this distribution is a mirror image of Figure 4a, progressively favouring a probability near 1 as the number n of all-positive tests increases.
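Because Beta(1, n + 1) has the simple cumulative distribution function 1 - (1 - p)^(n+1), and Beta(n + 1, 1) has p^(n+1), bounds on p can be written down without simulation. The Python sketch below uses these closed forms; the helper names and the 95% level are illustrative choices, not part of the original text.

```python
def upper_bound_no_positives(n, level=0.95):
    """Value below which p lies with the given probability when p = Beta(1, n + 1),
    using its CDF 1 - (1 - p)**(n + 1)."""
    return 1 - (1 - level) ** (1 / (n + 1))

def lower_bound_all_positive(n, level=0.95):
    """Value above which p lies with the given probability when p = Beta(n + 1, 1),
    using its CDF p**(n + 1); the mirror image of the function above."""
    return (1 - level) ** (1 / (n + 1))

for n in (10, 100, 1000):
    print(f"n = {n:>4}: no positives -> p below {upper_bound_no_positives(n):.4f}; "
          f"all positive -> p above {lower_bound_all_positive(n):.4f} (95% probability)")
```

For the 100 turkeys with no positives, this gives a 95% upper bound of roughly 3% for the prevalence, and the two bounds always sum to one, reflecting the mirror-image relationship between Figures 4a and 4b.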


Estimating the probability of the occurrence of several events in a set of trials

The Binomial (n, p) distribution calculates the number of events that will occur in n trials where there is a probability p of success in each trial.



An agriculture ministry knows that there is a 2% prevalence (p) of Johne's disease within a country that is applying for an import licence for its cattle. An entrepreneur intends to import 1200 cattle (n) from this country. How many of them (s) will be infected with Johne's disease? Answer: Binomial(1200, 2%). Figure 5a shows the cumulative distribution function F(x) of this probability distribution, a very common way of representing a probability distribution. It shows that there is only about a 2% chance that s will be less than 15, and about a 95.5% chance that s will be less than 33. Together these two values enclose roughly 94% of the distribution, so they serve as approximate lower and upper 95% limits. Figure 5b shows the relative probability distribution function f(x) for the same distribution. This type of plot shows the relative likelihood of each allowable value. It is useful for giving a 'feel' for the uncertainty, though it provides very little quantitative information.
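Reading probabilities off a cumulative plot is necessarily approximate, so it can help to sum the binomial probabilities directly. This Python sketch (standard library math.comb, Python 3.8 or later) computes P(s < 15) and P(s < 33) for s = Binomial(1200, 2%); the helper name is hypothetical.

```python
from math import comb

def binomial_cdf_below(k, n, p):
    """P(s < k) for s = Binomial(n, p), summed exactly from the probability mass function."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))

n, p = 1200, 0.02
below_15 = binomial_cdf_below(15, n, p)
below_33 = binomial_cdf_below(33, n, p)

print(f"P(s < 15)        = {below_15:.3f}")
print(f"P(s < 33)        = {below_33:.3f}")
print(f"P(15 <= s <= 32) = {below_33 - below_15:.3f}")
```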


Estimating the probability of at least one event in a set of trials

The probability of no events in a set of n trials is (1 - p)^n, i.e. the probability that trial 1 fails (1 - p), multiplied by the probability that trial 2 fails (1 - p), and so on up to the probability that the nth trial fails (1 - p). The probability P1 of at least one event in a set of n trials is therefore 1 - (1 - p)^n. Where n is large (more than 30 or so) and p is small enough that np < 1, the approximation P1 ≈ np can be used. However, the accuracy of this approximation depends entirely on the values of n and p, and it is better practice to use the full formula 1 - (1 - p)^n.



Specially processed steaks imported from a particular country are estimated to have a 1 in 10^6 chance of being infected with Foot and Mouth Disease (FMD) and getting past all screening tests. A supermarket chain wishes to import 2000 of these steaks a year. What is the probability that FMD will enter the country with those steaks if this import is allowed?


P1 = 1 - (1 - 10^-6)^2000 ≈ 1.998*10^-3


The P1 ≈ np approximation would give 2000 × 10^-6 = 2.000*10^-3.
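The difference between the exact formula and the np approximation in the steak example can be checked in a few lines of Python. When p is tiny, evaluating 1 - (1 - p)^n literally can lose precision, so this sketch uses math.log1p and math.expm1, which compute log(1 + x) and exp(x) - 1 accurately for small x. The variable names are illustrative.

```python
import math

p = 1e-6     # chance that one steak is infected and passes all screening
n = 2000     # steaks imported per year

# 1 - (1 - p)**n, evaluated as -expm1(n * log1p(-p)) to avoid round-off for tiny p
p1_exact = -math.expm1(n * math.log1p(-p))
p1_approx = n * p

print(f"exact:       P1 = {p1_exact:.4e}")
print(f"approximate: P1 = {p1_approx:.4e}")
```

The approximation always slightly overstates P1, because it counts outcomes with two or more infected steaks more than once.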


Estimating the number of trials until a specific number of events occur.

The Geometric(p) distribution describes the number of unsuccessful trials one will have to complete before the first success occurs. Thus the number of trials required for the first occurrence of an event equals 1 + Geometric(p).

The Negative Binomial(s, p) distribution describes the number of unsuccessful trials one will have to complete before s successes occur. Thus, in the same manner as for the Geometric distribution, the total number of trials required for s occurrences of an event equals s + Negative Binomial(s, p).



A vet knows that, on average, 1 in every 11 pigs she tests will be infected with a particular disease. How many pigs will she have to test before she finds an infected pig, and how many before she has found 25 infected pigs?


The prevalence of this disease is p = 1/11 ≈ 0.091. The number of pigs N1 she must test to find the first infected pig (including that pig) can be estimated as:


N1 = 1 + Geometric(0.091)


The number of pigs N25 she will have to test to find 25 infected pigs can be estimated as:


N25 = 25 + Negative Binomial(25, 0.091)
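A sketch of the vet example in Python follows. It assumes the convention used above (Geometric(p) counts failures before the first success) and draws Geometric values by inverse transform, floor(ln U / ln(1 - p)) with U uniform on (0, 1]; Negative Binomial(s, p) values are built as sums of s such draws. Function names and the 20,000-draw sample size are illustrative. Libraries differ on the Geometric convention: NumPy's Generator.geometric, for example, counts trials up to and including the first success, which corresponds to 1 + Geometric(p) in this paper's notation.

```python
import math
import random

def geometric_failures(p, rng):
    """Geometric(p): failures before the first success, by inverse transform."""
    u = 1.0 - rng.random()                   # uniform on (0, 1], so log(u) is defined
    return math.floor(math.log(u) / math.log(1.0 - p))

def negative_binomial_failures(s, p, rng):
    """Negative Binomial(s, p): failures before the s-th success, as a sum of s geometric draws."""
    return sum(geometric_failures(p, rng) for _ in range(s))

rng = random.Random(4)
p = 1 / 11          # prevalence of the disease among the pigs tested
draws = 20000

n1 = [1 + geometric_failures(p, rng) for _ in range(draws)]
n25 = [25 + negative_binomial_failures(25, p, rng) for _ in range(draws)]

print(f"N1:  mean {sum(n1) / draws:.1f} pigs (theory 1/p = {1 / p:.0f})")
print(f"N25: mean {sum(n25) / draws:.1f} pigs (theory 25/p = {25 / p:.0f})")
print(f"P(N1 > 30) = {sum(x > 30 for x in n1) / draws:.3f}")
```

The last line shows why the full distribution matters: although the vet needs 11 pigs on average, in roughly one case in twenty she would need more than 30.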


However, we often do not know that the last trial actually was a success. For example, imagine we had tested a flock of 1000 chickens for a particular disease and identified 24 infected birds. Let us further imagine that the test sensitivity is 75%. We will probably have failed to identify quite a number of infected birds: our best estimate might be that we had missed eight. However, we can use the Negative Binomial distribution to give us a better estimate. One might be tempted to model the number of infected birds N that were not detected as:


N=Negative Binomial(24,75%)


However, this would be assuming that the last infected bird was detected (the last trial was a success), whereas clearly the last few infected chickens tested could easily not have been detected. It turns out that, in situations like this, one can use the formula:


Number of failures = Negative Binomial(s+1, p)


So, in the example above: N = Negative Binomial(25, 75%)
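The effect of the extra success can be seen by simulation. In this Python sketch (illustrative names, 20,000 draws each) the naive model Negative Binomial(24, 75%) has mean 24 × 0.25/0.75 = 8 missed birds, while the corrected Negative Binomial(25, 75%) has mean 25 × 0.25/0.75 ≈ 8.33; the sampled means should land close to these values.

```python
import random

def negative_binomial_failures(s, p, rng):
    """Failures before the s-th success, simulated one Bernoulli trial at a time."""
    failures = successes = 0
    while successes < s:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures

rng = random.Random(5)
sensitivity, detected, draws = 0.75, 24, 20000

naive = [negative_binomial_failures(detected, sensitivity, rng) for _ in range(draws)]
corrected = [negative_binomial_failures(detected + 1, sensitivity, rng) for _ in range(draws)]

print(f"Negative Binomial(24, 75%): mean missed {sum(naive) / draws:.2f}")
print(f"Negative Binomial(25, 75%): mean missed {sum(corrected) / draws:.2f}")
```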


Distributions of the Poisson process

The Poisson process is characterised by the mean interval between events (MIBE) β. The assumption of the Poisson process is that the probability of an event occurring per unit interval (e.g. per hour, per metre, per kg) is constant and independent of how many events have occurred before and of how recently they occurred.

Once the MIBE has been estimated, other variables can easily be found:


The distribution of the number of events s that occur in interval t = Poisson(t/β).

The time until the next event t1 = Exponential(β).

The time until s events have occurred ts = Gamma(s,β).


where t and β are measured in the same units (e.g. days, kg, tonnes).
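The three relationships above all follow from the fact that the gaps between events of a Poisson process are independent Exponential(β) intervals. The Python sketch below builds the process from such gaps and compares simulated means against Poisson(t/β), Exponential(β) and Gamma(s, β). Note that Python's random.expovariate takes a rate (1/β) rather than a mean, while random.gammavariate(alpha, beta) takes a shape and a scale, so Gamma(s, β) here is gammavariate(s, β). The values β = 2, t = 10 and s = 3 are arbitrary.

```python
import random
import statistics

def events_in_interval(beta, t, rng):
    """Number of events in (0, t] for a Poisson process with mean interval beta,
    built by adding independent Exponential(beta) gaps until t is passed."""
    count, clock = 0, 0.0
    while True:
        clock += rng.expovariate(1.0 / beta)   # expovariate expects the rate 1/beta
        if clock > t:
            return count
        count += 1

def mean(values):
    return sum(values) / len(values)

rng = random.Random(6)
beta, t, s, runs = 2.0, 10.0, 3, 20000         # illustrative values, same time units

counts = [events_in_interval(beta, t, rng) for _ in range(runs)]
t1 = [rng.expovariate(1.0 / beta) for _ in range(runs)]
ts_sum = [sum(rng.expovariate(1.0 / beta) for _ in range(s)) for _ in range(runs)]
ts_gamma = [rng.gammavariate(s, beta) for _ in range(runs)]   # shape s, scale beta

print(f"events in t: mean {mean(counts):.2f}, variance {statistics.variance(counts):.2f} "
      f"(Poisson(t/beta): both {t / beta:.2f})")
print(f"time to first event: mean {mean(t1):.2f} (Exponential(beta) mean {beta:.2f})")
print(f"time to event {s}: sum of gaps {mean(ts_sum):.2f}, Gamma(s, beta) {mean(ts_gamma):.2f} "
      f"(theory s*beta = {s * beta:.2f})")
```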


Determining MIBE β from an observed number of events over a continuous interval

The MIBE is the average interval between n observed occurrences of an event. Its true value can be estimated from the observed occurrences using the Central Limit Theorem:


β = Normal(t̄, σ/√(n - 1))


where t̄ is the average of the n - 1 observed intervals ti between the n observed contiguous events and σ is the standard deviation of the ti intervals (for a Poisson process σ should be almost the same value as t̄, because the standard deviation of an Exponential distribution equals its mean). The larger the value of n, the narrower the distribution of β, i.e. the greater our confidence in knowing its true value. Care should be taken when n is small (less than about 10), because the distribution will then have a tail with a significant probability of being negative and will have to be truncated.

Sometimes we do not know the values of the intervals ti, but only the number of events n that occurred in a total interval T. A conservative (i.e. pessimistic if the event is undesirable) estimate of the MIBE β is then β = T/(n + 1).
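Both estimates can be sketched in Python. The interval data below are hypothetical, invented purely for illustration, and the code assumes the Central Limit Theorem form β = Normal(t̄, σ/√(n - 1)), discarding negative draws to truncate the distribution at zero. The conservative estimate T/(n + 1) is computed assuming, also for illustration only, that the total observation period T equals the sum of the observed intervals.

```python
import math
import random
import statistics

# Hypothetical data: intervals (months) between 13 consecutive observed events
intervals = [3.1, 0.8, 5.6, 2.2, 1.4, 7.9, 0.5, 4.3, 2.7, 1.9, 6.2, 3.4]

t_bar = statistics.mean(intervals)
sigma = statistics.stdev(intervals)
std_error = sigma / math.sqrt(len(intervals))   # len(intervals) = n - 1

rng = random.Random(7)

def draw_mibe():
    """One draw of beta = Normal(t_bar, sigma / sqrt(n - 1)), truncated at zero."""
    while True:
        value = rng.gauss(t_bar, std_error)
        if value > 0:
            return value

mibe_draws = [draw_mibe() for _ in range(10000)]

# If only the count and total period were known (assumed: 13 events in 40 months)
n_events = len(intervals) + 1
total_period = sum(intervals)
beta_conservative = total_period / (n_events + 1)

print(f"t_bar = {t_bar:.2f}, sigma = {sigma:.2f} (similar values expected for a Poisson process)")
print(f"beta ~ Normal({t_bar:.2f}, {std_error:.2f}); sampled mean {statistics.mean(mibe_draws):.2f}")
print(f"conservative beta = T/(n + 1) = {beta_conservative:.2f} months")
```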


Estimating a minimum β where there are no observed events over a continuous interval

We can use the Exponential distribution to estimate at least a lower bound for the MIBE, given that we have observed no occurrences of the event over an interval of length X:

β = 1/Exponential(1/X)

where, as above, the parameter of the Exponential distribution is its mean, so that Exponential(1/X) describes the uncertain rate of occurrence of the event. Since the lower the MIBE, the more frequently the event occurs, a lower bound for the MIBE is equivalent to an estimate of the highest possible frequency of the event. This provides a minimum estimate of β since it assumes that: 1) the event is possible; and 2) it will occur for the first time immediately after the end of the observation period.



In the sixteen years of monitoring turkeys for a particular disease there has never been an observation of that disease. What is its minimum MIBE?

The lower bound for the MIBE is calculated as:

Minimum MIBE βmin = 1/Exponential(1/16) years
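The turkey example can be explored in Python. The sketch assumes, as elsewhere in this section, that Exponential(1/X) denotes an Exponential distribution with mean 1/X (Python's random.expovariate takes the rate, here X). Because β = 1/λ with λ exponentially distributed, the cumulative distribution of β has the closed form P(β ≤ b) = exp(-X/b), so percentiles can be computed directly and checked against simulation. Percentiles are more informative than the mean here: the mean of 1/λ is infinite, because λ has non-zero density near zero.

```python
import math
import random

X = 16.0   # years of monitoring with no observed outbreak

def beta_min_quantile(q, x=X):
    """q-quantile of beta = 1/lambda with lambda ~ Exponential(mean 1/x).
    P(beta <= b) = exp(-x / b), so the quantile is b = x / ln(1/q)."""
    return x / math.log(1.0 / q)

rng = random.Random(8)
# random.expovariate takes a rate; a mean of 1/X corresponds to a rate of X
draws = sorted(1.0 / rng.expovariate(X) for _ in range(100000))

for q in (0.05, 0.5, 0.95):
    print(f"{q:.0%} point of beta: closed form {beta_min_quantile(q):6.1f} years, "
          f"simulated {draws[int(q * len(draws))]:6.1f} years")
```

A low percentile such as the 5% point (about 5.3 years here) is one way of turning this distribution into a single conservative figure for the minimum MIBE.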


Probability of the occurrence of several events in an interval

The Poisson(t/β) distribution calculates the distribution of the number of events that will occur in an interval t.



Outbreaks of disease Z appear to occur in wild ponies in a certain area. Data for the last 36 years show 5 outbreaks. A conservative (upper bound) estimate is needed of how many outbreaks could occur in the next 10 years.

Using the conservative estimate described earlier, β = 36/(5+1) = 6 years. Since a Poisson(t/β) distribution describes the number of occurrences in an interval t, the number of outbreaks N in the next ten years is modelled by:

N = Poisson(t/β) = Poisson(10/6) = Poisson(1.667)
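For an upper-bound style answer, the percentiles of Poisson(10/6) can be computed directly from its probability mass function. This Python sketch lists the cumulative probabilities and picks out the smallest number of outbreaks whose cumulative probability reaches 95%; the 95% level and the helper name are illustrative choices.

```python
import math

def poisson_pmf(k, lam):
    """P(N = k) for N = Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 10 / 6            # t / beta with t = 10 years and beta = 6 years
cumulative = 0.0
percentile_95 = None
for k in range(10):
    cumulative += poisson_pmf(k, lam)
    print(f"P(N <= {k}) = {cumulative:.3f}")
    if percentile_95 is None and cumulative >= 0.95:
        percentile_95 = k

print("95th percentile of N:", percentile_95)
```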


Probability of at least one event in an interval

The probability that no event will occur in an interval of length x is exp(-x/β). The probability of at least one event in that interval is therefore 1 - exp(-x/β).



A government knows that a cattle disease breaks out on average once every 3.6 years. It has a general election coming up in 6 months and has drastically cut the disease eradication part of its agricultural regulatory budget. What is the probability of getting through the next election before another outbreak of the disease?

The country would, on average, expect to experience an outbreak every 3.6 years. Thus:

β = 3.6 years

The probability Pok of no outbreaks in the next six months is then:

Pok = exp(-0.5/3.6) ≈ 87%
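The election example reduces to one line of arithmetic, but it can also be checked by simulating the time to the next outbreak as Exponential(3.6 years), which relies on the memoryless property of the Poisson process. A minimal Python sketch, with an arbitrary seed and sample size:

```python
import math
import random

beta = 3.6   # mean interval between outbreaks (years)
x = 0.5      # time until the election (years)

p_ok = math.exp(-x / beta)

rng = random.Random(9)
runs = 100000
# Time to the next outbreak is Exponential(beta); expovariate takes the rate 1/beta
survived = sum(rng.expovariate(1 / beta) > x for _ in range(runs)) / runs

print(f"exact P_ok = {p_ok:.3f}, simulated = {survived:.3f}")
```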


Other distributions in common use in animal health risk analysis

Hypergeometric distribution

Consider a herd of M cows that is known to include D cows infected with a particular virus. If we select n cows from this herd at random, the Hypergeometric(n, D, M) distribution describes the number of infected cows in that sample of n. The hypergeometric distribution models sampling without replacement: as we select each of the n cows from the herd of M, the probability that the next cow is infected changes. (If we were to put each selected cow back into the herd before taking the next one out, the probability of an individual cow being infected would remain the same (i.e. D/M), and we could use a binomial distribution to model the number of infected cows in the sample.) In general, if M > 20n, the binomial distribution is a good approximation to the hypergeometric distribution.

So, for example, imagine we had a herd of 20 cows of which we know three are infected, and we intend to select four cows from the herd at random. The probability that the first cow is infected is 3/20. The probability that the second cow is infected is either 3/19 if the first cow selected was not infected, or 2/19 if it was: the probability does not remain the same for each selected cow.
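The herd example and the M > 20n rule of thumb can be checked numerically. This Python sketch compares hypergeometric and binomial probabilities for the 20-cow herd and for a hypothetical herd of 1000 cows with the same 15% prevalence; the helper names are illustrative.

```python
from math import comb

def hypergeometric_pmf(k, n, d, m):
    """P(k infected in a sample of n taken without replacement from m animals, d of them infected)."""
    return comb(d, k) * comb(m - d, n - k) / comb(m, n)

def binomial_pmf(k, n, p):
    """P(k infected in n draws with replacement, each infected with probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def max_difference(n, d, m):
    return max(abs(hypergeometric_pmf(k, n, d, m) - binomial_pmf(k, n, d / m))
               for k in range(n + 1))

# Herd of 20 with 3 infected, sample of 4: M = 20 is far below 20n = 80
for k in range(5):
    print(f"k = {k}: hypergeometric {hypergeometric_pmf(k, 4, 3, 20):.4f}, "
          f"binomial {binomial_pmf(k, 4, 3 / 20):.4f}")

diff_small = max_difference(4, 3, 20)      # small herd
diff_large = max_difference(4, 150, 1000)  # same prevalence, M = 1000 > 20n
print(f"largest difference: M = 20 -> {diff_small:.4f}, M = 1000 -> {diff_large:.4f}")
```

Note that the hypergeometric probability of four infected cows is exactly zero, since only three exist, whereas the binomial assigns it a small positive probability.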


Normal distribution

The Normal(μ, σ) distribution is often used in animal health risk analysis, either as a consequence of applying the Central Limit Theorem or because the variable is known to be roughly normally distributed. The latter is commonly the case for natural measurements such as the weight of an adult of a particular species.



We will now look at a hypothetical model to see how some of the distributions described above can be put together to produce a useful risk analysis model.


The Problem

An entrepreneur wishes to import packets of 100 turkey testicles from free-range farms of a particular country. Slaughterhouse records on 300 turkey farms showed that disease X was found on 34 farms during the past year. A serological survey of 100 turkeys on each of 7 of these positive farms revealed the following numbers of positives: 4, 6, 2, 5, 8, 3, 1. The tests were performed using a procedure with 85% sensitivity and almost 100% specificity.

Turkeys are sent to the specialist slaughterhouse in batches of 50 from each free-range farm. Two birds from each batch are tested for disease X at the farm prior to transportation to the slaughterhouse, using the same serological test as in the survey above. A turkey infected with disease X will show no external symptoms, though there is a 10% to 40% (most probably 30%) chance of discoloration of the muscle tissue, which would certainly be spotted by the meat inspectors at the slaughterhouse.

The packaged testicles will be exported chilled. It is estimated that there is a probability of somewhere between 20% and 50%, most likely 40%, that the pathogenic organism will survive the chilling, if present. The authorities of the importing country have been asked to grant the entrepreneur a licence to import packets of these turkey testicles. It is understood that identifying infection at any stage will result in the rejection of only the affected package(s) of testicles.

The licensing authority wishes to determine the probability that this licence will introduce disease X into its country. What is the distribution of the number of pairs of infected testicles in any one packet of 100 testicles that passes all inspections? What is the probability that an accepted packet has at least one infected testicle at import?


The model

This problem has been modelled in two ways for comparison. The first method is a simple simulation model. It is very easy to construct and will provide the means of the distributions of the probabilities in question. However, it does not easily lend itself to constructing the distributions of the uncertainty about these probabilities that arises from lack of precise knowledge of the input probabilities. An excellent example of this modelling method in the animal product import area has been developed by the Ministry of Agriculture, New Zealand (Van der Logt et al. (1997)). The paper follows a presentation very similar to that shown for Model 1 below.

The second method calculates the probabilities directly. It is mathematically more complex than the first model and the method is less flexible. However, it does produce distributions of the probabilities one is being asked to determine. A very good model using this approach has been developed by Agriculture Canada (Cassin et al. (1997)) to assess the risk of E. coli in beef burgers. The model is in the final stages of development but will be published soon and would be a very instructive read.

Both models use the Excel spreadsheet application (Microsoft Corporation, Seattle, Washington) as the modelling environment and the @RISK risk analysis add-in (Palisade Corporation, Newfield, New York) to give Excel the ability to perform Monte Carlo sampling from probability distributions. The extra Excel functions provided by @RISK all start with the letters 'Risk', e.g. RiskBeta. The reader should use the following descriptions of the models in conjunction with the spreadsheet printouts (Figures 6 and 7) and formula tables (Tables 1 and 2). Where one formula is shown for a range of cells in these tables, the formula is given for the first cell in the range. Formulae for the other cells in the range are obtained by copying this first formula into all the other cells using the Copy-Paste or Autofill spreadsheet features.


Model 1

Flock prevalence calculation

Three hundred flocks have been tested and 34 were found to be infected. The distribution of the true flock prevalence Pf can therefore be estimated as RiskBeta(34+1,300-34+1) = RiskBeta(35,267). The assumptions here are: that there are many more than 300 flocks; that the 300 selected flocks can be considered a random sample; and that prior to this testing there was no knowledge of the level of flock prevalence (a prior distribution of Uniform(0,1), as discussed above).


Within flock prevalence

One hundred turkeys were tested from each of the seven infected flocks, for which we have the number si of turkeys identified as infected. The test sensitivity is 85%, so it is quite possible that a few infected birds among those tested were not identified. The number of infected birds that were missed, mi, can be estimated using a negative binomial distribution as:

mi = RiskNegBin(si+1,85%)

The assumption behind this formula is that each infected bird has an equal probability of being detected (i.e. a binomial process) and that (mi + si) is much smaller than the total number of birds tested (100 in this case).

The true prevalence pi of flock i can then be estimated as:

pi = RiskBeta(mi+si+1, 100-(mi+si)+1)

in a similar fashion to the flock prevalence above.

We now have seven pi values. We would like to combine these seven within-flock prevalences to produce a distribution of the within-flock prevalence Pa for all other infected flocks. One method of combining these seven pi distributions is to use the RiskDuniform({y}) distribution: a discrete distribution in which all values within its parameter array {y} have equal probability. This method of combining distributions is also very useful for combining dissimilar expert opinions (Vose (1996), p 180).
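The within-flock prevalence steps above can be sketched in Python as follows. This is an illustration rather than the spreadsheet itself: it uses the standard library in place of the @RISK functions (RiskNegBin, RiskBeta, RiskDuniform), and it draws a flock at random before sampling that flock's pi, which gives the same distribution for Pa as sampling all seven pi and then picking one. The min() guard is an added safeguard, not part of the original model.

```python
import random

observed = [4, 6, 2, 5, 8, 3, 1]   # survey positives among 100 birds in each of 7 flocks
tested, sensitivity = 100, 0.85
rng = random.Random(10)

def negative_binomial_failures(s, p):
    """Failures before the s-th success, simulated trial by trial (the role of RiskNegBin(s, p))."""
    failures = successes = 0
    while successes < s:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures

def flock_prevalence(s):
    """One draw of p_i for a flock with s test positives."""
    missed = negative_binomial_failures(s + 1, sensitivity)   # m_i
    infected = min(s + missed, tested)                       # guard: cannot exceed birds tested
    return rng.betavariate(infected + 1, tested - infected + 1)

def within_flock_prevalence():
    """One draw of Pa: pick one of the seven flocks with equal probability, then sample its p_i."""
    return flock_prevalence(rng.choice(observed))

draws = [within_flock_prevalence() for _ in range(20000)]
print(f"mean within-flock prevalence Pa = {sum(draws) / len(draws):.3f}")
```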


Estimate of number of infected turkeys in a consignment

A consignment is considered to be one packet of 100 testicles. Cell E21 toggles between 0 (source flock is not infected) and 1 (source flock is infected), the probability of generating a value of 1 being the flock prevalence (Cell E5).

The number Ni of infected turkeys contributing to a consignment is modelled in Cell E22 as RiskBinomial(50,Pa)*E21. Again, the assumption here is that the flock is much larger than 50 birds (and that each turkey has two testicles).


Pre slaughter testing

Two turkeys are to be tested out of the 50 that make up a consignment. The number of these tested turkeys Nti that are infected can be modelled using a hypergeometric distribution as:

Nti = RiskHypergeo(2,Ni,50)

An IF statement is wrapped around this distribution to ensure that it returns a zero when Ni is zero, rather than an error.

The number Np of the Nti birds that test positive with the 85% test sensitivity is modelled in Cell F27 as RiskBinomial (Nti,85%). Again, an IF statement is wrapped around this distribution to ensure it returns a zero if Nti is zero, rather than an error.


Inspection at slaughterhouse

The probability Pd of an infected bird having discoloured meat and therefore being spotted by the meat inspector is modelled as RiskPert(10%,30%,40%). The Pert distribution is similar to the Triang(ular) distribution frequently used in these types of models, but has the advantages over the Triang distribution of being more naturally shaped and of being less sensitive to the estimation of the minimum and maximum values (Vose (1996) pp 166-173). The number Nin of infected birds in the consignment that would be detected at the slaughterhouse is therefore:

Nin = RiskBinomial(Ni,Pd)

An IF statement is wrapped around this distribution to ensure that the cell returns a zero in the event that Ni is zero, where it would obviously be inappropriate to attempt to generate a value from the binomial distribution.
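The PERT distribution is usually defined as a Beta distribution rescaled to [min, max], with shape parameters 1 + 4(mode - min)/(max - min) and 1 + 4(max - mode)/(max - min), giving a mean of (min + 4·mode + max)/6. The Python sketch below assumes that standard parameterisation (it is not taken from the @RISK documentation) and combines it with a binomial draw for Nin; the helper names and the illustrative consignment with Ni = 3 are hypothetical. In plain code the IF guard is unnecessary, since summing zero Bernoulli trials simply returns 0.

```python
import random

def pert(minimum, mode, maximum, rng):
    """One PERT(min, mode, max) draw: a Beta distribution rescaled to [min, max], with
    shape parameters 1 + 4(mode - min)/(max - min) and 1 + 4(max - mode)/(max - min)."""
    span = maximum - minimum
    a = 1 + 4 * (mode - minimum) / span
    b = 1 + 4 * (maximum - mode) / span
    return minimum + span * rng.betavariate(a, b)

def binomial(n, p, rng):
    """Binomial(n, p) by summing Bernoulli trials; returns 0 when n == 0."""
    return sum(rng.random() < p for _ in range(n))

rng = random.Random(11)
pd_draws = [pert(0.10, 0.30, 0.40, rng) for _ in range(50000)]
print(f"mean Pd = {sum(pd_draws) / len(pd_draws):.3f} "
      f"(PERT mean (min + 4*mode + max)/6 = {(0.10 + 4 * 0.30 + 0.40) / 6:.3f})")

# Nin = Binomial(Ni, Pd) for an illustrative consignment with Ni = 3 infected birds
nin = [binomial(3, pert(0.10, 0.30, 0.40, rng), rng) for _ in range(20000)]
print(f"P(no infected bird spotted at inspection | Ni = 3) = {nin.count(0) / len(nin):.3f}")
```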

The distribution of the number of infected testicle pairs Nit in a consignment that gets through these tests is then calculated in Cell G35 using the equation:


IF(Nin+Np=0, Ni, 0)


Pathogen surviving chilling

The probability of the pathogen surviving chilling is assumed to apply to the whole packet of 100 testicles: if the pathogen survives in one infected testicle in a packet, it is assumed to survive in all the other infected testicles in that packet. However, the infection will not spread to the uninfected testicles. The probability Ps of survival is modelled in the same way as Pd:

Ps = RiskPert(20%,40%,50%)

The number Nii of infected testicle pairs being imported in one packet of 100 testicles from this source is then modelled in Cell F35 as:

Nii = RiskDiscrete({0,Nit},{1 - Ps, Ps})

This model was run for 100000 Latin Hypercube iterations (Vose 1996 pp 41-46). A histogram of the resultant distribution for Nii is shown in Figure 8.
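As a cross-check of the logic, the Python sketch below re-expresses the Model 1 chain in plain code. It is not the author's spreadsheet, and several choices are assumptions made for the illustration: simple random sampling instead of Latin Hypercube, 20,000 iterations, a flock chosen at random before sampling its prevalence (equivalent to the RiskDuniform step), the standard PERT parameterisation, and infected pairs counted as Ni, as in cell G35 of Table 1. It reports both the unconditional probability that a packet imports at least one infected pair and the same probability given that the packet was accepted; no particular numerical result is claimed here.

```python
import random

rng = random.Random(12)

# Inputs from the problem statement
FLOCKS_TESTED, FLOCKS_POSITIVE = 300, 34
SURVEY_POSITIVES = [4, 6, 2, 5, 8, 3, 1]   # positives among 100 birds in each of 7 flocks
SURVEY_SIZE = 100
SENSITIVITY = 0.85
CONSIGNMENT_BIRDS = 50                      # one packet of 100 testicles = 50 birds
BIRDS_TESTED = 2
ITERATIONS = 20000

def binomial(n, p):
    return sum(rng.random() < p for _ in range(n))   # 0 when n == 0, so no IF guard

def negative_binomial(s, p):
    """Failures before the s-th success."""
    failures = successes = 0
    while successes < s:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures

def pert(low, mode, high):
    span = high - low
    return low + span * rng.betavariate(1 + 4 * (mode - low) / span, 1 + 4 * (high - mode) / span)

def one_iteration():
    """One Monte Carlo iteration; returns (packet accepted?, infected testicle pairs imported)."""
    pf = rng.betavariate(FLOCKS_POSITIVE + 1, FLOCKS_TESTED - FLOCKS_POSITIVE + 1)
    s = rng.choice(SURVEY_POSITIVES)                 # RiskDuniform over the seven flocks
    infected = min(s + negative_binomial(s + 1, SENSITIVITY), SURVEY_SIZE)
    pa = rng.betavariate(infected + 1, SURVEY_SIZE - infected + 1)

    ni = binomial(CONSIGNMENT_BIRDS, pa) if rng.random() < pf else 0

    # Hypergeometric step: birds 0..ni-1 are the infected ones; test 2 without replacement
    sampled = rng.sample(range(CONSIGNMENT_BIRDS), BIRDS_TESTED)
    nti = sum(bird < ni for bird in sampled)
    n_positive = binomial(nti, SENSITIVITY)           # pre-slaughter test positives
    nin = binomial(ni, pert(0.10, 0.30, 0.40))        # spotted at meat inspection

    accepted = (n_positive + nin) == 0
    nit = ni if accepted else 0                       # infected pairs passing inspection
    survives = rng.random() < pert(0.20, 0.40, 0.50)
    return accepted, (nit if survives else 0)

results = [one_iteration() for _ in range(ITERATIONS)]
accepted_count = sum(accepted for accepted, _ in results)
infected_imports = sum(pairs > 0 for _, pairs in results)

p_any = infected_imports / ITERATIONS
p_any_given_accepted = infected_imports / accepted_count
print(f"P(packet imports >= 1 infected pair)   = {p_any:.4f}")
print(f"P(>= 1 infected pair | packet accepted) = {p_any_given_accepted:.4f}")
```

Because every uncertain input is resampled on each iteration, this, like Model 1, gives the mean probability only; separating uncertainty from variability would need an outer loop over the uncertain parameters.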


Model 2

Models 1 and 2 together with the results of 3 000 iterations of Model 2. They arrive at the same result but Model 2 will more reliably and more quickly reach the theoretical answer than Model 1.



It has been shown that, with a good understanding of a few basic distributions, a risk analysis model can be constructed that is transparent and provides both measures of the probabilities of outcomes and the degree of uncertainty one may have about these probabilities. Performed correctly, quantitative risk analysis is a powerful tool that will guide the decision-maker towards a better understanding of the risks being faced, the effectiveness of current and planned risk management strategies and of the value of further research to reduce any uncertainty in the model.



CASSIN, M.H., LAMMERDING, A.M., TODD, C.D., ROSS, W., McCOLL, R.S. (1997) - Quantitative Risk Assessment for Escherichia coli O157:H7 in Ground Beef Hamburgers. Pers. comm.

VAN DER LOGT P.B., HATHAWAY S.C. & VOSE D.J. (1997) - Risk Assessment Model for Human Infection with the Cestode Taenia saginata. J. Food Protec., in press.

VOSE D.J. (1996) - Quantitative Risk Analysis: a Guide to Monte Carlo Simulation Modelling. John Wiley & Sons Ltd., Chichester, United Kingdom. 317 pp.




Table 1. Model 1 formulae

E5          =RISKBETA(E4+1,E3-E4+1)
E10:E16     =RISKNEGBIN(D10+1,$D$8)
F10:F16     =E10+D10
G10:G16     =RISKBETA(F10+1,C10-F10+1)
E17         =RISKDUNIFORM(G10:G16)
H22         =E5
H21         =1-H22
E21         =RISKDISCRETE(G21:G22,H21:H22)
E22         =RISKBINOMIAL(E20/2,E17)*E21
F26         =IF(E22=0,0,RISKHYPERGEO(F25,E22,E20/2))
F27         =IF(F26=0,0,RISKBINOMIAL(F26,D28))
F30         =RISKPERT(10%,30%,40%)
F31         =IF(E22=0,0,RISKBINOMIAL(E22,F30))
F34         =RISKPERT(20%,40%,50%)
H35         =F34
H34         =1-H35
G35         =IF(F31+F27=0,E22,0)
F35         =RISKDISCRETE(G34:G35,H34:H35)




Table 2. Model 2 formulae

E20         =D20/2
C28:C77     =BINOMDIST(B28,$E$20,$D$17,FALSE)
D28         =HYPGEOMDIST(0,$E$21,B28,$E$20)+HYPGEOMDIST(1,$E$21,B28,$E$20)*(1-$C$8)
D29:D75     =HYPGEOMDIST(0,$E$21,B29,$E$20)+HYPGEOMDIST(1,$E$21,B29,$E$20)*(1-$C$8)+HYPGEOMDIST(2,$E$21,B29,$E$20)*((1-$C$8)^2)
D76         =HYPGEOMDIST(1,$E$21,B76,$E$20)*(1-$C$8)+HYPGEOMDIST(2,$E$21,B76,$E$20)*((1-$C$8)^2)
D77         =HYPGEOMDIST(2,$E$21,B77,$E$20)*((1-$C$8)^2)
E28:E77     =(1-$E$22)^B28
F28         =E23
G28:G77     =$D$5*C28*D28*E28*$F$28
C78, G78    =SUM(C28:C77)
C80         =G78
E80         =G78*(1-F28)/F28
G80         =1-(C78*D5)
H82         =C80/(C80+E80+G80)





Veterinary Congress 1995