The Binomial Distribution

  • It is a discrete probability distribution.
  • The distribution of a random variable X is discrete if X can assume only a finite or countably infinite number of values.
  • Summing over every possible value u of X, the probabilities add up to 1: $$\sum_u Pr \left(X = u\right) = 1 $$
  • The binomial distribution is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p.
  • Each success/failure experiment is called a Bernoulli trial.
  • The binomial distribution is the basis of the binomial test of statistical significance.
  • It is frequently used to model the number of successes in a sample of size n drawn with replacement from a population of size N. Replacing each item after it is drawn makes the draws independent.
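The last point can be illustrated with a small simulation. This is only a sketch: the population (N = 60 items, 10 of them counted as successes), the seed, and all variable names are assumptions invented for the example.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical population of N = 60 items, 10 of which count as a "success",
# so the success probability of a single draw is p = 10/60 = 1/6.
population <- c(rep(1, 10), rep(0, 50))
p <- mean(population)

n <- 15        # draws per sample
reps <- 10000  # number of simulated samples

# Each sample: draw n items with replacement and count the successes.
counts <- replicate(reps, sum(sample(population, n, replace = TRUE)))

# If the binomial model applies, the average count should be close to n * p.
c(simulated_mean = mean(counts), binomial_mean = n * p)
```

Because each item is put back before the next draw, every draw sees the same population, which is what makes the trials independent with a constant p.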

If the probability of a successful trial is p, then the probability of having exactly k successes in n identical independent trials is given by the probability mass function below:
\[\begin{aligned}
f\left(k; n,p \right) = Pr \left(X = k\right) = \binom{n}{k} p^k {\left( 1 - p \right)}^{n-k} \\
\text{for k = 0, 1, 2, ..., n, where} \\
\binom{n}{k} = \frac{n!}{k!(n-k)!}
\end{aligned} \]

The formula can be understood as follows: we want k successes (with probability $p^k$) and n-k failures (with probability ${\left( 1 - p \right)}^{n-k}$). However, the k successes can occur anywhere among the n trials, and there are $ \binom{n}{k}$ different ways of distributing k successes in a sequence of n trials.
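As a quick check of the formula, it can be written out directly in R and compared with R's built-in dbinom(). The helper name binom_pmf is hypothetical, and the values n = 15 and p = 0.167 are chosen only for illustration.

```r
# Binomial PMF written out from the formula; binom_pmf is a hypothetical helper name.
binom_pmf <- function(k, n, p) {
  choose(n, k) * p^k * (1 - p)^(n - k)
}

k <- 0:15
manual  <- binom_pmf(k, n = 15, p = 0.167)
builtin <- dbinom(k, size = 15, prob = 0.167)

all.equal(manual, builtin)  # TRUE if the two agree up to floating-point tolerance
sum(manual)                 # the probabilities over k = 0..n should add up to 1
```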

Consider the following problem:
A six-sided die is rolled 15 times. What is the probability of rolling 5 or fewer 2's?
In each roll, the probability of rolling a particular number, say 2, is 1/6.
The probability of rolling 5 or fewer 2's is the sum of the probabilities of rolling 0, 1, 2, 3, 4 and 5 2's.
\[\begin{aligned}
Pr \left(X \leq 5\right) = \sum_{k=0}^5 Pr \left(X = k\right)
\end{aligned} \]
R's probability mass (density) function dbinom() gives these probabilities; 1/6 is rounded to 0.167 in the calls below:
  • dbinom(k, size, prob) returns Pr(X = k), the probability of exactly k successes in size trials
  • The probability of rolling exactly 5 2's is
> dbinom(5, size=15, prob=0.167)
[1] 0.06274624
  • The probability of rolling 0,1,2,3,4 or 5 2's:
> dbinom(0, size=15, prob=0.167) +
+ dbinom(1, size=15, prob=0.167) +
+ dbinom(2, size=15, prob=0.167) +
+ dbinom(3, size=15, prob=0.167) +
+ dbinom(4, size=15, prob=0.167) +
+ dbinom(5, size=15, prob=0.167)
[1] 0.9723556
  • Alternatively, we can use the cumulative probability function for the binomial distribution, pbinom().
  • $Pr\left(X \leq 5 \right)$
> pbinom(5,size=15, prob=0.167)
[1] 0.9723556
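Since dbinom() is vectorized over its first argument, the six-term sum can also be written more compactly. The snippet below is a minimal sketch; the variable names by_sum and by_cdf are made up for the example.

```r
# Sum the individual probabilities Pr(X = 0), ..., Pr(X = 5) in one call.
by_sum <- sum(dbinom(0:5, size = 15, prob = 0.167))

# Cumulative probability Pr(X <= 5) from pbinom().
by_cdf <- pbinom(5, size = 15, prob = 0.167)

all.equal(by_sum, by_cdf)  # TRUE if both routes give the same probability
```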


  • As seen above, the pbinom() function is useful for summing consecutive binomial probabilities.
  • Other questions that can be answered include:
    • What is the probability of rolling 5 or more 2's? $Pr\left(X \geq 5 \right) $
      • $Pr\left(X \geq 5 \right) = 1 - Pr\left(X \leq 4 \right) = 1 - \text{pbinom(4, size=15, prob=0.167)} \approx 0.09039$
      • > 1 - pbinom(4, 15, 0.167)
        [1] 0.09039063
        
    • What is the probability of rolling more than 5 and at most 8 2's? $Pr\left(5 < X \leq 8 \right)$
      • $Pr\left(5 < X \leq 8 \right) = Pr\left(X \leq 8 \right) - Pr\left(X \leq 5 \right) = \text{pbinom(8, 15, 0.16667) - pbinom(5, 15, 0.16667)} = 0.02720835$ (here 1/6 is rounded to 0.16667 rather than 0.167)
      • > pbinom(8, 15, 0.16667) - pbinom(5,15, 0.16667)
        [1] 0.02720835
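The endpoints are easy to get wrong with a discrete distribution, so it can help to cross-check tail and interval probabilities against explicit sums of dbinom(). The sketch below does that; the variable names (upper, interval, a, b) are assumptions made for the example.

```r
p <- 0.167

# Pr(X >= 5): with lower.tail = FALSE, pbinom(q, ...) returns Pr(X > q), so q = 4.
upper <- pbinom(4, size = 15, prob = p, lower.tail = FALSE)
all.equal(upper, 1 - pbinom(4, size = 15, prob = p))

# pbinom(b) - pbinom(a) covers a < X <= b, i.e. the values a + 1, ..., b.
a <- 5
b <- 8
interval <- pbinom(b, size = 15, prob = p) - pbinom(a, size = 15, prob = p)
all.equal(interval, sum(dbinom((a + 1):b, size = 15, prob = p)))
```

A difference of two pbinom() values always excludes the lower endpoint, so an interval that should include it needs pbinom(a - 1, ...) as the subtracted term.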
        
  • Plotting the probability distribution:
  • df <- data.frame(x=0:15, prob=dbinom(0:15, 15, prob=0.167))  # start at 0 so Pr(X = 0) is plotted too
    plot(df, type="b", xlab="Number (x) of rolls of 2's", ylab= "Pr(x)")
  • Simulating random draws with rbinom(): n=100 (number of observations), size=15 (number of trials per observation), prob=0.167 (probability of success in each trial).
  • bindat <- rbinom(100, 15, 0.167)
    hist(bindat, breaks=seq(-0.5,15.5,1), xlab="N successes")  # one bin per possible count 0..15
  • Plotting the area that shows a cumulative probability: what is the probability of rolling "at least" 5 2's (5 or more)?
    df <- data.frame(x=0:15, prob=dbinom(0:15, 15, prob=0.167))
    library(ggplot2)
    ggplot(data=df, aes(x=x,y=prob)) + geom_line() +
      geom_ribbon(data=subset(df,x>=5 & x<=15),aes(ymax=prob),ymin=0,
                  fill="red", colour = NA, alpha = 0.5)
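To see how closely simulated draws from rbinom() follow the theoretical probabilities from dbinom(), the relative frequencies of a larger simulation can be tabulated next to the exact values. This is a rough sketch: the seed, the sample size of 10000, and the variable names are arbitrary choices for the example.

```r
set.seed(42)  # arbitrary seed so the simulation can be repeated
sim <- rbinom(10000, size = 15, prob = 0.167)

# Relative frequency of each outcome 0..15; factor() keeps outcomes that never occurred.
empirical   <- as.numeric(table(factor(sim, levels = 0:15))) / length(sim)
theoretical <- dbinom(0:15, size = 15, prob = 0.167)

# Side-by-side comparison, rounded for readability.
round(data.frame(k = 0:15, empirical, theoretical), 4)
```

With only 100 observations, as in the histogram above, the agreement is usually much rougher; increasing the number of observations brings the empirical frequencies closer to the theoretical ones.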