
### The Binomial Distribution


• It is a discrete probability distribution.
• The distribution of a random variable X is discrete if X can assume only a finite or countably infinite number of values.
• Summing over every possible value u of X, the probabilities add up to one: $$\sum_u Pr \left(X = u\right) = 1$$
• The binomial distribution is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p.
• Each success/failure experiment is called a Bernoulli trial.
• The binomial distribution is the basis of the binomial test of statistical significance.
• It is frequently used to model the number of successes in a sample of size n drawn with replacement from a population of size N. Replacing each item after it is drawn makes the draws independent.
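
The normalization property and the "sequence of Bernoulli trials" view listed above can be checked numerically. The following is a minimal sketch; the values n = 15 and p = 0.167 and the seed are illustrative assumptions, not something the list itself fixes.

```r
# Illustrative parameters (assumptions chosen only for demonstration)
n <- 15      # number of Bernoulli trials
p <- 0.167   # probability of success in each trial

# Probabilities of every possible value 0, 1, ..., n of X
probs <- dbinom(0:n, size = n, prob = p)

# The probabilities of a discrete distribution sum to 1
total <- sum(probs)
print(isTRUE(all.equal(total, 1)))

# One binomial outcome built by hand from n independent Bernoulli trials:
# each trial is 1 (success) with probability p, otherwise 0 (failure)
set.seed(1)  # arbitrary seed, for reproducibility only
trials <- sample(c(1, 0), size = n, replace = TRUE, prob = c(p, 1 - p))
successes <- sum(trials)
print(successes)
```

Counting the successes in `trials` gives one draw of the binomial random variable, which is what a single call to `rbinom(1, n, p)` produces directly.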

If the probability of a successful trial is p, then the probability of having exactly k successes in n identical independent trials is given by the probability mass function below:
$$\begin{aligned} f\left(k; n,p \right) = Pr \left(X = k\right) = \binom{n}{k} p^k {\left( 1 - p \right)}^{n-k} \\ \text{for } k = 0, 1, 2, \ldots, n, \text{ where} \\ \binom{n}{k} = \frac{n!}{k!(n-k)!} \end{aligned}$$

The formula can be understood as follows: we want k successes (with probability $p^k$) and n-k failures (with probability ${\left( 1 - p \right)}^{n-k}$). However, the k successes can occur anywhere among the n trials, and there are $\binom{n}{k}$ different ways of distributing k successes in a sequence of n trials.
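
To make the formula concrete, it can be written out term by term and compared with R's built-in dbinom(). This is only a sketch: `binom_pmf` is a hypothetical helper name, and the values of n, k and p are illustrative assumptions.

```r
# Binomial probability mass function written directly from the formula:
# (number of arrangements) * (k successes) * (n - k failures)
binom_pmf <- function(k, n, p) {
  choose(n, k) * p^k * (1 - p)^(n - k)
}

# Illustrative values (assumptions)
n <- 15
k <- 5
p <- 0.167

manual  <- binom_pmf(k, n, p)
builtin <- dbinom(k, size = n, prob = p)

# The hand-written formula and dbinom() should agree up to rounding error
print(isTRUE(all.equal(manual, builtin)))
```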

Consider the following problem: a six-sided die is rolled 15 times. What is the probability of rolling five or fewer 2's?
In each roll, the probability of rolling a particular number, say 2, is 1/6 (approximated as 0.167 in the code below).
The probability of rolling five or fewer 2's is the sum of the probabilities of rolling 0, 1, 2, 3, 4 and 5 2's.
$$\begin{aligned} Pr \left(X \leq 5\right) = \sum_{k=0}^5 Pr \left(X = k\right) \end{aligned}$$
Using R's probability mass function dbinom() to obtain the probability:
• dbinom(x, size, prob) returns Pr(X = x), the probability of exactly x successes in size trials, each with success probability prob.
• The probability of rolling exactly 5 2's is
> dbinom(5, size=15, prob=0.167)
[1] 0.06274624

• The probability of rolling 0,1,2,3,4 or 5 2's:
> dbinom(0, size=15, prob=0.167) +
+ dbinom(1, size=15, prob=0.167) +
+ dbinom(2, size=15, prob=0.167) +
+ dbinom(3, size=15, prob=0.167) +
+ dbinom(4, size=15, prob=0.167) +
+ dbinom(5, size=15, prob=0.167)
[1] 0.9723556

• Alternatively, we can use the cumulative distribution function of the binomial distribution, pbinom().
• $Pr\left(X \leq 5 \right)$
> pbinom(5,size=15, prob=0.167)
[1] 0.9723556
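
The six-term sum and the pbinom() call above can also be compared in a vectorized form. The sketch below additionally shows how much the result moves when the rounded value 0.167 is replaced by the exact 1/6; the variable names are illustrative.

```r
# Vectorized over k = 0..5 instead of adding six separate dbinom() terms
by_sum <- sum(dbinom(0:5, size = 15, prob = 0.167))
by_cdf <- pbinom(5, size = 15, prob = 0.167)

# Both routes give the same cumulative probability Pr(X <= 5)
print(isTRUE(all.equal(by_sum, by_cdf)))

# Effect of rounding 1/6 to 0.167 on the cumulative probability
rounded <- pbinom(5, size = 15, prob = 0.167)
exact   <- pbinom(5, size = 15, prob = 1/6)
print(abs(rounded - exact))
```

Passing `prob = 1/6` avoids the small rounding discrepancy altogether.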


• As seen above, the pbinom() function is useful for summing consecutive binomial probabilities.
• Other questions that can be answered include:
• What is the probability of rolling 5 or more 2's? $Pr\left(X \geq 5 \right)$
• $Pr\left(X \geq 5 \right) = 1 - Pr\left(X \leq 4 \right) = 1 - \text{pbinom(4, size=15, prob=0.167) = 0.09039}$
• > 1 - pbinom(4, 15, 0.167)
[1] 0.09039063

• What is the probability of rolling more than 5 but no more than 8 2's? $Pr\left(5 < X \leq 8 \right)$
• $Pr\left(5 < X \leq 8 \right) = Pr\left(X \leq 8 \right) - Pr\left(X \leq 5 \right) = \text{pbinom(8, size=15, prob=0.16667) - pbinom(5, size=15, prob=0.16667) = 0.02720835}$
• > pbinom(8, 15, 0.16667) - pbinom(5,15, 0.16667)
[1] 0.02720835
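
Tail and interval questions like these all reduce to differences of cumulative probabilities: subtracting Pr(X ≤ a) from Pr(X ≤ b) leaves Pr(a < X ≤ b), so the lower endpoint a itself is excluded. A minimal sketch, in which `binom_interval` is a hypothetical helper name and p is taken as the exact 1/6 rather than a rounded value:

```r
n <- 15
p <- 1/6   # exact probability of rolling a 2 (assumption: fair die)

# Pr(a < X <= b) from the cumulative distribution function
binom_interval <- function(a, b, n, p) {
  pbinom(b, size = n, prob = p) - pbinom(a, size = n, prob = p)
}

# Pr(5 < X <= 8), i.e. X in {6, 7, 8}, computed two ways
via_cdf <- binom_interval(5, 8, n, p)
via_pmf <- sum(dbinom(6:8, size = n, prob = p))
print(isTRUE(all.equal(via_cdf, via_pmf)))

# Upper tail Pr(X >= 5) = Pr(X > 4), two equivalent ways
upper_a <- 1 - pbinom(4, size = n, prob = p)
upper_b <- pbinom(4, size = n, prob = p, lower.tail = FALSE)
print(isTRUE(all.equal(upper_a, upper_b)))
```

To include the lower endpoint, as in Pr(a ≤ X ≤ b), the helper would be called with a - 1 as its first argument.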

• Plotting the probability distribution (x runs from 0 to 15, since rolling no 2's at all is a possible outcome):
• df <- data.frame(x=0:15, prob=dbinom(0:15, 15, prob=0.167))
plot(df, type="b", xlab="Number (x) of rolls of 2's", ylab= "Pr(x)")
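• Simulating the experiment with rbinom(): n=100 (number of observations), size=15 (number of trials per observation), prob=0.167 (probability of success in each trial).
• bindat <- rbinom(100, 15, 0.167)
# breaks centred on the integers 0..15 so that each count gets its own bar
hist(bindat, breaks=seq(-0.5,15.5,1), xlab="N successes")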
• Plotting the area that corresponds to an upper-tail probability: What is the probability of rolling "at least" 5 2's (5 or more)?
df <- data.frame(x=0:15, prob=dbinom(0:15, 15, prob=0.167))
library(ggplot2)
# shade the region under the curve for x >= 5
ggplot(data=df, aes(x=x,y=prob)) + geom_line() +
geom_ribbon(data=subset(df,x>=5 & x<=15),aes(ymax=prob),ymin=0,
fill="red", colour = NA, alpha = 0.5)
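
The probability of rolling at least 5 2's can also be checked by simulation, comparing the proportion of simulated experiments with 5 or more successes against the exact value from pbinom(). This is a sketch under stated assumptions: the seed and the number of simulated experiments are arbitrary choices.

```r
set.seed(42)      # arbitrary seed, for reproducibility only
n_sim <- 100000   # number of simulated 15-roll experiments (assumption)

# Each element is the number of 2's observed in one experiment of 15 rolls
sims <- rbinom(n_sim, size = 15, prob = 0.167)

# Empirical proportion of experiments with 5 or more 2's
empirical <- mean(sims >= 5)

# Exact upper-tail probability Pr(X >= 5) = 1 - Pr(X <= 4)
exact <- 1 - pbinom(4, size = 15, prob = 0.167)

cat("empirical:", empirical, " exact:", exact, "\n")
```

With more simulated experiments the empirical proportion should settle closer to the exact value.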


### Probability Distributions (I)


• A probability distribution describes how the values of a random variable are distributed.
• It assigns a probability to each possible outcome of a process or experiment that is assumed random. The random variable can be continuous or discrete.
• Probability distributions are very useful: because the characteristics of each distribution are well understood, a sample of observations can be used to make statistical inferences about the entire population.
• A probability distribution can be specified in a number of ways:
• Through a probability density function (or, for a discrete variable, a probability mass function)
• Through a cumulative distribution function (or its complement, the survival function)
• Through a hazard function
• Through a characteristic function
• Some common distributions include:
• Binomial distribution: dbinom()
• For example, the number of heads in a series of coin tosses [H|T] follows a binomial distribution.
• Cauchy distribution: dcauchy()
• Chi-squared distribution: dchisq()
• Exponential distribution: dexp()
• F distribution: df()
• Gamma distribution: dgamma()
• Hypergeometric distribution: dhyper()
• Log-normal distribution: dlnorm()
• Geometric distribution: dgeom()
• Multinomial distribution: dmultinom()
• Negative binomial distribution: dnbinom()
• Normal distribution: dnorm()
• Poisson distribution: dpois()
• Student's t distribution: dt()
• Uniform distribution: dunif()
• Weibull distribution: dweibull()
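
The d functions listed above belong to a naming scheme that base R uses for most of its distributions: the prefix d gives the density or mass function, p the cumulative distribution function, q the quantile function, and r random generation. A minimal sketch using the binomial and normal distributions, with illustrative parameter values and an arbitrary seed:

```r
# Binomial family: mass function, cumulative probability, quantile, random draws
d_val <- dbinom(2, size = 15, prob = 1/6)     # Pr(X = 2)
p_val <- pbinom(2, size = 15, prob = 1/6)     # Pr(X <= 2)
q_val <- qbinom(0.5, size = 15, prob = 1/6)   # smallest k with Pr(X <= k) >= 0.5
set.seed(7)                                   # arbitrary seed
r_val <- rbinom(5, size = 15, prob = 1/6)     # five random draws

# The same prefixes for the normal distribution
z    <- qnorm(0.975)   # 97.5% quantile of the standard normal
back <- pnorm(z)       # the CDF maps the quantile back to 0.975

print(c(d = d_val, p = p_val, q = q_val))
print(r_val)
print(c(z = z, back = back))
```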