This post introduces a few of the most commonly encountered families of probability distributions: the Poisson, which models counts of events occurring at a fixed rate; the geometric, which models waiting times in discrete stochastic processes; and the exponential, a continuous analogue of the geometric that relates closely to the Poisson family.

This is a prototype post on probability distributions that I hope to expand in future. It aims to cover the relationships between the distributions most commonly encountered in undergraduate statistics courses and most commonly employed in social and life science applications. For now, it contains a discussion of the Poisson, geometric and exponential families of distributions. The content and layout of this post may change as it is updated.

The Poisson Family

The following discusses the Poisson distribution, which models counts of independent events occurring in fixed intervals of time or space.

The Poisson is the Binomial as $n$ tends to infinity

The Poisson distribution is the limit of the binomial as $n$ tends to infinity, $n \rightarrow \infty$, with $np$ held fixed at $np = \lambda$ (and hence with $p \rightarrow 0$).

Define $\lambda = np$

We have $Y \sim Binom(n, p)$ when:

\[P(Y = k) = \binom{n}{k} \cdot p^k \cdot (1-p)^{n-k}\]

and $Y \sim Pois(\lambda)$ when:

\[P(Y = k) = \frac{e^{-\lambda} \cdot \lambda^k}{k!}\]

This limit has a straightforward interpretation: it applies when the number of trials is very large and the per-trial probability is very small, e.g. radioisotope decay can be modelled as Poisson distributed because even a small quantity of a metal contains a very large number of atoms, each with a tiny probability of decaying in a given interval.

From $\lambda = np$ we have that $p = \dfrac{\lambda}{n}$. Substitute $\dfrac{\lambda}{n}$ into the binomial density function and let $n \rightarrow \infty$:

\[\begin{aligned} & \lim_{n \rightarrow \infty} \binom{n}{k} \cdot \left( \dfrac{\lambda}{n} \right) ^k \cdot \left(1-\dfrac{\lambda}{n} \right)^{n-k} && \ \text{expand combinatorial term} \\ \\ &= \lim_{n \rightarrow \infty} \dfrac{n!}{k!(n-k)!} \cdot \left( \dfrac{\lambda}{n} \right) ^k \cdot \left(1-\dfrac{\lambda}{n} \right)^{n-k} && \ \text{remove constant terms} \\ &= \dfrac{1}{k!} \cdot \lambda^k \ \lim_{n \rightarrow \infty} \dfrac{n!}{(n-k)!} \cdot \dfrac{1}{n^k} \cdot \left(1-\dfrac{\lambda}{n} \right)^{n-k} && \ \text{separate index of } 1 - \dfrac{\lambda}{n} \\ &= \dfrac{1}{k!} \cdot \lambda^k \ \lim_{n \rightarrow \infty} \dfrac{n!}{(n-k)!} \cdot \dfrac{1}{n^k} \cdot \left(1-\dfrac{\lambda}{n} \right)^{n} \cdot \left(1-\dfrac{\lambda}{n} \right)^{-k} && \ \text{} \\ \end{aligned}\]

We can consider this expression within the limit in three component parts:

\[\dfrac{1}{k!} \cdot \lambda^k \ \lim_{n \rightarrow \infty} \underbrace{\dfrac{n!}{(n-k)!} \cdot \dfrac{1}{n^k}}_{1} \cdot \underbrace{\left(1-\dfrac{\lambda}{n} \right)^{n}}_{2} \cdot \underbrace{\left(1-\dfrac{\lambda}{n} \right)^{-k}}_{3}\]

Looking at the first component, we can expand the factorials in the numerator and denominator whilst simultaneously keeping in mind that there are $k$ divisions by $n$:

\[\begin{aligned} &\lim_{n \rightarrow \infty} \dfrac{n!}{(n-k)!} \cdot \dfrac{1}{n^k} \\ &= \lim_{n \rightarrow \infty} \dfrac{n \cdot (n-1) \cdot \dots \cdot 2 \cdot 1}{(n-k) \cdot (n-k-1) \cdot \dots \cdot 2 \cdot 1} \cdot \dfrac{1}{n^k} && \ \text{the } (n-k)! \text{ cancels, leaving } k \text{ terms} \\ &= \lim_{n \rightarrow \infty} \dfrac{n \cdot (n-1) \cdot \dots \cdot (n - (k-2)) \cdot (n - (k-1))}{n^k} && \ \text{split } \dfrac{1}{n^k} \text{ across the } k \text{ terms} \\ &= \lim_{n \rightarrow \infty} \dfrac{n}{n} \cdot \dfrac{(n-1)}{n} \cdot \dots \cdot \dfrac{(n - (k-2))}{n} \cdot \dfrac{(n - (k-1))}{n} && \ \text{each factor tends to } 1 \\ &= 1 \cdot 1 \cdot \dots \cdot 1 \cdot 1 \\ &= 1 \end{aligned}\]

Now turning to the second term:

\[\lim_{n \rightarrow \infty} \left(1-\dfrac{\lambda}{n} \right)^{n}\]

Recall the definition of $e$ is:

\[e = \lim_{x \rightarrow \infty} \left(1 + \frac{1}{x}\right)^x\]

Set $x = -\dfrac{n}{\lambda}$, which is also equal to $-\frac{1}{p}$. Substitute this into the definition of $e$:

\[\begin{aligned} &\lim_{n \rightarrow \infty} \left(1 - \frac{\lambda}{n}\right)^n && \text{reexpress } - \frac{\lambda}{n} \\ &= \lim_{n \rightarrow \infty} \left(1 + \frac{1}{x}\right)^n && \text{reexpress index}\ n \\ &= \lim_{n \rightarrow \infty} \left(1 + \frac{1}{x}\right)^{-x \cdot \lambda} = \lim_{n \rightarrow \infty} \left(1 + \frac{1}{x}\right)^{x \cdot (-\lambda)} && e \ \text{reemerges} \\ &= e^{-\lambda} \\ \end{aligned}\]

Note that this is legitimate as our construction, $x = -\frac{n}{\lambda}$, means that $x$ approaches infinity as $n$ does.
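As a quick numerical illustration of this limit, the following sketch (a minimal example assuming numpy is available, with an arbitrary example value of $\lambda$) evaluates $\left(1-\frac{\lambda}{n}\right)^n$ for increasing $n$ and compares it against $e^{-\lambda}$:

```python
import numpy as np

lam = 3.0  # an arbitrary example rate

# Evaluate (1 - lam/n)^n for increasingly large n
for n in [10, 100, 1_000, 100_000]:
    print(f"n = {n:>7}: (1 - lam/n)^n = {(1 - lam / n) ** n:.6f}")

# The limiting value e^(-lam)
print(f"limit: e^(-lam) = {np.exp(-lam):.6f}")
```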

Finally, turning to the third term:

\[\lim_{n\rightarrow \infty} \left(1-\dfrac{\lambda}{n} \right)^{-k}\]

As $n$ becomes large, $\dfrac{\lambda}{n}$ approaches $0$ and the argument of the limit becomes $1$:

\[\begin{aligned} &\lim_{n\rightarrow \infty} \left(1-\dfrac{\lambda}{n} \right)^{-k} \\ &= 1^{-k} \\ &= \dfrac{1}{1^k} \\ &= 1 \end{aligned}\]

So reconstituting the three original terms:

\[\dfrac{1}{k!} \cdot \lambda^k \ \lim_{n \rightarrow \infty} \underbrace{\dfrac{n!}{(n-k)!} \cdot \dfrac{1}{n^k}}_{1} \cdot \underbrace{\left(1-\dfrac{\lambda}{n} \right)^{n}}_{2} \cdot \underbrace{\left(1-\dfrac{\lambda}{n} \right)^{-k}}_{3}\]

Having taken the limits for large $n$ we now have:

\[\begin{aligned} &\dfrac{1}{k!} \cdot \lambda^k \ \underbrace{(1)}_{1} \cdot \underbrace{e^{-\lambda}}_{2} \cdot \underbrace{(1)}_{3} \\ \\ &= \dfrac{e^{-\lambda} \cdot \lambda^k}{k!} \end{aligned}\]

We have arrived at the familiar formula for the Poisson density from earlier.
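To check the limit numerically, the sketch below (a minimal example assuming scipy is available, with arbitrary example values) compares the binomial pmf with $p = \lambda/n$ against the Poisson pmf for increasing $n$; the largest pointwise difference shrinks as $n$ grows:

```python
import numpy as np
from scipy.stats import binom, poisson

lam = 4.0              # fixed rate, lambda = n * p
ks = np.arange(0, 15)  # a range of counts to compare over

for n in [20, 200, 2_000]:
    p = lam / n  # the per-trial probability shrinks as n grows
    max_diff = np.max(np.abs(binom.pmf(ks, n, p) - poisson.pmf(ks, lam)))
    print(f"n = {n:>5}: max |Binom(n, lam/n) - Pois(lam)| = {max_diff:.6f}")
```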

The Geometric Distribution

The Geometric Distribution describes the number of trials you wait for a success

The geometric distribution (the number of Bernoulli trials you wait for a success) is a discrete precursor to the exponential. Its variance is derived using the second derivative of the sum of a geometric series.

For a sequence of independent Bernoulli trials with success probability $p$, the number of trials until the first success is distributed:

\[P(Y=y) = (1-p)^{y-1} \cdot p \ \text{for } y=1,2, \dots\]

As such we have $Y \sim Geom(p)$ with $p$ the per-trial probability of a success. We have $\mathbb{E}(Y) = \dfrac{1}{p}$, which is intuitive, and $\mathbb{V}ar(Y) = \dfrac{1-p}{p^2}$, which is derived via the second derivative of the sum of a geometric series (see next section).
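As an empirical check of these two moments, the sketch below (assuming numpy is available, with an arbitrary example value of $p$) simulates a large number of geometric waiting times and compares the sample mean and variance with $\frac{1}{p}$ and $\frac{1-p}{p^2}$:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3  # an arbitrary example success probability

# numpy's geometric sampler counts the trials up to and including the
# first success, matching Y ~ Geom(p) as defined above
samples = rng.geometric(p, size=1_000_000)

print(f"sample mean     = {samples.mean():.4f}   vs  1/p       = {1 / p:.4f}")
print(f"sample variance = {samples.var():.4f}   vs  (1-p)/p^2 = {(1 - p) / p**2:.4f}")
```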

The geometric distribution is memoryless, in the sense that

\[P(Y > n + k \ | \ Y > k) = P(Y > n)\]

This can be interpreted as saying that, irrespective of past failures, you always have as long to wait in expectation as if you had just started. It is a simple consequence of the independence of the Bernoulli trials.
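The memoryless property can likewise be checked by simulation; this sketch (again assuming numpy, with arbitrary example values of $p$, $n$ and $k$) compares the conditional probability $P(Y > n + k \mid Y > k)$ with the unconditional $P(Y > n)$:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, k = 0.2, 5, 3  # arbitrary example values

samples = rng.geometric(p, size=1_000_000)

# Estimate P(Y > n + k | Y > k) from the runs that survived beyond k trials
conditional = np.mean(samples[samples > k] > n + k)
# Estimate the unconditional P(Y > n)
unconditional = np.mean(samples > n)

print(f"P(Y > n + k | Y > k) ~= {conditional:.4f}")
print(f"P(Y > n)             ~= {unconditional:.4f}")
```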

Derivation of the Variance of a Geometrically Distributed Random Variable

Prerequisites

As prerequisites, we have that the sum of a geometric series, for $|r| < 1$, is:

\[g(r)=\sum\limits_{k=0}^\infty ar^k=a+ar+ar^2+ar^3+\cdots=\dfrac{a}{1-r}=a(1-r)^{-1}\]

consequently the first derivative with respect to $r$ is:

\[g'(r)=\sum\limits_{k=1}^\infty akr^{k-1}=a+2ar+3ar^2+\cdots=\dfrac{a}{(1-r)^2}=a(1-r)^{-2}\]

and the second derivative with respect to $r$ is:

\[g''(r)=\sum\limits_{k=2}^\infty ak(k-1)r^{k-2}=2a+6ar+12ar^2+\cdots=\dfrac{2a}{(1-r)^3}=2a(1-r)^{-3}\]

Derivation

From $\mathbb{V}ar(Y) = \mathbb{E}(Y^2) - \mathbb{E}(Y)^2$ we first add $0 = \mathbb{E}(Y) - \mathbb{E}(Y)$

\[\begin{aligned} \mathbb{V}ar(Y) &= \mathbb{E}(Y^2) - \mathbb{E}(Y)^2 && \text{add } - \mathbb{E}(Y) + \mathbb{E}(Y) \\ \\ \mathbb{V}ar(Y) &= \mathbb{E}(Y^2) - \mathbb{E}(Y) + \mathbb{E}(Y) - \mathbb{E}(Y)^2 && \text{expectation is a linear operator} \\ \\ \mathbb{V}ar(Y) &= \mathbb{E}(Y(Y-1)) + \mathbb{E}(Y) - \mathbb{E}(Y)^2 \end{aligned}\]

Focusing just on the first term, $\mathbb{E}(Y(Y-1))$, we have

\[\begin{aligned} &\mathbb{E}(Y(Y-1)) \\ \\ &= \sum_{y=1}^{\infty} y \cdot (y-1) \cdot p \cdot (1-p)^{y-1} && \text{pull out } p(1-p) \\ \\ &= p \cdot (1-p) \sum_{y=2}^{\infty} y \cdot (y-1) \cdot (1-p)^{y-2} \end{aligned}\]

This is the right-hand side of the second derivative sum, $g''(r) = \dfrac{2a}{(1-r)^3}$, with $a=p\cdot(1-p)$ and $r=1-p$, which gives an alternative expression for this first term

\[\dfrac{2p(1-p)}{(1-(1-p))^3} = \dfrac{2p(1-p)}{p^3} = \dfrac{2(1-p)}{p^2}\]

Putting this back as the first term into the expression for the variance gives

\[\begin{aligned} \sigma^2 &= \mathbb{E}(Y(Y-1)) + \mathbb{E}(Y) - \mathbb{E}(Y)^2 \\ \\ &= \dfrac{2(1-p)}{p^2} + \dfrac{1}{p} - \dfrac{1}{p^2} \\ \\ &= \dfrac{2(1-p) + p - 1}{p^2} \\ \\ &= \dfrac{1-p}{p^2} \ \blacksquare \end{aligned}\]

This yields the expression for the variance of the geometric distribution, $\sigma^2 = \frac{1-p}{p^2}$.
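As a numerical cross-check of this derivation, the sketch below (assuming numpy is available, with an arbitrary example value of $p$) evaluates a truncated version of the sum for $\mathbb{E}(Y(Y-1))$ directly and compares it, and the resulting variance, with the closed-form expressions:

```python
import numpy as np

p = 0.25
y = np.arange(1, 2_000)       # truncate the infinite sum; the tail is negligible here
pmf = p * (1 - p) ** (y - 1)  # geometric pmf P(Y = y)

e_y_ym1 = np.sum(y * (y - 1) * pmf)  # E[Y(Y-1)] by direct summation
e_y = np.sum(y * pmf)                # E[Y]
variance = e_y_ym1 + e_y - e_y**2    # Var(Y) = E[Y(Y-1)] + E[Y] - E[Y]^2

print(f"E[Y(Y-1)]: summed = {e_y_ym1:.6f}, 2(1-p)/p^2 = {2 * (1 - p) / p**2:.6f}")
print(f"Var(Y):    summed = {variance:.6f}, (1-p)/p^2  = {(1 - p) / p**2:.6f}")
```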

The Exponential Distribution

The following discusses the Exponential Distribution, which models the waiting times between Poisson-distributed events and is a continuous analogue of the geometric.

The Exponential Distribution from the Poisson

The exponential distribution describes the probability distribution of the time, $t$, waited until the first occurrence of a Poisson-distributed event, $Y \sim Pois(\lambda)$.

(The fact that $t$ models the time until the first event makes the exponential a special case of the Gamma distribution with $k=1$. In other words, the exponential family of distributions is a proper subset of the gamma family of distributions.)

Derivation given a Poisson process

During the waiting period no events occur, so $P(Y = 0)$ gives the probability that a full time unit passes without an event. Since the counts in disjoint intervals of a Poisson process are independent, we can multiply this probability across $t$ consecutive unit intervals to yield a probability distribution of waiting times.

Given $Y \sim Pois(\lambda)$

\[P(Y = y) = \dfrac{e^{-\lambda} \cdot \lambda^y}{y!}\]

For $P(Y = 0)$ we have

\[P(Y = 0) = \dfrac{e^{-\lambda} \cdot \lambda^0}{0!} = e^{-\lambda}\]

The probability of $t$ of these unit waiting periods occurring consecutively is the product of $t$ such terms1

\[(e^{-\lambda})^t = e^{-\lambda \cdot t}\]

We interpret this as the probability of the waiting time being at least duration $t$

\[P(T \geq t) = e^{-\lambda \cdot t}\]

Accordingly, the exponential distribution function (CDF) is given by the complement to one

\[P(T \leq t) = 1 - e^{-\lambda \cdot t}\]

Probability density function

We are required to differentiate this to give the probability density function (PDF).

Taking $F(t) = 1 - e^{-\lambda \cdot t}$ and setting $u = -\lambda \cdot t$:

\[\begin{aligned} F &= 1 - e^{u}, \quad u = -\lambda \cdot t \\ \frac{dF}{du} &= -e^{u}, \quad \frac{du}{dt} = -\lambda \\ \frac{dF}{dt} &= \frac{dF}{du} \cdot \frac{du}{dt} = (-e^{-\lambda \cdot t}) \cdot (-\lambda) = \lambda \cdot e^{-\lambda \cdot t} \end{aligned}\]

This yields the probability density function of the exponential distribution, derived from the Poisson

\[f(t) = \lambda \cdot e^{-\lambda \cdot t}\]
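To tie the pieces together, the sketch below (assuming numpy is available, with arbitrary example values) discretises time into small steps of length $dt$, in each of which an event occurs with probability $\lambda \cdot dt$; the number of steps to the first event is then geometric, and the rescaled waiting time closely follows the exponential CDF derived above:

```python
import numpy as np

rng = np.random.default_rng(2)
lam, dt = 2.0, 1e-4  # example rate and a small discretisation step
n_runs = 1_000_000

# Steps to the first event are geometric with success probability lam * dt;
# multiplying by dt converts the step count into a waiting time
waiting_times = rng.geometric(lam * dt, size=n_runs) * dt

t = 1.0
print(f"P(T <= {t}) empirical = {np.mean(waiting_times <= t):.4f}, "
      f"1 - e^(-lam t) = {1 - np.exp(-lam * t):.4f}")
print(f"mean waiting time = {waiting_times.mean():.4f}, 1/lam = {1 / lam:.4f}")
```

This also makes concrete the sense in which the exponential is a continuous analogue of the geometric: as $dt \rightarrow 0$ the rescaled geometric waiting time converges to the exponential.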
  1. Though crucially, more than $t$ such unit periods could pass before the first event occurs, so $t$ is a lower bound on the waiting time under the underlying Poisson model.