8 Standard continuous distributions

On completion of this chapter, you should be able to:

recognise the probability functions and underlying parameters of uniform, exponential, gamma, beta, and normal random variables.
know the basic properties of the above continuous distributions.
apply these distributions as appropriate to problem solving.
approximate the binomial distribution by a normal distribution.

8.1 Introduction

In this chapter, some popular continuous distributions are discussed. Properties such as definitions and applications are considered.

8.2 Continuous uniform distribution

The continuous uniform distribution has a constant PDF over a given range.

Definition 8.1 (Continuous uniform distribution) If a random variable $X$ with range space $[a, b]$ has the PDF \[\begin{equation} f_X(x; a, b) = \displaystyle\frac{1}{b - a}\quad\text{for $a\le x\le b$}, \tag{8.1} \end{equation}\] then $X$ has a continuous uniform distribution. We write $X\sim U(a, b)$ or $X\sim\text{Unif}(a, b)$.

This distribution is also called the rectangular distribution. The same notation is used to denote the discrete and continuous uniform distribution; the context should make it clear which is meant. A plot of the PDF for a continuous uniform distribution is shown in Fig. 8.1.

FIGURE 8.1: The PDF for a continuous uniform distribution $U(a,b)$.

Definition 8.2 (Continuous uniform distribution: distribution function) For a random variable $X$ with the continuous uniform distribution given in Eq. (8.1), the distribution function is \[ F_X(x; a, b) = \begin{cases} 0 & \text{for $x < a$};\\ \displaystyle \frac{x - a}{b - a} & \text{for $a \le x \le b$};\\ 1 & \text{for $x > b$}. \end{cases} \]

The following are the basic properties of the continuous uniform distribution.

Theorem 8.1 (Continuous uniform distribution properties) If $X\sim\text{Unif}(a,b)$ then

$\operatorname{E}[X] = (a + b)/2$.
$\operatorname{var}[X] = (b - a)^2/12$.
$M_X(t) = \{ \exp(bt) - \exp(at) \} / \{t(b - a)\}$.

Proof. These proofs are left as exercises.

The four R functions for working with the continuous uniform distribution have the form [dpqr]unif(min, max) (see App. D).

Example 8.1 (Continuous uniform) If $X$ is uniformly distributed on $[-2, 2]$, then $\Pr(|X| > \frac{1}{2})$ can be found. Note that \[ f_X(x; a = -2, b = 2) = \frac{1}{4} \quad\text{for $-2 < x < 2$.} \] Then: \[\begin{align*} \Pr\left(|X| > \frac{1}{2}\right) &= \Pr\left(X > (1/2)\right) + \Pr\left(X < -(1/2)\right)\\ &= \int^2_{1/2} f(x)\,dx + \int^{-1/2}_{-2} f(x)\,dx\\ &= 3/4. \end{align*}\] The probability could also be computed by finding the area of the appropriate rectangle. Alternatively, in R:

punif(-0.5, -2, 2) + 
  1 - punif(0.5, -2, 2)
#> [1] 0.75

8.3 Normal distribution

8.3.1 Definition and properties

The most well-known continuous distribution is probably the normal distribution (or Gaussian distribution), sometimes called the bell-shaped distribution. The normal distribution has many applications (especially in sampling; see Sect. 12), and many natural quantities (such as heights and weights of humans) follow normal distributions.

Definition 8.3 (Normal distribution) If a random variable $X$ has the PDF \[\begin{equation} f_X(x; \mu, \sigma^2) = \displaystyle \frac{1}{\sigma \sqrt{2\pi}} \exp\left\{ -\frac{1}{2}\left( \frac{x-\mu}{\sigma}\right)^2 \right\} \tag{8.2} \end{equation}\] for $-\infty<x<\infty$, then $X$ has a normal distribution. The two parameters are the mean $\mu$ such that $-\infty < \mu < \infty$; and the variance $\sigma^2$ such that $\sigma^2 > 0$. We write $X\sim N(\mu, \sigma^2)$.

Some authors—especially in non-theoretical work—use the notation $X\sim N(\mu,\sigma)$ so it is wise to check each article or book for the notation used. Some examples of normal distribution PDFs are shown in Fig. 8.2.

$Some examples of normal distributions. The solid lines correspond to $\sigma = 0.5$ and the dashed lines to $\sigma = 1$. For the left panel, $\mu = -3$; for the right panel, $\mu = 2$.$

FIGURE 8.2: Some examples of normal distributions. The solid lines correspond to $\sigma = 0.5$ and the dashed lines to $\sigma = 1$. For the left panel, $\mu = -3$; for the right panel, $\mu = 2$.

In drawing the graph of the normal PDF, note that

$f_X(x)$ is symmetrical about $\mu$: $f_X(\mu - x) = f_X(\mu + x)$.
$f_X(x) \to 0$ asymptotically as $x\to \pm \infty$.
$f_X'(x) = 0$ when $x = \mu$, and a maximum occurs there.
$f_X''(x) = 0$ when $x = \mu \pm \sigma$ (points of inflection).

The proof that $\displaystyle \int^\infty_{-\infty} f_X(x)\,dx = 1$ is not obvious so will not be given. The proof relies on first squaring the integral and then changing to polar coordinates.

Definition 8.4 (Normal distribution: distribution function) For a random variable $X$ with the normal distribution given in Eq. (8.2), the distribution function is \[\begin{align} F_X(x; \mu, \sigma^2) &= \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^x \exp\left\{\frac{(t - \mu)^2}{2\sigma^2}\right\}\, dt\\ &= \frac{1}{2} \left\{1 + \operatorname{erf} \left( \frac{x - \mu}{\sigma {\sqrt{2}} }\right)\right\} \end{align}\] where $\operatorname{erf}(\cdot)$ is the error function \[\begin{equation} \operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int _{0}^{x} \exp\left( -t^{2}\right)\,dt, \tag{8.3} \end{equation}\] for $x \in\mathbb{R}$. The function $\operatorname{erf}(\cdot)$ appears in numerous places, is commonly tabulated, and available in many computer packages.

The following are the basic properties of the normal distribution.

Theorem 8.2 (Normal distribution properties) If $X\sim N(\mu,\sigma^2)$ then

$\operatorname{E}[X] = \mu$.
$\operatorname{var}[X] = \sigma^2$.
$M_X(t) = \displaystyle \exp\left(\mu t + \frac{t^2\sigma^2}{2}\right)$.

Proof. The proof is delayed until after Theorem 8.3.

The four R functions for working with the normal distribution have the form [dpqr]norm(mean, sd) (see App. D). Note that the normal distribution in R is specified by giving the standard deviation and not the variance.

The error function $\operatorname{erf}(\cdot)$ is not available directly in R, but can be evaluated using \[ \operatorname{erf}(x) = 2\times \texttt{pnorm}(x\sqrt{2}) - 1. \]

8.3.2 The standard normal distribution

A special case of the normal distribution is the standard normal distribution, when the normal distribution has mean zero and variance one.

Definition 8.5 (Standardard normal distribution) The PDF for a random variable $Z$ with a standard normal distribution, sometimes denoted $\phi(x)$, is \[\begin{equation} f_Z(z) = \displaystyle \frac{1}{\sqrt{2\pi}} \exp\left\{ -\frac{z^2}{2}\right\} \tag{8.4} \end{equation}\] where $-\infty < z < \infty$. We write $Z\sim N(0, 1)$.

Since $f_Z(z)$ is a PDF, then \[\begin{equation} \frac 1{\sqrt{2\pi}}\int^\infty_{-\infty} \exp\left\{-\frac{1}{2} z^2\right\}\,dz = 1, \tag{8.5} \end{equation}\] a result which proves useful in many contexts (as in the proof of the second statement in Theorem 7.7, and in the proof below).

Definition 8.6 (Standard normal distribution: distribution function) For a random variable $X$ with the standard normal distribution given in Eq. (8.4), the distribution function is \[\begin{align} F_X(x) &= \frac{1}{2} \left\{1 + \operatorname{erf} \left( \frac {x}{\sqrt{2}} \right)\right\}\nonumber\\ &= \int_{-\infty}^z \frac{1}{\sqrt{2\pi}} \exp\left\{ -\frac{t^2}{2}\right\}\, dt \\ &= \Phi(z) \tag{8.6} \end{align}\] where $\operatorname{erf}(\cdot)$ is the error function (8.3).

Definition 8.7 (Notation) The density and distribution functions for the standard normal distribution are used so often that they have their own notation. The density function is denoted $\phi(\cdot)$, and the distribution function is denoted $\Phi(\cdot)$. Then, the density function a normal distribution with mean $\mu$ and standard deviation $\sigma$ is $\phi\big( (x - \mu)/\sigma\big)$, and the corresponding distribution function is $\Phi\big( (x - \mu)/\sigma\big)$.

Note: $\displaystyle\frac{d}{dx} \Phi(x) = \phi(x)$.

The following are the basic properties of the standard normal distribution.

Theorem 8.3 (Standard normal distribution properties) If $Z\sim N(0, 1)$ then

$\operatorname{E}[Z] = 0$.
$\operatorname{var}[Z] = 1$.
$M_Z(t) = \displaystyle \exp\left(\frac{t^2}{2}\right)$.

Proof. Part 3 could be proven first, and used to prove Parts 1 and 2. However, proving Parts 1 and 2 directly is constructive. For the expected value: \[\begin{align*} \operatorname{E}[Z] &= \frac {1}{\sqrt{2\pi}} \int^\infty_{-\infty} z \exp\left\{-\frac{1}{2}z^2 \right\}\,dz\\ &= \int^\infty_{-\infty} -\frac{d}{dz} \left(\exp\left\{-\frac{1}{2}z^2\right\} \right)\, dz\\ &= \left[ -\exp\left\{-\frac{1}{2}z^2\right\}\right]^\infty_{-\infty} = 0, \end{align*}\] since the integrand is symmetric about $0$.

For the variance, first see that $\operatorname{var}[Z] = \operatorname{E}[Z^2] - \operatorname{E}[Z]^2 = \operatorname{E}[Z^2]$ since $\operatorname{E}[Z] = 0$. So: \[ \operatorname{var}[X] = \operatorname{E}[Z^2] = \frac {1}{\sqrt{2\pi}}\int^\infty_{-\infty}z^2e^{-\frac{1}{2}z^2}\,dz. \] To integrate by parts (i.e., $\int u\,dv = uv - \int v\, du$), set $u = z$ (so that $du = 1$) and set \[ dv = ze^{-\frac{1}{2}z^2} \quad\text{so that}\quad v = -e^{-\frac{1}{2}z^2}. \] Hence, \[\begin{align*} \operatorname{var}[X] &= \frac {1}{\sqrt{2\pi}} \left\{ -z \exp\left\{-\frac{1}{2}z^2\right\} - \int^\infty_{-\infty}-\exp\left\{-\frac{1}{2}z^2\right\}\, dz \right\}\\ &= \frac {1}{\sqrt{2\pi}}\left(\left. -z\,\exp\left\{-\frac{1}{2} z^2\right\}\right|^\infty_{-\infty}\right) + \frac{1}{\sqrt{2\pi}}\int^\infty_{-\infty} \exp\left\{-\frac{1}{2}z^2\right\}\,dz = 1 \end{align*}\] since the first term is zero, and the second term uses Eq. (8.5).

For the MGF: \[ M_Z(t) = \operatorname{E}[\exp\left\{tZ\right\}] = \int^\infty_{-\infty} \exp\left\{tz\right\} \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2}z^2\right\}\,dz. \] Collecting together the terms in the exponent and completing the square, \[\begin{equation*} -\frac{1}{2}[z^2 -2tz] = -\frac{1}{2}(z - t)^2+\frac{1}{2} t^2. \end{equation*}\] Taking the constants outside the integral: \[ M_Z(t) = \exp\left\{\frac{1}{2}t^2\right\}\int^\infty_{-\infty} \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2}[z - t]^2\right\}\,dz. \] The integral here is $1$, since it is the area under an $N(t, 1)$ PDF. Hence \[ M_Z(t) = \exp\left\{\frac{1}{2} t^2\right\}. \]

This distribution is important in practice since any normal distribution can be rescaled into a standard normal distribution using \[\begin{equation} Z = \frac{X - \mu}{\sigma}. \tag{8.7} \end{equation}\] Since $Z = (X - \mu)/\sigma$, then $X = \mu + \sigma Z$, and so \[ \operatorname{E}[X] = \operatorname{E}[\mu + \sigma Z] = \mu + \sigma \operatorname{E}[Z] = \mu \] because $\operatorname{E}[Z] = 0$. Also \[ \operatorname{var}[X] = \operatorname{var}[\mu +\sigma Z] = \sigma^2\operatorname{var}[Z] = \sigma^2 \] because $\operatorname{var}[Z] = 1$. Finally \[ M_X(t) = \operatorname{E}[e^{tX}] = \operatorname{E}\big[\exp\{t(\mu + \sigma Z)\}\big] = \exp(\mu t)\operatorname{E}\big[\exp(t\sigma Z)\big]. \] However, $\operatorname{E}\big[\exp(t\sigma Z)\big] = M_Z(t\sigma) = \exp\left\{\frac{1}{2}(\sigma t)^2\right\}$ so \[ M_Z(t) = \displaystyle \exp\left(\frac{t^2}{2}\right). \]

The four R functions for working with the normal distribution have the form [dpqr]norm(mean, sd) (see App. D). By default, mean = 0 and sd = 1, corresponding to the standard normal distribution. Note that the normal distribution is specified by giving the standard deviation and not the variance.

8.3.3 Determining normal probabilities

The probability $\Pr(a < X \le b)$ where $X\sim N(\mu, \sigma^2)$ can be written \[ \Pr(a < X \le b) = F_X(b) - F_X(a). \] where $F_X(x)$ is the distribution function in Def. 8.6. The integral in the distribution function cannot be written in terms of standard functions and in general must be evaluated for a particular $x$ numerically (e.g., using Simpson’s rule or similar). However, all statistical packages have built-in procedures to evaluate $F_X(x)$ for any $z$ (such as pnorm() in R).

Also, since any normal distribution can be transformed into a standard normal distribution, we can write \[ z_1 = (a - \mu)/\sigma \quad\text{and}\quad z_2 = (b - \mu)/\sigma \] and hence the probability is \[ \Pr( z_1 < Z \le z_2) = \Phi(z_2) - \Phi(z_1), \] where $\Phi(z)$ is the distribution function for the standard normal distribution, as in Eq. (8.3).

The process of converting a value $x$ into $z$ using $z = (x - \mu)/\sigma$ is called standardising. Tables of $\Phi(z)$ (or sometimes $1 - \Phi(z)$) are commonly available. These tables can be used to compute any probabilities associated with the normal distributions (see the examples below). Of course, R can be used too.

In addition, the tables are often used in the reverse sense, where the probability of an event is given and the value of the random variable is sought. In these case, the tables are used ‘backwards’; the appropriate area is found in the body of the table and the corresponding $z$-value found in the table margins. In R, the function qnorm() is used. This $z$-value is then converted to a value of the original random variable using \[\begin{equation} x = \mu + z\sigma. \tag{8.8} \end{equation}\] This process is sometimes referred to as unstandardising.

The following examples illustrate the use of R. Drawing rough graphs showing the relevant areas is encouraged.

Example 8.2 (Walking speeds) A study of stadium evacuations used a simulation to compare scenarios (H. Xie, Weerasekara, and Issa 2017). The walking speed of people was modelled using a $N(1.15, 0.2^2)$ distribution; that is, $\mu = 1.15$ m/s with a standard deviation of $0.2$ m/s. The situation can be shown in Fig. 8.3, top left panel.

Many questions can be asked about the walking speeds, and R used to compute answers. Using this model:

What is the probability that a person walks faster than $1.5$ m/s?
What is the probability that a person walks slower than $1.0$ m/s?
What is the probability that a person walks between $1.0$ m/s and $1.5$ m/s?
At what speed do the slowest $15$% of people walk?
At what speed do the quickest $5$% of people walk?

Sketches of each of these situations are shown in Fig. 8.3. Using R:

# Part 1
1 - pnorm(1.5, mean = 1.15, sd = 0.2)
#> [1] 0.04005916

# Part 2
pnorm(1, mean = 1.15, sd = 0.2)
#> [1] 0.2266274

# Part 3
pnorm(1.5, mean = 1.15, sd = 0.2) - 
   pnorm(1, mean = 1.15, sd = 0.2)
#> [1] 0.7333135

# Part 4
qnorm(0.15, mean= 1.15, sd = 0.2)
#> [1] 0.9427133

# Part 5
qnorm(0.95, mean = 1.15, sd = 0.2)
#> [1] 1.478971

So the answers are, respectively:

4%;
22.7%;
73.3%;
0.894 m/s;
1.479m/s.

$The normal distribution used for modelling walking speeds. The vertical dotted lines are at the mean and $\pm1$, $\pm2$, $\pm3$ and $\pm4$ standard deviations from the mean.$

FIGURE 8.3: The normal distribution used for modelling walking speeds. The vertical dotted lines are at the mean and $\pm1$, $\pm2$, $\pm3$ and $\pm4$ standard deviations from the mean.

Example 8.3 (Using normal distributions) Scores on an examination have a normal distribution, with a mean score of $50$ and standard deviation of $10$. Suppose the top $75$% of candidates taking this examination are to be passed; call the minimum passing score $x^*$.

Since $X\sim N(50, 10^2)$: \[\begin{align*} \Pr(X > x^*) &= 0.75\\ \Pr\left(Z > \frac{x^* - 50}{10}\right) &= 0.75\\ \Pr\left(Z < \frac{50 - x^*}{10}\right) & = 0.75. \end{align*}\] From tables, $0.75 = \Phi(0.675)$ so $(50 - x^*)/10 = 0.675$ and $x^* = 43.25$. Using R:

qnorm(0.25, # 'Top 75%' is the same as the 'bottom 25%' 
      mean = 50,
      sd = 10)
#> [1] 43.2551

8.3.4 Normal approximation to the binomial

In Sect. 7.4, the binomial distribution was considered. Sometimes using the binomial distribution is tedious; consider a binomial random variable $X$ where $n = 1000$, $p = 0.45$ and $\Pr(X > 650)$ is sought: we would calculate $\Pr(X = 651) + \Pr(X = 652) + \cdots + \Pr(X = 1000)$.

However, sometimes the normal distribution can be used to approximate binomial probabilities. For certain parameter values, the binomial pf starts to take on a normal distribution shape (Fig. 8.4).

When is the binomial probability function close enough to use the normal approximation? There is no definitive answer; a common guideline suggests that if both $np \ge 5$ and $n(1 - p) \ge 5$ the approximation is satisfactory. (These are only guidelines, and other texts may suggest different guidelines.)

Figure 8.4 shows some picture of various binomial probability functions overlaid with the corresponding normal distribution; the approximation is visibly better as the guidelines given above are satisfied.

$The normal distribution approximating a binomial distribution. The guidelines suggest the approximation should be good when $np \ge 5$ and $n (1 - p) \ge 5$; this is evident from the pictures. In the top row, a significant amount of the approximating normal distribution even appears when $Y < 0$.$

FIGURE 8.4: The normal distribution approximating a binomial distribution. The guidelines suggest the approximation should be good when $np \ge 5$ and $n (1 - p) \ge 5$; this is evident from the pictures. In the top row, a significant amount of the approximating normal distribution even appears when $Y < 0$.

The normal distribution can be used to approximate probabilities in situations that are actually binomial. A fundamental difficulty with this approach is that a discrete distribution is being modelled with a continuous distribution. This is best explained through an example. The example explains the principle; the idea extends to all situations where the normal distribution is used to approximate a binomial distribution.

Example 8.4 (Normal approximation to binomial) Consider mdx mice (which have a strain of muscular dystrophy) from a particular source for which $30$% of the mice survive for at least $40$ weeks. One particular experiment requires at least $35$ of the $100$ mice to live beyond $40$ weeks. What is the probability that $35$ or more of the group will survive beyond $40$ weeks?

First see that the situation is binomial; if $X$ is the number of mice from the group of $100$ that survive, then $X \sim \text{Bin}(100, 0.3)$. This could be approximated by the normal distribution $Y\sim N(30, 21)$, where the variance is $np(1 - p) = 100\times 0.3\times 0.7 = 21$. Both $np = 30$ and $n(1 - p) = 70$ are much larger than $5$, so this approximation is expected to adequate.

Figure 8.5 shows the upper tail of the distribution near $X = 35$: using the normal approximation from $Y = 35$, only half of the original bar in the binomial pf is included. However, since the number of mice is discrete, we want the entire bar corresponding to $X = 35$. So to compute the correct answer, the normal distribution must be evaluated for $\Pr(Y > 34.5)$. This change from $Y \ge 34.5$ to $Y > 34.5$ is called using the continuity correction.

The exact answer (using the binomial distribution) is $0.1629$ (rounded to four decimal places). Using the normal distribution with the continuity correction gives the answer as $0.1631$; using the normal distribution without the continuity correction, the answer is $0.1376$. The solution is more accurate, as expected, using the continuity correction.

# Exact
ExactP <- sum( dbinom(x = 35:100,
                      size = 100,
                      prob = 0.30))
ExactP2 <- 1 - pbinom(34, 
                      size = 100,
                      prob = 0.30)
# Normal approx
NormalP <- 1 - pnorm(35, 
                     mean = 30,
                     sd = sqrt(21))
# Normal approx with continuity correction
ContCorrP <- 1 - pnorm(34.5, 
                       mean = 30,
                       sd = sqrt(21))
cat("Exact:" = round(ExactP, 6), "\n") # \n means "new line" 
#> 0.162858
cat("Exact (alt):" = round(ExactP2, 6), "\n")
#> 0.162858
cat("Normal approx:" = round(NormalP, 6), "\n")
#> 0.137617
cat("With correction:" = round(ContCorrP, 6), "\n")
#> 0.163055

FIGURE 8.5: The normal distribution approximating the binomial distribution when $n = 100$, $p = 0.3$ and finding the probability that $X > 35$.

Example 8.5 (Continuity correction) Consider rolling a standard die $100$ times, and counting the number of times a appear uppermost. The random variable $X$ is the number of times a appears; then, $X\sim \text{Bin}(n = 100, p = 1/6)$.

Since $np = 16.667$ and $n(1 - p) = 83.333$ are both greater than $5$, a normal approximation should be accurate, so define $Y\sim N(16.6667, 13.889)$ (where the variance is $np(1 - p) = 13.889$). Various probabilities (Table 8.1) show the accuracy of the approximation, and the way in which the continuity correction has been used.

You should understand how the concept of the continuity correction has been applied in each situation, and be able to compute the probabilities for the normal approximation.

TABLE 8.1: Some events and their probabilities for the die-rolling example, computed using the binomial distribution (exact) and the normal approximation using the continuity correction. The accuracy of the approximation is generally very good.
Event (binomial)	Prob (binomial)	Event (normal)	Prob (normal)
$\Pr(X \lt 10)$	0.0213	$\Pr(Y \lt 9.5)$	0.0368
$\Pr(X \le 15)$	0.3877	$\Pr(Y \lt 15.5)$	0.3771
$\Pr(X \gt 17)$	0.4006	$\Pr(Y \gt 17.5)$	0.4115
$\Pr(X \ge 21)$	0.1519	$\Pr(Y \gt 20.5)$	0.1518
$\Pr(15 \lt X \le 17)$	0.2117	$\Pr(15.5 \lt Y \lt 17.5)$	0.2113

8.4 Exponential distribution

The normal distribution is defined for all real values, but many quantities are defined on the positive real numbers only. This is not always a problem, especially when the observations are far from zero, such as heights of adult humans. However, many random variables have observations close to zero, and the data are often skewed to the right (positively skewed).

Many distributions can be used for modelling right-skewed data defined on the positive real numbers; the simplest is the exponential distribution.

8.4.1 Derivation

The exponential distribution is closely related to the Poisson distribution. To show this, consider a Poisson process, where events occur at an average rate of $\lambda$ events per unit time (e.g., $\lambda = 0.5$ per minute).

Then consider a fixed length of time $t > 0$; then, the number of occurrences $Y$ would have an average rate of $\lambda t$ (e.g., in 12 minutes, we’d expect an average rate of $0.5\times 12 = 6$ occurrences). That is, the number of events occurring in the interval $(0, t)$ has Poisson distribution with mean $\lambda t$, with probability function \[\begin{equation} p_Y(y) = \frac{\exp(-\lambda t) (\lambda t)^y}{y!} \quad \text{for $y = 0, 1, 2, \dots$}. \tag{8.9} \end{equation}\] Hence, the probability that zero events occur in the interval $(0, t)$ is \[ \Pr(Y = 0) = \frac{\exp(-\lambda t) (\lambda t)^0}{0!} = \exp(-\lambda t) \] using $Y = 0$ in (8.9).

Now, introduce a new random variable $T$ that represent the time until the first event occurs. Then, $Y = 0$ (i.e., no events occur in the interval $(0, t)$) corresponds to $T > t$ (i.e., the time till the first event occurs must be great than the time interval $t$). That is, \[ \Pr(Y = 0) = \exp(-\lambda t) = \Pr(T > t). \] Rewriting in terms of a distribution function for $T$, \[ F_T(t) = 1 - \Pr(T > t) = 1 - \exp(-\lambda t) \] for $t > 0$. Hence, the probability density function is \[ f_T(t) = \lambda \exp(-\lambda t) \] for $t > 0$. This is the probability density function for the exponential distribution.

This shows that if events occur at random according to a Poisson distribution, then the time (or the space) between those events follows an exponential distribution (Theorem 8.4). Hence the exponential distribution is used to describe the interval between consecutive randomly occurring events that follow a Poisson distribution.

Theorem 8.4 (Poisson process and exponential distributions) Consider a Poisson process at rate $\lambda$ and suppose observation starts at an arbitrary time point. Then the time $T$ to the first event has an exponential distribution with mean $\operatorname{E}[T] = 1/\lambda$; i.e., \[ f_T(t) = \lambda e^{-\lambda t},\quad t > 0. \]

Although the theorem refers to ‘time’, the variable of interest may be distance or any other continuous variable measuring the interval between events.

8.4.2 Definition and properties

The exponential distribution is usually written as follows.

Definition 8.8 (Exponential distribution) If a random variable $X$ has the PDF \[\begin{equation} f_X(x; \beta) = \displaystyle \frac{1}{\beta} \exp(-x/\beta) \quad \text{for $x > 0$} \tag{8.10} \end{equation}\] then $X$ has an exponential distribution with parameter $\beta > 0$. We write $X\sim\text{Exp}(\beta)$.

The parameter $\lambda = 1/\beta$ is often used in place of $\beta$, and is called the rate parameter. Sometimes the notation $X\sim\text{Exp}(\lambda)$, so checking which notation is being used in any context is wise. Plots of the PDF for various exponential distributions are given in Fig. 8.6}.

$Exponential distributions for various values of the rate parameter $\lambda = 1/\beta$.$

FIGURE 8.6: Exponential distributions for various values of the rate parameter $\lambda = 1/\beta$.

Definition 8.9 (Exponential distribution: distribution function) For a random variable $X$ with the exponential distribution given in Eq. (8.10), the distribution function is \[ F_X(x; \lambda) = 1 - \exp\left\{-x/\beta\right\} = 1 - \exp\left\{-\lambda x\right\} \] for $x > 0$.

The following are the basic properties of the exponential distribution.

Theorem 8.5 (Exponential distribution properties) If $X\sim\text{Exp}(\beta)$ then

$\operatorname{E}[X] = \beta = 1/\lambda$.
$\operatorname{var}[X] = \beta^2 = 1/\lambda^2$.
$M_X(t) = (1 - \beta t)^{-1}$ for $t < 1/\beta$ (or, $M_X(t) = \lambda/(\lambda - t)$ for $t < \lambda$).

Proof. The proofs are left as an exercise.

The parameter $\beta$ represents the mean of the exponential distribution. The alternative parameter $\lambda = 1/\beta$ represents the mean rate at which events occur. Like the Poisson distribution, the variance is defined once the mean is defined .

The four R functions for working with the exponential distribution have the form [dpqr]exp(rate) where rate${}= \lambda = 1/\beta$ (see App. D).

Example 8.6 (Exponential distributions) Allen et al. (1975) use the exponential distribution to model daily rainfall in Kentucky.

Example 8.7 (Exponential distributions) Cox and Lewis (1966) give data collected by Fatt and Katz concerning the time intervals between successive nerve pulses along a nerve fibre. There are $799$ observations which we do not give here. The mean time between pulses is $\beta = 0.2186$ seconds. An exponential distribution might be expected to model the data well. This is indeed the case (Fig. 8.7).

Define $X$ as the time between successive nerve pulses (in seconds); then $X\sim \text{Exp}(\beta = 0.2186)$ (so the rate parameter is $\lambda = 1/0.2186 = 4.575$ per second). To find the proportion of time intervals longer than $1$ second: \[\begin{align*} \Pr(X > 1) &= \int_1^\infty \frac{1}{0.2186}\exp(-x/0.2186)\, dx \\ &= -\exp(-x/0.2186)\Big|_1^\infty \\ &= (-0) + (\exp\{-1/0.2186\})\\ &= 0.01031. \end{align*}\] There is about a $1$% chance of a nerve pulse exceeding one second.

FIGURE 8.7: The time between successive nerve pulses. An exponential distribution fits well.

The relationship between the Poisson and exponential distribution was explored in Sect. 8.4.1 (see, for example, Theorem 8.4). The next example explores this relationship

Example 8.8 (Relationship between exponential and Poisson distributions) Suppose a Poisson process occurs at the mean rate of $5$ events per hour. Let $N$ represent the number of events in one day and $T$ the time between consecutive events. We can describe the distribution of the time between consecutive events and the distribution of the number of events in one day ($24\,\text{h}$).

Since events occur at the mean rate of $\lambda = 5$ events per hour, the mean time between consecutive events is $\beta = 1/\lambda = 0.2\,\text{h}$. Hence, the mean number of events in one day is $\mu = 24\times 5 = 120$.

Consequently, $N\sim\text{Pois}(\mu = 120)$ and $X\sim\text{Exp}(\beta = 0.2)$ (or, equivalently, $X\sim\text{Exp}(\lambda = 5)$).

An important feature of a Poisson process, and hence of the exponential distribution, is the memoryless or Markov property: the future of the process at any time point does not depend on the history of the process. This property is captured in the following theorem.

Theorem 8.6 (Memoryless property) If $T \sim \text{Exp}(\lambda)$ where $\lambda$ is the rate parameter, then for $s > 0$ and $t > 0$, \[ \Pr(T > s + t \mid T > s) = \Pr(T > t) \]

Proof. Using Definition 2.13, \[ \Pr(T > s + t \mid T > s) = \frac{ \Pr( \{T > s + t\} \cap \{T > s\})} {\Pr(T > s)} \] But if $T > s + t$, then $T > s$. Consequently $\Pr( \{T > s + t \}\cap \{T > s \} ) = \Pr(T > s + t)$ and so \[\begin{align*} \Pr(T > s + t \mid T > s ) &= \frac{\Pr(T > s +t )}{\Pr(T > s)}\\ &= \frac{\exp\left\{-\lambda(s + t)\right\}}{\exp\left\{-\lambda s\right\}}\\ &= \exp\left\{-\lambda t\right\}\\ &= \Pr(T > t). \end{align*}\]

This theorem states that the probability that the time to the next event is greater than $t$ does not depend on the time $s$ back to the previous event. This is called the memoryless property of the exponential distribution.

Example 8.9 (Memoryless property of exponential distribution) Suppose the lifespan of component $A$ is modelled by an exponential distribution with mean $12$ months. Then, $T \sim \text{Exp}(\beta = 6)$.

The probability that component $A$ fails in less than $6$ months is \[ \Pr(T < 6) = 1 - \exp(-6/12) = 0.3935. \]

Now, suppose component $A$ has been in place for $12$ months. The probability that it will fail in less than a further $6$ months is \[ \Pr(T < 18 \mid T > 12) = \Pr(T < 6) = 1 - \exp(-6/12) = 0.3935 \] by the memoryless property.

Example 8.9 shows that an exponential process is ‘ageless’: the risk of ‘mortality’ remains constant with age. That is, the probability of such an event occurring in the next small interval, whether the failure of a component or the occurrence of an accident, remains constant regardless of the age of the component or the length of time since the last accident. In this sense, an exponential lifetime is different from a human lifetime, or the lifetime of many man-made objects, where the risk of ‘death’ in the next small interval increases with age.

8.5 Gamma distribution

Once the mean of an exponential distribution is defined, the variance is defined. More flexibility is sometimes needed, which is provided by the gamma distribution.

Definition 8.10 (Gamma distribution) If a random variable $X$ has the PDF \[ f_X(x; \alpha, \beta) = \frac{1}{\beta^\alpha \Gamma(\alpha)} x^{\alpha - 1} \exp(-x/\beta) \] then $X$ has a gamma distribution, where $\Gamma(\cdot)$ is the gamma function (see Sect. 7.11) and $\alpha, \beta > 0$. We write $X \sim \text{Gam}(\alpha, \beta)$.

The parameter $\alpha$ is called the shape parameter and $\beta$ is called the scale parameter. Some texts use different notation for the shape and scale parameters. In broad terms, the shape parameter dictates the general shape of the distribution; the scale parameter dictates how ‘stretched out’ the distribution is.

In R, the gamma function $\Gamma(x)$ is evaluated using gamma(x).

The four R functions for working with the gamma distribution have the form [dpqr]gamma(shape, rate, scale = 1/rate), where shape${}= \alpha$ and scale${}= \beta$ (see App. D).

Plots of the gamma PDF for various values of the parameters are given in Fig. 8.8. In the bottom right panel, the gamma distributions are starting to look a bit like normal distributions.

$The PDF of a gamma distribution for various values of $\alpha$ and $\beta$.$

FIGURE 8.8: The PDF of a gamma distribution for various values of $\alpha$ and $\beta$.

Notice that \[\begin{align} \int_0^\infty f_X(x)\,dx &= \int_0^\infty\frac{e^{-x/\beta}x^{\alpha - 1}}{\beta^\alpha\Gamma(\alpha)}\,dx\nonumber\\ &= \frac{1}{\Gamma(\alpha)} \int_0^\infty e^{-y} y^{\alpha-1}\,dy \quad \text{(on putting $y = x/\beta$)}\nonumber \\ &= 1, \tag{8.11} \end{align}\] because $\int_0^\infty \exp(-y) y^{\alpha-1}\,dy = \Gamma(\alpha)$.

The distribution function for the gamma distribution is complicated, involving incomplete gamma functions, and will not be given. The following are the basic properties of the gamma distribution.

Theorem 8.7 (Gamma distribution properties) If $X\sim\text{Gam}(\alpha,\beta)$ then

$\operatorname{E}[X] = \alpha\beta$.
$\operatorname{var}[X] = \alpha\beta^2$.
$M_X(t) = (1-\beta t)^{-\alpha}$ for $t < 1/\beta$.

Proof. For the expected value: \[ \operatorname{E}[X] = \int_0^{\infty}x\, f_X(x)\,dx = \beta \frac{\Gamma(\alpha + 1)}{\Gamma(\alpha)} \underbrace{\int_0^{\infty}\frac{\exp(-x/\beta) x^{(\alpha + 1) - 1}}{\beta^{\alpha + 1}\Gamma(\alpha + 1)} \, dx}_{= 1} = \alpha\beta. \] This result follows from using Eq. (8.11) and Theorem 8.7 \[ \operatorname{E}[X^2] = \int_0^{\infty}x^2\, f_X(x) \, dx = \beta^2\frac{\Gamma(\alpha + 2)}{\Gamma(\alpha)} \underbrace{\int_0^{\infty}\frac{\exp(-x/\beta) x^{(\alpha + 2) - 1}}{\beta^{\alpha + 2}\Gamma(\alpha + 2)}\,dx}_{= 1} = \alpha(\alpha + 1)\beta^2 \] where the result follows by writing $\Gamma(\alpha + 2) = (\alpha + 1)\alpha\Gamma(\alpha)$. Hence \[ \operatorname{var}[X] = \operatorname{E}[X^2] - (\operatorname{E}[X])^2 = \alpha(\alpha + 1)\beta^2 - (\alpha\beta)^2 = \alpha\beta^2 \] Also: \[\begin{align*} M_X(t) = \operatorname{E}[e^{Xt}] &= \int_0^{\infty}\exp(tx) \frac{\exp(-x/\beta) x^{\alpha - 1}}{\beta^{\alpha} \Gamma(\alpha)} \, dx\\ &= \int_0^{\infty} \frac{\exp\{-x(1 - \beta t)/\beta\}x^{\alpha - 1}}{\beta^{\alpha}\Gamma(\alpha)} \, dx\\ &= \int_0^{\infty} \frac{\exp(-z) z^{\alpha - 1}}{\Gamma(\alpha)(1 - \beta t)^{\alpha - 1}} \, \frac{dz} {1 - \beta t}, \text{putting $z = x(1 - \beta t)/\beta$}\\ &= (1 - \beta t)^{-\alpha}, \end{align*}\] since the integral remaining is $1$.

As usual the moments can be found by expanding $M_X(t)$ as a series. That is, \[ M_X(t) = 1 \ + \ \alpha\beta t \ + \ \frac{\alpha(\alpha+1)\beta^2}{2!} t^2 \ + \ \cdots \] from which \[\begin{align*} \operatorname{E}[X] &= \text{coefficient of }t =\alpha\beta,\\ \operatorname{E}[X^2] &= \text{coefficient of }t^2/2!=\alpha(\alpha+1)\beta^2, \end{align*}\] as found earlier.

As for the normal distribution, the distribution function of the gamma cannot, in general, be computed without using numerical integration, tables (although see Example 8.11) or software.

Example 8.10 (Rainfall) Das (1955) used a (truncated) gamma distribution for modelling daily precipitation in Sydney. A similar approach is adopted by D. S. Wilks (1990).

Larsen and Marx (1986) (Case Study 4.6.1) use the gamma distribution to model daily rainfall in Sydney, Australia using the parameter estimates $\alpha = 0.105$ and $\beta = 76.9$ (based on Das (1955)). The comparison between the data and the model (Table 8.2) indicates a good agreement between the data and the theoretical distribution.

TABLE 8.2: The gamma distribution used to model Sydney daily rainfall.
Rainfall (mm)	Observed	Modelled	Rainfall (mm)	Observed	Modelled
0–5	1631	1639	46–50	18	12
6–10	115	106	51–60	18	20
11–15	67	62	61–70	13	15
16–20	42	44	71–80	13	12
21–25	27	32	81–90	8	9
26–30	26	26	91–100	8	7
31–35	19	21	101–125	16	12
36–40	14	17	126–150	7	7
41–45	12	14	151–425	14	13

Example 8.11 (Electrical components) The lifetime of an electrical component in hours, say $T$, can be well modelled by the distribution $\text{Gam}(2, 1)$. What is the probability that a component will last for more than three hours?

From the information, $T\sim \text{Gam}(\alpha = 2, \beta = 1)$. The required probability is therefore \[\begin{align*} \Pr(T > 3) &= \int_3^\infty \frac{1}{1^2 \Gamma(2)}t^{2 - 1} \exp(-t/1)\,dt \\ &= \int_3^\infty t \exp(-t)\,dt \\ \end{align*}\] since $\Gamma(2) = 1! = 1$. This expression can be integrated using integration by parts: \[\begin{align*} \Pr(T > 3) &= \int_3^\infty t \exp(-t)\,dt \\ &= \left\{ -t \exp(-t)\right\}\Big|_3^\infty - \int_3^\infty -\exp(-t)\, dt \\ &= [ (0) - \{-3\exp(-3)\}] - \left\{ \exp(-t)\Big|_3^\infty\right\} \\ &= 3\exp(-3) + \exp(-3)\\ &= 0.1991 \end{align*}\] Integration by parts is only possible since $\alpha$ is an integer. If $\alpha = 2.5$, for example, integration by parts would not be possible.

A more general approach is to use tables of the incomplete gamma function to evaluate the integral, numerical integration, or software. To use R:

# Integrate
integrate(dgamma, # The gamma distribution prob. fn
          lower = 3,
          upper = Inf,  # "Inf" means "infinity"
          shape = 2,
          scale = 1)
#> 0.1991483 with absolute error < 9.3e-05

# Directly
1 - pgamma(3,
           shape = 2,
           scale = 1)
#> [1] 0.1991483

Using any method, the probability is about 20%.

Example 8.12 (Gamma) If $X\sim \text{Gam}(\alpha, \beta)$, find the distribution of $Y = kX$ for some constant $k$.

One way to approach this question is to use moment-generating functions. Since $X$ has a gamma distribution, $M_X(t) = (1-\beta t)^{-\alpha}$. Now, \[\begin{align*} M_Y(t) &= \operatorname{E}[\exp(tY)] \qquad\text{by definition of the MGF}\\ &= \operatorname{E}[\exp(t kX)] \qquad\text{since $X = kX$}\\ &= \operatorname{E}[\exp(s X)] \qquad\text{by letting $s = kt$}\\ &= M_X(s) \qquad\text{by definition of the MGF}\\ &= M_X(kt) \\ &= (1 - \beta kt)^{-\alpha}. \end{align*}\] This is just the MGF for random variable $X$ with $k\beta$ in place of $\beta$, so the distribution of $Y$ is $\text{Gam}(\alpha, k\beta)$.

8.6 Beta distribution

Some continuous random variables are constrained to a finite interval. The beta distribution is useful in these situations.

Definition 8.11 (Beta distribution) A random variable $X$ with probability density function \[\begin{equation*} f_X(x; m, n) = \frac{x ^ {m - 1}(1 - x)^{n - 1}}{B(m, n)} \quad \text{for $0\leq x \leq 1$ and $m > 0$, $n > 0$}, \end{equation*}\] and $B(m, n)$ is the beta function \[\begin{align} B(m, n) &= \int_0^1 x^{m - 1}(1 - x)^{n - 1} \, dx, \quad \text{for $m > 0$ and $n > 0$}\notag\\ &= \frac{\Gamma(m)\, \Gamma(n)}{\Gamma(m + n)}, \tag{8.12} \end{align}\] is said to have a beta distribution with parameters $m$ and $n$. We write $X \sim \text{Beta}(m, n)$.

$B(m, n)$ defined by (8.12) is known as the beta function with parameters $m$ and $n$. Since $\int_0^1 f_X(x)\,dx = 1$ then \[ \int_0^1 \frac{x^{m - 1}(1 - x)^{n - 1}}{B(m, n)}\,dx = 1. \]

In R, the beta function is evaluated using beta(a, b), where a${} = m$ and b${} = n$.

The four R functions for working with the beta distribution have the form [dpqr]beta(shape1, shape2), where shape1${}= m$ and shape2${}= n$ (see App. D).

The distribution function for the beta distribution is complicated, involving incomplete beta functions, and will not be given. Some properties of the beta function follow.

Theorem 8.8 (Beta function properties) The beta function in Eq. (8.12) satisfies the following:

The beta function is symmetric in $m$ and $n$. That is, if $m$ and $n$ are interchanged, the function remains unaltered; i.e., $B(m, n) = B (n, m)$.
$B(1, 1) = 1$.
$B\left(\frac{1}{2}, \frac{1}{2}\right) = \pi$.

Proof. To prove the first, put $z = 1 - x$ and hence $dz = -dx$ in Eq. (8.12). Then \[\begin{align*} B(m, n) &= -\int_1^0(1 - z)^{m - 1} z^{n - 1} \, dz\\ &= \int_0^1 z^{n - 1}(1 - z)^{m - 1} \, dz\\ &= B(n, m). \end{align*}\]

For the second, put $x = \sin^2\theta$, and so $dx = 2\sin\theta\cos\theta\,d\theta$, in Eq. (8.12). We have \[ B(m, n) = 2 \int_0^{\pi/2} \sin^{2m - 1}\theta \cos ^{2n - 1}\theta\,d\theta. \] So, for $m = n = \frac{1}{2}$, \[ B\left(\frac{1}{2}, \frac{1}{2}\right) = 2\int_0^{\pi/2} d\theta = \pi. \]

For the third, note that $\Gamma(1/2) = \sqrt{\pi}$ from Theorem 8.7. Further, since $\Gamma(1) = 0! = 1$, the results follows.

Typical graphs for the beta PDF are given in Fig. 8.9. When $m = n$, the distribution is symmetric about $x = \frac{1}{2}$.

FIGURE 8.9: Various beta-distribution PDFs.

Some basic properties of the beta distribution follow.

Theorem 8.9 (Beta distribution properties) If $X \sim \text{Beta}(m, n)$ then

$\operatorname{E}[X] = m/(m + n)$.
$\displaystyle\operatorname{var}[X] = \frac{mn}{(m + n)^2 (m + n + 1)}$.
A mode occurs at $\displaystyle x = \frac{m - 1}{m + n - 2}$ for $m, n > 1$.

Proof. Assume $X \sim \text{Beta}(m, n)$, then \[\begin{align*} \mu_r' =\operatorname{E}[X^r] &= \int_0^1 \frac{x^r x^{m - 1}(1 - x)^{n - 1}} {B(m, n)}\,dx\\ &= \frac{B(m + r, n)}{B(m, n)} \int_0^1 \frac{x^{m + r - 1}(1 - x)^{n - 1}}{B(m + r, n)}\,dx\\ &= \frac{\Gamma(m + r)\,\Gamma(n)}{\Gamma(m + r + n)} \frac{\Gamma(m + n)}{\Gamma(m)\,\Gamma(n)}\\ &= \frac{\Gamma(m + r)\,\Gamma(m + n)}{\Gamma(m + r + n)\,\Gamma(m)} \end{align*}\] Putting $r = 1$ and $r = 2$ into the above expression, and using that $\operatorname{var} = \operatorname{E}[X^2] - \operatorname{E}[X]^2$, the mean and variance are $\operatorname{E}[X] = m/(m + n)$ and $\operatorname{var}[X] = \frac{mn}{(m + n)^2(m + n + 1)}$.

Any mode $\theta$ of the distribution for which $0 < \theta < 1$ will satisfy $f'_X(\theta) = 0$. From the definition we see that for $m, n>1$ and $0 < x < 1$, \[ f'_X(x) = (m - 1)x^{m - 2} (1 - x)^{n - 1} - (n - 1)x^{m - 1}(1 - x)^{n - 1}/B(m, n) = 0 \] implying $(m - 1)(1 - x) = (n - 1)x$ which is satisfied by $x = (m - 1) / (m + n - 2)$.

The MGF of the beta distribution cannot be written in terms of standard functions. The distribution function of the beta must be evaluated numerically in general, except when $m$ and $n$ are integers, as shown in the example below.

If $X\sim \text{Beta}(m, n)$, then $X$ is defined on $[0, 1]$. Sometimes then, the beta distribution is used to model proportions, though not proportions out of a total number (when the binomial distribution is used). Instead, the beta distribution is used, for example, to model percentage cloud cover (for instance, Falls (1974)).

For a random variable $Y$ defined on a different closed interval $[a, b]$, define $Y = X(b - a) + a$.

Example 8.13 Damgaard and Irvine (2019) use the beta-distribution to model the relative areas covered by plants of different species. The beta distribution is a suitable model, since these relative areas are proportions. In addition, the authors state (p. 2747) that

…plant cover data tend to be left-skewed (J-shaped), right skewed (L-shaped) or U-shaped

which align with some of the plots in Fig. 8.9.

Example 8.14 (Beta distributions) The bulk storage tanks of a fuel retailed are filled each Monday. The retailer has observed that over many weeks the proportion of the available fuel supply sold is well modelled by a beta distribution with $m = 4$ and $n = 2$.

If $X$ denotes the proportion of the total supply sold in a given week, the mean proportion of fuel sold each week is \[ \operatorname{E}[X] = m/(m + n) = 4/6 = 2/3. \]

To find the probability that at least $90$% of the supply will sell in a given week, compute: \[\begin{align*} \Pr(X > 0.9) &= \int_{0.9}^1\frac{\Gamma(4 + 2)}{\Gamma(4)\Gamma(2)}x^3(1 - x)dx\\ &= 20\int_{0.9}^1 (x^3 - x^4))\,dx\\ &= 20(0.004)\\ &= 0.08 \end{align*}\] It is unlikely that $90$% of the supply will be sold in a given week. In R:

1 - pbeta(0.9, 
          shape1 = 4, 
          shape2 = 2)
#> [1] 0.08146

8.7 The chi-squared distribution

USE AS AN EXMAPLE??

Examples 6.8 and 6.13 produce the chi-square distribution, which is an important model in statistical theory (Theorem 12.6).

Definition 8.12 (Chi-squared distribution) A continuous random variable $X$ with probability density function \[\begin{equation} f_X(x) = \frac{x^{(\nu/2) - 1}e^{-x/2}}{2^{\nu/2}\Gamma(\nu/2)}\quad\text{for $x > 0$} \end{equation}\] is said to have a chi-square distribution with parameter $\nu > 0$. The parameter $\nu$ is called the degrees of freedom. We write $X \sim \chi^2(\nu)$.

Some plots of $\chi^2$-distributions are shown in Fig. 8.10.

$Some $\chi^2$-distributions.$

FIGURE 8.10: Some $\chi^2$-distributions.

The basic properties of the chi-square follow directly from those of the gamma distribution (Theorem 8.7).

Theorem 8.10 (Properties of chi-squared distribution) If $X\sim\chi^2(\nu)$ then

$\operatorname{E}(X) = \nu$.
$\operatorname{var}(X) = 2\nu$.
$M_X(t) = (1 - 2t)^{-\nu/2}$.

Proof. See Theorem 8.7.

The importance of the chi-square distribution is hinted at in Examples 6.8 and 6.13, which essentially prove the following theorem.

Theorem 8.11 (Chi-square distribution with 1 df) If $Z\sim N(0, 1)$ then $Z^2$ has a chi-square distribution with one degree of freedom.

Proof. Exercise; see Example 6.8.

A useful property of the chi-square distribution is that the sum of independent random variables, each with a chi-square distribution, also has a chi-square distribution. This property is given in the following theorem, which will be used later.

Theorem 8.12 (Chi-squared distribution) If $Z_1, Z_2,\dots, Z_n$ are independently and identically distributed (iid) as $N(0, 1)$, then the sum of squares $S = \sum_i Z_i^2$ has a $\chi^2(n)$ distribution.

Proof. Since $S$ is a linear combination of known distributions, using the MGF method is appropriate. Since $Z_i \sim \chi^2(1)$, from Theorem 8.10 \[ M_{Z_i}(t) = (1 - 2t)^{-1/2}. \] From Theorem 5.6 then, $S = \sum_{i = 1}^n Z_i^2$ has MGF \[\begin{align*} M_{S}(t) &= \prod_{i = 1}^n (1 - 2t)^{-1/2}\\ &= \left[(1 - 2t)^{-1/2}\right]^n = (1 - 2t)^{-n/2}, \end{align*}\] which is the MGF of $\chi^2(n)$.

Chi-square probabilities cannot in general be calculated without computers or tables.

The four R functions for working with the chi-squared distribution have the form [dpqr]chisq(df), where df${} = \nu$ refers to the degrees of freedom (see Appendix D).

Example 8.15 (Chi-squared distributions) The variable $X$ has a chi-square distribution with $12$ df. Determine the value of $X$ below which lies 90% of the distribution.

We seek a value $c$ such that $\Pr(X < c) = F_X(c) = 0.90$ where $X\sim\chi^2(12)$. In R:

qchisq(0.9, df = 12)
#> [1] 18.54935

That is, about $90$% of the distribution lies below 18.549.

8.8 The bivariate normal distribution

A specific example of a continuous multivariate distribution is the bivariate normal distribution.

Definition 8.13 (The bivariate normal distribution) If a pair of random variables $X$ and $Y$ have the joint PDF \[\begin{equation} f_{X, Y}(x, y; \mu_x, \mu_Y, \sigma^2_X, \sigma^2_Y, \rho) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1 - \rho^2}}\exp(-Q/2) \tag{8.13} \end{equation}\] where \[ Q = \frac{1}{1-\rho^2}\left[ \left(\frac{x-\mu_X}{\sigma_X}\right)^2 - 2\rho\left( \frac{x-\mu_X}{\sigma_X}\right)\left(\frac{y-\mu_Y}{\sigma_Y}\right) + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2 \right], \] then $X$ and $Y$ have a bivariate normal distribution. We write \[ (X, Y) \sim N_2(\mu_X, \mu_Y, \sigma^2_X, \sigma^2_Y, \rho ). \]

A typical graph of the bivariate normal surface above the $x$–$y$ plane is shown below. Showing $\int^\infty_{-\infty}\!\int^\infty_{-\infty}f_{X,Y}(x,y)\,dx\,dy = 1$ is not straightforward and involves writing Eq. (8.13) using polar coordinates.

Some important facts about the bivariate normal distribution are contained in the theorem below.

Theorem 8.13 (Bivariate normal distribution properties) For $(X, Y)$ with PDF given in Eq. (8.13):

The marginal distributions of $X$ and of $Y$ are $N(\mu_X, \sigma^2_X)$ and $N(\mu_Y, \sigma^2_Y)$ respectively.
The parameter $\rho$ appearing in Eq. (8.13) is the correlation coefficient between $X$ and $Y$.
the conditional distributions of $X$ given $Y = y$, and of $Y$ given $X = x$, are respectively \[\begin{align*} &N\left( \mu_X + \rho \sigma_X(y - \mu_Y)/\sigma_Y, \sigma^2_X(1 - \rho^2)\right); \\ &N\left( \mu_Y + \rho \sigma_Y(x - \mu_X)/\sigma_X, \sigma^2_Y(1 - \rho^2)\right). \end{align*}\]

Proof. Recall that the marginal PDF of $X$ is $f_X(x) = \int^\infty_{-\infty} f_{X, Y}(x, y)\,dy$. In the integral, put $u = (x - \mu_X)/\sigma_X, v = (y - \mu_Y)/\sigma_Y,\, dy = \sigma_Y\,dv$ and complete the square in the exponent on $v$: \[\begin{align*} g(x) &= \frac{1}{2\pi\sigma_X\sqrt{1 - \rho^2}\sigma_Y}\int^\infty_{-\infty}\exp\left\{ -\frac{1}{2(1 - \rho^2)}\left[ u^2 - 2\rho uv + v^2\right]\right\} \sigma_Y\,dv\\[2mm] &= \frac{1}{2\pi \sigma_X\sqrt{1 - \rho^2}}\int^\infty_{-\infty} \exp\left\{ -\frac{1}{2(1 - \rho^2)}\left[ (v - \rho u)^2 + u^2 - \rho^2u^2\right]\right\}\,dv\\[2mm] &= \frac{e^{-u^2/2}}{\sqrt{2\pi} \sigma_X} \ \underbrace{\int^\infty_{-\infty} \frac{1}{\sqrt{2\pi (1 - \rho^2)}} \exp\left\{ -\frac{1}{2(1 - \rho^2)}(v - \rho u)^2\right\}\,dv}_{=1}. \end{align*}\] Replacing $u$ by $(x - \mu_X )/\sigma_X$, we see from the PDF that $X \sim N(\mu_X, \sigma^2_X)$. Similarly for the marginal PDF of $Y$; i.e., $f_Y(y)$.

To show that $\rho$ in Eq. (8.13) is the correlation coefficient of $X$ and $Y$, recall that \[\begin{align*} \rho_{X,Y} &= \operatorname{Cov}(X,Y)/\sigma_X\sigma_Y=\operatorname{E}[(X-\mu_X)(Y - \mu_Y)]/\sigma_X\sigma_Y \\[2mm] & = \int^\infty_{-\infty}\!\int^\infty_{-\infty} \frac{(x - \mu_X)}{\sigma_X}\frac{(y - \mu_Y)}{\sigma_Y}f(x,y)\,dx\,dy\\[2mm] &= \int^\infty_{-\infty}\!\int^\infty_{-\infty} uv\frac{1}{2\pi\sqrt{1 - \rho^2} \sigma_X\sigma_Y}\exp\left\{ -\frac {1}{2(1 - \rho^2)}[u^2 - 2\rho uv + v^2]\right\} \sigma_X\sigma_Y\,du\,dv. \end{align*}\]

The exponent is \[ -\frac{[(u - \rho v)^2 + v^2 - \rho^2 v^2]}{2(1 - \rho^2)} = - \frac{1}{2} \left\{\frac{(u - \rho v)^2}{(1 - \rho^2)} + v^2\right\}. \] Then: \[\begin{align*} \rho_{X, Y} &=\int^\infty_{-\infty}\frac{v e^{-v^2/2}}{\sqrt{2\pi}}\underbrace{\int^\infty_{-\infty} \frac{u}{\sqrt{2\pi (1 - \rho^2)}}\exp\{ -(u - \rho v)^2/2(1 - \rho^2)\}\,du}_{\displaystyle{= \operatorname{E}[U]\text{ where } u \sim N(\rho v, 1 - \rho^2) = \rho v}}\,dv \\[2mm] &= \rho \int^\infty_{-\infty} \frac{v^2}{\sqrt{2\pi}}e^{-v^2/2}\,dv\\[2mm] &= \rho\quad \text{since the integral is $\operatorname{E}[V^2]$ where $V \sim N(0,1)$.} \end{align*}\]

In finding the conditional PDF of $X$ given $Y = y$, use \[ f_{X \mid Y = y}(x) = f_{X, Y}(x, y)/f_Y(y). \] Then in this ratio, the constant is \[ \frac{\sqrt{2\pi} \sigma_Y}{2\pi \sigma_X\sigma_Y \sqrt{1 - \rho^2}} = \frac{1}{\sqrt{2\pi}\sigma_X\sqrt{1 - \rho^2}}. \] The exponent is \[\begin{align*} & \frac{\exp\left\{ -\left[ \displaystyle{\frac{(x - \mu_X)^2}{\sigma^2_X}} - \displaystyle{\frac{2\rho(x - \mu_X)(y - \mu_Y)}{\sigma_X\sigma_Y}} + \displaystyle{\frac{(y - \mu_Y)^2}{\sigma^2_Y}}\right] / 2(1 - \rho^2) \right\} }{\exp\left[ -(y - \mu_Y)^2 / 2\sigma^2_Y\right]}\\[2mm] &= \exp\left\{ - \frac{1}{2(1 - \rho^2)} \left[ \frac{(x - \mu_X)^2}{\sigma^2_X} - \frac{2\rho (x - \mu_X)(y - \mu_Y)}{\sigma_X\sigma_Y} + \frac{(y - \mu_Y)^2}{\sigma^2_Y} (1 - 1 + \rho^2)\right] \right\}\\[2mm] &= \exp\left\{ - \frac{1}{2\sigma^2_X(1 - \rho^2)} \left[ (x - \mu_X)^2 - 2\rho \frac{\sigma_X}{\sigma_Y} (x - \mu_X)(y - \mu_Y) + \frac{\rho^2 \sigma^2_X}{\sigma^2_Y}(y - \mu_Y)^2\right]\right\}\\[2mm] &= \exp \left\{ - \frac{1}{2(1 - \rho^2)\sigma^2_X} \left[ x - \mu_X - \rho\frac{\sigma_X}{\sigma_Y}(y - \mu_Y)\right]^2\right\}. \end{align*}\] So the conditional distribution of $X$ given $Y = y$ is \[ N\left( \mu_X + \rho\frac{\sigma_X}{\sigma_Y}(y - \mu_Y), \sigma^2_X(1 - \rho^2)\right). \] Recall the interpretation of the conditional distribution of $X$ given $Y = y$ (Sect. 4.5) and note the shape of this density below.

Comments about Theorem 8.13:

From the first and third parts, $\operatorname{E}[X] = \mu_X$ and $\operatorname{E}[X \mid Y = y] = \mu_X + \rho \sigma_X (y - \mu_Y)/\sigma_Y$ (and similarly for $Y$). Notice that $\operatorname{E}[X \mid Y = y]$ is a linear function of $y$; i.e., if $(X, Y)$ is bivariate normal, the regression line of $Y$ on $X$ (and $X$ on $Y$) is linear.
An important result follows from the second part. If $X$ and $Y$ are uncorrelated (i.e., if $\rho = 0$) then $f_{X, Y}(x, y) = f_X(x) f_Y(y)$ and thus $X$ and $Y$ are independent. That is, if two normally distributed random variables are uncorrelated, they are also independent.

Example 8.16 (Bivariate normal: Heights) (Badiou, Marsh, and Gauchet 1988) gives data from $200$ married men and their wives from the OPCS study of heights and weights of adults in Great Britain in 1980. Histograms of the husbands’ and wives’ heights are given in Fig. 8.11 (left and centre panels); the marginal distributions are approximately normal. The scatterplot of the heights is shown in Fig. 8.11 (right panel).

From the histograms, there is reason to suspect that a bivariate normal distribution would be appropriate. Using $H$ to refer to heights (in mm) of husbands and $W$ to the heights (in mm) of wives, the sample statistics are shown below (and where the estimate of the correlation is $+0.364$):

Statistic	Husbands	Wives
Sample mean (in mm):	1732	1602
Sample std dev (in mm):	68.8	62.4

Note that $\rho$ is positive; this implies taller men marry taller women on average. Using this sample information, the bivariate normal distribution can be estimated. This $3$-dimensional density function can be difficult to plot on a two-dimensional page, but see below.

The PDF for the bivariate normal distribution for the heights of the husbands and wives could be written down in the form of (8.13) for the values of $\mu_H$, $\mu_W$, $\sigma^2_H$, $\sigma^2_W$ and $\rho$ above, but this is tedious.

Given the information, what is the probability that a randomly chosen man in the UK in 1980 who is $173\,\text{cm}$ tall had married a woman taller than himself?

The information implies that $H = 1730$ is given (remembering the data are given in millimetres). So we need the conditional distribution of $W \mid H = 1730$. Using the results above, this conditional distribution will have mean \[\begin{align*} b &= \mu_W + \rho\frac{\sigma_W}{\sigma_H}(y_H - \mu_H) \\ &= 1602 + 0.364\frac{62.4}{68.8}(1730 - 1732) \\ &= 1601.34 \end{align*}\] and variance \[ \sigma_2^2(1 - \rho^2) = 62.4^2(1 - 0.364^2) = 377.85. \] In summary, $W \mid (H = 1730) \sim N(1601.34, 3377.85)$. Note that this conditional distribution has a univariate normal distribution, and so probabilities such as $W > 1730$ are easily determined. Then, \[\begin{align*} \Pr(W > 1730 \mid H = 1730) &= \Pr\left( Z > \frac{1730 - 1601.34}{\sqrt{3377.85}}\right) \\ &= \Pr( Z > 2.2137)\\ &= 0.013. \end{align*}\] Approximately $1.3$% of males $173\,\text{cm}$ tall had married women taller than themselves in the UK in 1980.

FIGURE 8.11: Plots of the heights data.

#> INSIDE

8.9 Other notable continuous distributions

The distributions discussed in this chapter are standard and commonly-used continuous distributions. The continuous uniform distribution is used to model complete randomness when all values in an interval are equally likely. Generally, computers can only generate continuous uniform (pseudo-) random numbers; random numbers from other distributions must be produced by transforming (pseudo-) random number produced from a continuous uniform distribution.

The normal distribution is the most well-known continuous distribution often kown as the ‘bell-shaped curve’) It is defined over the entire real line. Many natural processes, and many measurement errors, have a normal distribution. Importantly, it is often used in statistics, due to its importance in the Central Limit Theorem*.

Many natural process are onky defined for non-negative values, rather than over the entire real line (such as heights, times and distances). The exponential and gamma distributions are defined over a suitable region. The exponential distribution is right-skewed, and often used for modelling waiting times between Poisson arrivals, and the time between memoryless events. The gamma distribution is more flexible than the exponential distribution, and is often used ot model insurance claims, rainfall, and waiting times. The exponential distribution is a special case of the gamma distribution.

The beta distribution is used for modelling proportions (or probabilities) that are not counts of a fied number (when a binomial distribution is appropriate), such as cloud cover, fractional resource consumption, etc.

Countless other useful discrete distributions exist; we mentioned a few. The Weibull distribution is defined on $[0, \infty)$ (notice this includes zero) and is used to model lifetimes and failures, and is used in reliability analysis and survival analysis.

The von Mises distribution is used to model angles, and is defined on $[-\pi, \pi)$; it is sometimes called the ‘circular normal distribution’. It is used to model wind directions and animal movements.

The Cauchy distribution is defined on the entire real-line, but has heavier tails than the normal distribution. It is used to model the ratio of two independent normal variables.

The lognormal distribution is the a right-skewed distribution defined on the positive real numbers. If $\log X\sim N(\mu, \sigma^2)$, then $X$ has a lognormal distribution. It is used to model incomes and stock prices (using the Black–Scholes equation).

The arcsine distribution is a special case of the beta distribution and hence defined on $(0, 1)$, with probability concentrated near the limits of the range at $x = 0$ and $x = 1$.

Numerous other continuous distributions emerge when studying sampling (Chap. 12), and are usually related to the normal distribution. The $t$-distribution is similar to a normal distribution (and defined on the entire real line), but with heavier tals. The $\chi^2$ distribution, defined on $(0, \infty)$, is related to the sums of independent squared normal distributions. The $F$-distribution, defined on $(0, \infty)$ is related to the ratio of independent $\chi^2$ distributions.

8.10 Simulation

8.10.1 Examples

As with discrete distributions (Sect. 7.11), simulation can be used with continuous distributions. Again, in R, random numbers from a specific distribution use functions that start with the letter r; for example, to generate random numbers from a normal distribution, use rnorm():

rnorm(3, # Generate three random numbers...
      mean = 4,     # ... with mean = 4
      sd = 2)       # ... with sd = 2
#> [1] 1.748524 4.082457 2.624341

Suppose we are seeking women over $173\,\text{cm}$ tall for a study; what percentage of women are over $173\,\text{cm}$ tall? Assume that adult females have heights modelled by a normal distribution with mean $163\,\text{cm}$ and a standard deviation of $5\,\text{cm}$ (Australian Bureau of Statistics 1995). The percentage taller than $173\,\text{cm}$ is:

1 - pnorm(173, mean = 163, sd = 5)
#> [1] 0.02275013

Simulations can also be used, which allows us to consider more complex situations. For this simple question initially (Fig. 8.12, left panel):

htW <- rnorm(1000,  # One thousand women
             mean = 163,  # Mean ht
             sd = 5)      # Sd of height
pcTallWomen <- sum( htW > 173 )/1000 * 100

cat("Percentage 'tall' women:", pcTallWomen, '%\n')
#> Percentage 'tall' women: 2.3 %

Now consider a more complex situation. Suppose adult males have heights with a mean of $175\,\text{cm}$ with standard deviation of $7\,\text{cm}$, and constitute $44$% of the population. Now, we are seeking women or men over $173\,\text{cm}$ tall (Fig. 8.12, right panel):

personSex <- rbinom(1000,  # Select sex of each person
                    1,
                    0.44) # So 0 = Female; 1 = Male

htP <- rnorm(1000,  # change ht parameters according to sex
             mean = ifelse(personSex == 1,
                           175,
                           163),
             sd = ifelse(personSex == 1,
                         7, 
                         5))

pcTallPeople <- sum( htP > 173 )/1000 * 100
pcFemales <- sum( personSex == 0 ) / 1000 * 100

cat("Percentage 'tall' people:", pcTallPeople, '%\n\n')
#> Percentage 'tall' people: 28.1 %
cat("Percentage females:", pcFemales, '%\n\n')
#> Percentage females: 56.8 %
cat("Percentage of 'tall' people that are female:", 
    round( sum(htP > 173 & personSex == 0) / sum(htP > 173) * 100, 1),
    '%\n\n')
#> Percentage of 'tall' people that are female: 4.3 %
cat("Variance of heights: ", var(htP), "\n")
#> Variance of heights:  74.2899

FIGURE 8.12: The distribution of heights of women (left panel) and men and women combined (right panel).

8.10.2 Relationships between distributions

As seen before (Sect. 3.7), many statistical distributions are generated from existing distributions. These relationships can be shown using computer simulation.

As an example, the gamma distribution is related to the exponential distribution. If a random variable $Y$ has an exponential distribution with rate parameter $\lambda$, then the distribution of \[ X = \sum_{i = 1}^k Y \] has a gamma distribution, with shape parameter $\alpha = k$ and rate parameter $\lambda$. This can be shown using simulation (Fig. 8.13):

set.seed(67433)

### An exponential variate with  rate = 2 
Y <- rexp(n = 5000,
          rate = 2)
X <- matrix( data = Y,
             ncol = 5, # Place 5 values in each of 1000 rows
             nrow = 1000)
X <- rowSums(X) # Sum five values in each row

### Set up for two plots, side-by-side
par(mfrow = c(1, 2) )

### Plot two histograms: Y and X
###  truehist  is part of the MASS package,which must be loaded first, using: 
###  library(MASS)
truehist(Y,
         las = 1,
         main = "Exponential density",
         xlab = expression(italic(Y) ),
         ylab = "Density")
truehist(X,
         las = 1,
         main = "Gamma density",
         xlab = expression(italic(X) ),
         ylab = "Density")

### Now plot the theoretical gamma distribution
# Use these values of X:
x_Plot <- seq(0, 10,
              length = 100)
# Add lines to the histogram of chi-square random values
lines( dgamma(x_Plot, shape = 5, rate = 2) ~ x_Plot,
       type = "l",
       lwd = 3)
# dgamma()  is the density function for a gamma distribution

$Left: histogram of $5000$ simulated values from an exponential distribution. Right: histogram of the sum of five independent exponential distributions. The solid lines on the right panel is the theoretical distribution of a gamma distribution with a shape parameter of\ $5$.$

FIGURE 8.13: Left: histogram of $5000$ simulated values from an exponential distribution. Right: histogram of the sum of five independent exponential distributions. The solid lines on the right panel is the theoretical distribution of a gamma distribution with a shape parameter of $5$.

8.11 Exercises

Selected answers appear in Sect. E.8.

Exercise 8.1 Write the beta distribution parameters $m$ and $n$ in terms of the mean and variance.

Exercise 8.2 In a study modelling pedestrian road-crossing behaviour (Shaaban and Abdel-Warith 2017), the uniform distribution is used in the simulation to model the vehicle speeds. The speeds have a minimum value of $30$ km.h^$-1$, and a maximum value of $72$ km.h^$-1$.

Using this model, compute the mean and standard deviation of vehicle speeds.
Compute and plot the PDF and distribution function.
If $X$ is the vehicle speed, compute $\Pr(X > 60)$, where $60$ km.h^$-1$ is the posted speed limit.
Compute $\Pr(X > 65\mid X > 60)$.

Exercise 8.3 In a study modelling pedestrian road-crossing behaviour (Shaaban and Abdel-Warith 2017), the normal distribution is used in the simulation to model the vehicle speeds. The normal model has a mean of $48$ km/h, with a standard deviation² of $8.8$ km.h^$-1$.

The article ignores all vehicles travelling slower than $30$ km.h^$-1$ and faster than $72$ km.h^$-1$. What proportion of vehicles are excluded using this criterion?
Write the PDF for this truncated normal distribution.
Plot the PDF and df for this truncated normal distribution.
Compute the mean and variance for this truncated normal distribution.
Suppose a vehicle is caught speeding (i.e., exceeding $60$ km.h^$-1$); what is the probability that the vehicle is exceeding the speed limit by less than $5$ km.h^$-1$?

Exercise 8.4 A study of the service life of concrete in various conditions (Liu and Shi 2012) modelled the diffusion coefficients using a gamma distribution, with $\alpha = 27.05$ and $\beta = 1.42$ (unitless).

Plot the PDF and df.
Determine the mean and standard deviation of the chloride diffusion coefficients that were used in the simulation.
If $C$ represents the chloride diffusion coefficients, compute $\Pr(C > 30\mid C < 50)$.

Exercise 8.5 A study of the service life of concrete in various conditions (Liu and Shi 2012) used normal distributions to model the surface chloride concentrations. In one model, the mean was set as $\mu = 2$ kg.m^$-3$ and the standard deviation as $\sigma = 0.2$ kg.m^$-3$.

Plot the PDF and df.
Let $Y$ be the surface chloride concentration. Compute $\Pr(Y > 2.4)$.
$80$% of the time, surface chloride concentrations are below what value?
$15$% of the time, surface chloride concentrations exceed what value?

Exercise 8.6 An alternative to the beta distribution with a closed form is the Kumaraswamy distribution, with PDF \[ p_X(x) = \begin{cases} ab x^{a - 1} (1 - x^a)^{b - 1} & \text{for $0 < x < 1$};\\ 0 & \text{elsewhere} \end{cases} \] for $a > 0$ and $b > 0$. We write $X\sim \text{Kumaraswamy}(a, b)$.

Plot some densities for the Kumaraswamy distribution showing its different shapes.
Show that the df is $F_X(x) = 1 - (1 - x^a)^b$ for $0 < x < 1$. Plot the distribution functions corresponding to the PDFs in Part 1.
If $X\sim \text{Kumaraswamy}(a, 1)$, then show that $X\sim \text{Beta}(a, 1)$.
If $X\sim \text{Kumaraswamy}(1, 1)$, then show that $X\sim \text{Unif}(0, 1)$.
If $X\sim \text{Kumaraswamy}(a, 1)$, then show that $(1 - X) \sim \text{Kumaraswamy}(1, a)$.

Exercise 8.7 Show that the gamma distribution has a constant coefficient of variation, where the coefficient of variation is the standard deviation divided by the mean.

Exercise 8.8 In a study modelling waiting times at a hospital (Khadem et al. 2008), patients are classified into one of three categories:

Red: Critically ill or injured patients.
Yellow: Moderately ill or injured patients.
Green: Minimally injured or uninjured patients.

For ‘Green’ patients, the service time $S$ was modelled as $S = 4.5 + 11V$, where $V \sim \text{Beta}(0.287, 0.926)$.

Find the mean and standard deviation of the service times $S$.
Produce well-labelled plots of the PDF and distribution function of $V$, showing important features.

Exercise 8.9 In a study modelling waiting times at a hospital (Khadem et al. 2008), patients are classified into one of three categories:

Red: Critically ill or injured patients.
Yellow: Moderately ill or injured patients.
Green: Minimally injured or uninjured patients.

The time (in minutes) spent in the reception are for ‘Yellow’ patients, say $T$, is modelled as $T = 0.5 + W$, where $W\sim \text{Exp}(\beta = 16.5)$.

Find the mean and standard deviation of the waiting times $T$.
Plot the PDF and distribution function of $W$.
Determine $\Pr(T > 1)$.

Exercise 8.10 In a study modelling waiting times at a hospital (Khadem et al. 2008), patients are classified into one of three categories:

Red: Critically ill or injured patients.
Yellow: Moderately ill or injured patients.
Green: Minimally injured or uninjured patients.

The time (in minutes) spent in the reception are for ‘Green’ patients, say $T$, is modelled as a normal with mean $\mu = 45.4\,\text{mins}$ and standard deviation $\sigma = 23.4\,\text{mins}$.

Find the mean and standard deviation of the waiting times $T$.
Plot the PDF and distribution function of $T$.
What proportion of ‘Green’ patients wait longer than an hour?
How long to the slowest $10$% of patients need to wait?

Exercise 8.11 In a study of rainfall (Watterson and Dix 2003), rainfall on wet days in SE Australia (37^$\circ$S; 146^$\circ$E) was simulated using a gamma distribution with $\alpha = 0.62$ and $\beta = 7.1$mm.

Rainfall below $0.0017\,\text{mm}$ cannot be recorded by the equipment used. What proportion of days does this represent?
Plot the PDF and df.
What is the probability that more than $3\,\text{mm}$ falls on a wet day?
The proportion of wet days is $0.43$. What proportion of all days receive more than $3\,\text{mm}$?

Exercise 8.12 In a study of rainfall disaggregation (Connolly, Schirmer, and Dunn 1998) (extracting small-scale rainfall features from large-scale measurements), the duration (in fractions of day) of non-overlapping rainfall events per day at Katherine was modelled using a Gamma distribution with $\alpha = 2$, and $\beta = 0.04$ in summer and $\beta = 0.03$ in winter.

Denote the duration of rainfall events, in fractions of a day, be denoted in summer as $S$, and in winter as $W$.

Plot the rainfall event duration distributions, and compare summer and winter.
What is the probability of a rainfall event lasting more than $6\,\text{h}$ in winter?
What is the probability of a rainfall event lasting more than $6\,\text{h}$ in summer?
Describe what is meant by the statement $\Pr(S > 3/24 \mid S > 1/24)$, and compute the probability.
Describe what is meant by the statement $\Pr(W < 2/24 \mid W > 1/24)$, and compute the probability.

Exercise 8.13 In a study of rainfall disaggregation (Connolly, Schirmer, and Dunn 1998) (extracting small-scale rainfall features from large-scale measurements), the starting time of the first rainfall event at Katherine each day (scaled from 0 to 1) was modelled using a beta distribution with parameters $(1.16, 1.50)$ in summer and $(0.44, 0.56)$ in winter.

Denote the number of rainfall events in summer as $S$, and in winter as $W$.

Plot the two starting-time distributions, and compare summer and winter.
Compute the mean and standard deviation of starting times for both seasons.
What is the probability that the first rain event in winter is after 6am?
What is the probability that the first rain event in summer is after 6am?
Describe what is meant by the statement $\Pr(S > 3 \mid S > 1)$, and compute the probability.
Describe what is meant by the statement $\Pr(W > 2 \mid W > 1)$, and compute the probability.

Exercise 8.14 A study of the impact of diarrhoea (Schmidt, Genser, and Chalabi 2009) used a gamma distribution to model the duration of symptoms. In Guatemala, duration (in days) was modelled using a gamma distribution with $\alpha = 1.11$ and $\beta = 3.39$. Let $X$ refer to the duration of symptoms.

Compute the mean and standard deviation of the duration.
The value of $\alpha$ is close to one. Plot the PDF of the gamma distribution and the near-equivalent exponential distribution, and comment.
Compute $\Pr(X > 4)$ using both the gamma and exponential distributions, and comment.
The patients in the $5$% with symptoms the longest had symptoms for how long? Again, compare both models, and comment.

Exercise 8.15 In a study of office occupancy at a university (Luo et al. 2017), the leaving-times of professors were modelled as a normal distribution, with a mean leaving time of 6pm, with a standard deviation of $1.5\,\text{h}$.

Using this model, what proportion of professors leave before 5pm?
Suppose a professor is still at work at 5pm; what is the probability that the professor leaves after 7pm?
The latest-leaving $15$% of professors leave after what time?

Exercise 8.16 The percentage of clay content in soil has been modelled using a beta distribution. In one study (Haskett, Pachepsky, and Acock 1995), two counties in Iowa were modelled with beta-distributions: County A with parameters $m = 11.52$ and $n = 4.75$, and County B with parameters $m = 3.85$ and $n = 3.65$.

Plot the two distributions, and comment (in context).
Compute the mean and standard deviation of each county.
What percentage of soil samples exceed $50$% clay in the two counties? Comment.

Exercise 8.17 A study of water uptake in plants in grasssland and savanna ecosystems (Nippert and Holdo 2015) used a beta distribution to model the distribution of root depth $D$. The MRD (maximum root density) was $70\,\text{cm}$, so the beta distribution used was defined over the interval $[0, 70]$. For one simulation, the beta-distribution parameters were $m = 1$ and $n = 5$.

Plot the PDF, and explain what it means in this context.
Determine the probability that the root depth was deeper than $50\,\text{cm}$.
Determine the root depth of the plants with the deepest $20$% of roots.

Exercise 8.18 Suppose that the measured voltage in a certain electrical circuit has a normal distribution with mean $120\,\text{V}$ and standard deviation $2\,\text{V}$.

What is the probability that a measurement will be between $116$ and $118\,\text{V}$?
If five independent measurements of the voltage are made, what is the probability that three of the five measurements will be between $116$ and $118\,\text{V}$?

Exercise 8.19 A firm buys $500$ identical components, whose failure times are independent and exponentially distributed with mean $100\,\text{h}$.

Determine the probability that one component will survive at least $150\,\text{h}$.
What is the probability that at least $125$ components will survive at least $150\,\text{h}$?

Exercise 8.20 A headway is the time gap (front-bumper to front-bumper) separating consecutive motor vehicles in a lane of road traffic. Suppose headways $X$ (in seconds) on a section of a traffic lane are described by a shifted gamma distribution, with PDF \[ f_X(x) = \frac{1}{\beta^\alpha \Gamma(\alpha)} (x - \Delta)^{\alpha - 1} \exp\left( -(x - \Delta)/\beta\right) \quad\text{for $x > \Delta$}, \] where $\alpha = 2.5$, $\beta = 0.8$ and $\Delta = 1.2$.

Determine the theoretical mean and variance of this distribution.
Simulate $1000$ headways using R, and plot a histogram of the data. Comment.
Using the simulation results, estimate the probability that a headway is within two standard deviations of the mean.
Confirm that Tchebyshev’s inequality holds for the above probability.

Exercise 8.21 Another commonly-used distribution is the Weibull distribution: \[ f_Y(y) = \frac{k}{\lambda} \left( \frac{x}{\lambda} \right)^{k - 1} \exp( -(x/\lambda)^k)\quad\text{for $x > 0$}, \] where $\lambda > 0$ is called a scale parameter, and $k > 0$ is called a shape parameter.

Produce some sketches in R to show the variety on the shapes of the density function.
Derive the mean and variance of the Weibull distribution.
Derive the cdf for the Weibull distribution.

Exercise 8.22 Raschke (2011) modelled the humidity at the Haarweg Wageningen (Netherlands) weather station in May 2007 and 2008 using beta distributions. In May 2007, the relative humidity was modelled with a $\text{Beta}(6.356, 1.970)$ distribution; in May 2008, with the $\text{Beta}(2.803, 1.456)$ distribution.

With the given models, determine the mean and variance of the humidity for both months.
On the same graph, plot both distributions, and comment.
For May 2008, compute $\Pr(X > 60)$, where $X$ is the relative humidity, based on the model.
For May 2008, compute $\Pr(X > 60 \mid X > 50)$, where $X$ is the relative humidity, based on the model.
For May 2008, $80$% of days have a relative humidity less than what value, based on the model?

Exercise 8.23 Based on Pfizer Australia (2008), the mean height $Y$ of Australian girls aged between $2$ and $5$ years of age based on their age $X$ is approximately given by \[ \hat{y} = 7.5x + 70. \] Similarly, the standard deviation $s$ of the heights at age $x$ is approximately given by \[ s = 0.4575x + 1.515. \]

Suppose that, Australia-wide, day-care facilities have $32$% of children aged $2$, $33$% of children aged $3$, $25$% of children aged $4$, and the remainder aged $5$. Use R to simulate the distribution of the heights of children at day-care facilities.
Using the normal distributions implied, what percentage of children are taller than $100\,\text{cm}$ for each age?
Use the simulation to determine the proportion of children taller than $100\,\text{cm}$.
Use the simulation to determine the mean and variance of the height of the children at the day-care facility.
Use the simulation to determine the heights for the tallest $15$% of children at the facility.

Exercise 8.24 Show that the variance of the normal, Poisson and gamma distributions can all be written in the form $\operatorname{var}[X] = \phi \operatorname{E}[X]^p$ for $p = 0, 1$ and $2$ respectively (Dunn and Smyth 2018). These are all speical cases of the Tweedie distributions.

Exercise 8.25 Chia and Hutchinson (1991) used the beta distribution to model cloud duration in Darwin, defined as (p. 195)

…the fraction of observable daylight hours not receiving bright sunshine.

In January, the fitted parameters were $m = 0.859$ and $n = 0.768$; in July, the fitted parameters were $m = 0.110$ and $b = 1.486$.

Plot both PDFs on the same graph, and comment on what this tells you about cloud cover in Darwin
Writing $C$ for the cloud duration, compute $\Pr(C > 0.5)$ for both January and July.

Exercise 8.26 Suppose a spinning wheel has a circumference of one metre, and at some point on the outer edge of the wheel is a point of interest (Fig. 8.14).

When the wheel is spun hard, what is the probability that the point finishes between locations $A$ and $B$ when it stops?

FIGURE 8.14: A wheel to spin.

Exercise 8.27 Suppose a random variable $Y$ has a beta distribution (Sect. 8.6), which can be used for modelling proportions. Proportions are related to odds, where \[ \text{odds} = \frac{p}{1 - p} \] for $0 \le p < 1$.

Using simulation, find and plot the density function for the distribution of the odds for $p = 0.5$.
Using simulation, find and plot the density function for the distribution of the odds for $p = 0.25$.

(This is the beta prime distribution.)

7 Standard discrete distributions

9 Mixed distributions

Event (binomial)	Prob (binomial)	Event (normal)	Prob (normal)
\(\Pr(X \lt 10)\)	0.0213	\(\Pr(Y \lt 9.5)\)	0.0368
\(\Pr(X \le 15)\)	0.3877	\(\Pr(Y \lt 15.5)\)	0.3771
\(\Pr(X \gt 17)\)	0.4006	\(\Pr(Y \gt 17.5)\)	0.4115
\(\Pr(X \ge 21)\)	0.1519	\(\Pr(Y \gt 20.5)\)	0.1518
\(\Pr(15 \lt X \le 17)\)	0.2117	\(\Pr(15.5 \lt Y \lt 17.5)\)	0.2113