6 Transformations of random variables

On completion of this chapter, you should be able to:

derive the distribution of a transformed variable, given the distribution of the original variable, using the distribution function method, the change of variable method, and the moment-generating function method as appropriate.
find the joint distribution of two transformed variables in a bivariate situation.

6.1 Introduction

In this chapter, we consider the distribution of a random variable $Y = u(X)$, given a random variable $X$ with known distribution, and a function $u(\cdot)$. Among several available techniques, three are considered:

the change of variable method (Sect. 6.2);
the distribution function method for continuous random variables only (Sect. 6.3);
the moment-generating function method (Sect. 6.4).

An important concept in this context is a one-to-one transformation.

Definition 6.1 (One-to-one transformation) Given random variables $X$ and $Y$ with range spaces $\mathcal{R}_X$ and $\mathcal{R}_Y$ respectively, the function $u(\cdot)$ is a one-to-one transformation (or mapping) if, for each $y\in \mathcal{R}_Y$, there corresponds exactly one $x\in \mathcal{R}_X$.

When $Y = u(X)$ is a one-to-one transformation, the inverse function is uniquely defined; that is, $X$ can be written uniquely in terms of $Y$. This is important when considering the distribution of $Y$ when the distribution of $X$ is known.

Example 6.1 The transformation $Y = (X - 1)^2$ is not a one-to-one transformation for $\mathcal{R}_X = \mathbb{R}$; that is, if $X$ is defined on $(-\infty, +\infty)$. For example, the inverse transformation is $X = 1 \pm \sqrt{Y}$, and two values of $X$ exist for any given value of $Y > 0$ (Fig. 6.1, left panel).

However, if the random variable $X$ is only defined for $X > 2$, then the transformation is a one-to-one function (Fig. 6.1, right panel).

FIGURE 6.1: Two transformations: a non-one-to-one transformation (left panel), and a one-to-one transformation (right panel).

6.2 The change of variable method

The method is relatively straightforward for one-to-one transformations (such as $Y = 1 - X$ or $Y = \exp(X)$). Considerable care needs to be exercised if the transformation is not one-to-one; examples are given below. The discrete and continuous cases are considered separately.

6.2.1 Discrete random variables

Let $X$ be a discrete random variable with probability function $p_X(x)$, and let $\mathcal{R}_X$ denote the set of discrete points for which $p_X(x) > 0$. Let $Y = u(X)$ define a one-to-one transformation that maps $\mathcal{R}_X$ onto $\mathcal{R}_Y$, the set of discrete points for which the transformed variable $Y$ has a non-zero probability. If we solve $Y = u(X)$ for $X$ in terms of $Y$, say $X = w(Y) = u^{-1}(Y)$, then for each $y \in \mathcal{R}_Y$, we have $x = w(y)\in \mathcal{R}_X$.

Example 6.2 (One-to-one transformation) Suppose \[ p_X(x) = \begin{cases} x/15 & \text{for $x = 1, 2, 3, 4, 5$};\\ 0 & \text{elsewhere}. \end{cases} \] To find the probability function of $Y$ where $Y = 2X + 1$ (i.e., $u(x) = 2x + 1$), first see that $\mathcal{R}_X = \{1, 2, 3, 4, 5\}$. Hence $\mathcal{R}_Y = \{3, 5, 7, 9, 11\}$ and the mapping $y = 2x + 1 = u(x)$ is one-to-one. Also, $w(y) = u^{-1}(y) = (y - 1)/2$. Hence, \[ \Pr(Y = y) = \Pr(2X + 1 = y) = \Pr\left(X = \frac{y - 1}{2}\right) = \left(\frac{y - 1}{2}\right) \times\frac{1}{15} = \frac{y - 1}{30}. \] So the probability function of $Y$ is \[ \Pr(Y = y) = \begin{cases} (y - 1)/30 & \text{for $y = 3, 5, 7, 9, 11$};\\ 0 & \text{elsewhere}. \end{cases} \] (Note: The probabilities in this probability function add to $1$.)

The above procedure when $Y = u(X)$ is a one-to-one mapping can be stated generally as \[\begin{align*} \Pr(Y = y) &= \Pr\big(u(X) = y\big)\\ &= \Pr\big(X = u^{-1} (y)\big)\\ &= p_X\big(u^{-1}(y)\big), \quad\text{for $y\in \mathcal{R}_Y$}. \end{align*}\]
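The general statement above can be checked numerically. The following is a minimal sketch in Python (our choice of language; the helper name `transform_pmf_one_to_one` is hypothetical), using exact fractions to reproduce Example 6.2.

```python
from fractions import Fraction


def transform_pmf_one_to_one(pmf_x, u):
    """Return the probability function of Y = u(X), given as a dict {y: Pr(Y = y)}.

    Raises ValueError if u is not one-to-one on the support of X.
    """
    pmf_y = {}
    for x, p in pmf_x.items():
        y = u(x)
        if y in pmf_y:
            raise ValueError("u is not one-to-one on the support of X")
        pmf_y[y] = p  # Pr(Y = y) = p_X(u^{-1}(y))
    return pmf_y


# Example 6.2: p_X(x) = x/15 for x = 1, ..., 5, and Y = 2X + 1.
pmf_x = {x: Fraction(x, 15) for x in range(1, 6)}
pmf_y = transform_pmf_one_to_one(pmf_x, lambda x: 2 * x + 1)

# Print each y with the computed probability and the formula (y - 1)/30.
for y, p in sorted(pmf_y.items()):
    print(y, p, Fraction(y - 1, 30))
```

The sketch stores the probability function as a dictionary, so the inverse function never needs to be written down explicitly: each $x$ simply carries its probability to $y = u(x)$.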

Example 6.3 (Transformation (1:1)) Let $X$ have a binomial distribution with probability function \[ p_X(x) = \begin{cases} \binom{3}{x}0.2^x (0.8)^{3 - x} & \text{for $x = 0, 1, 2, 3$};\\ 0 & \text{otherwise}. \end{cases} \] To find the probability function of $Y = X^2$, first note that $Y = X^2$ is not a one-to-one transformation in general, but is over the range space of $X$ (i.e., for $x = 0, 1, 2, 3$).

The transformation $y = u(x) = x^2$, $\mathcal{R}_X = \{ x \mid x = 0, 1, 2, 3 \}$ maps onto $\mathcal{R}_Y = \{y \mid y = 0, 1, 4, 9\}$. The inverse function is $x = w(y) = \sqrt{y}$, and hence the probability function of $Y$ is \[ p_Y(y) = p_X(\sqrt{y}) = \begin{cases} \binom{3}{\sqrt{y}}0.2^{\sqrt{y}} (0.8)^{3 - \sqrt{y}} & \text{for $y = 0, 1, 4, 9$};\\ 0 & \text{otherwise}. \end{cases} \]

When the transformation $u(\cdot)$ is not 1:1, more care is needed.

Example 6.4 (Transformation not 1:1) Suppose $\Pr(X = x)$ is defined as in Example 6.2, and define $Y = |X - 3|$. Since $\mathcal{R}_X = \{1, 2, 3, 4, 5\}$, the mapping is not one-to-one: the event $Y = 0$ occurs if $X = 3$, the event $Y = 1$ occurs if $X = 2$ or $X = 4$, and the event $Y = 2$ occurs if $X = 1$ or $X = 5$. Hence, $\mathcal{R}_Y = \{ 0, 1, 2\}$.

To find the probability distribution of $Y$: \[\begin{align*} \Pr(Y = 0) &= \Pr(X = 3) = 3/15 = \frac{1}{5};\\ \Pr(Y = 1) &= \Pr(X = 2 \text{ or } 4) = \frac{2}{15} + \frac{4}{15} = \frac{2}{5};\\ \Pr(Y = 2) &= \Pr(X = 1 \text{ or } 5) = \frac{1}{15} + \frac{5}{15} = \frac{2}{5}. \end{align*}\] The probability function of $Y$ is \[ p_Y(y) = \begin{cases} 1/5 & \text{for $y = 0$};\\ 2/5 & \text{for $y = 1$};\\ 2/5 & \text{for $y = 2$};\\ 0 & \text{elsewhere}. \end{cases} \]
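When the transformation is not one-to-one, the probabilities of all $x$ values mapping to the same $y$ are added. A minimal Python sketch of this bookkeeping (the helper name `transform_pmf` is hypothetical) reproduces the probabilities in Example 6.4:

```python
from collections import defaultdict
from fractions import Fraction


def transform_pmf(pmf_x, u):
    """Probability function of Y = u(X); u need not be one-to-one."""
    pmf_y = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for x, p in pmf_x.items():
        pmf_y[u(x)] += p  # add the probability of every x with u(x) = y
    return dict(pmf_y)


# Example 6.4: p_X(x) = x/15 for x = 1, ..., 5, and Y = |X - 3|.
pmf_x = {x: Fraction(x, 15) for x in range(1, 6)}
pmf_y = transform_pmf(pmf_x, lambda x: abs(x - 3))

for y, p in sorted(pmf_y.items()):
    print(y, p)
```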

6.2.2 Continuous random variables

Theorem 6.1 (Change of variable (continuous rv)) If $X$ has PDF $f_X(x)$ for $x\in \mathcal{R}_X$ and $u(\cdot)$ is a one-to-one function for $x\in \mathcal{R}_X$, then the random variable $Y = u(X)$ has PDF \[ f_Y(y) = f_X(x) \left|\frac{dx}{dy}\right| \] where the right-hand side is expressed as a function of $y$. The term $\left|dx/dy\right|$ is called the Jacobian of the transformation.

Proof. Let the inverse function be $X = w(Y)$ so that $w(y) = u^{-1}(y)$.

Case 1: $y = u(x)$ is a strictly increasing function (Fig. 6.2, left panel). If $a < y < b$ then $w(a) < x < w(b)$ and $\Pr(a < Y < b) = \Pr\big(w(a) < X <w(b)\big)$, so \[ {\int^b_a f_Y(y)\,dy =\int^{w(b)}_{w(a)}f_X(x)\,dx =\int^b_af_X\big( w(y)\big)\frac{dx}{dy}\,\,dy}. \] Therefore, $\displaystyle {f_Y(y) = f_X\big( w(y) \big)\frac{dx}{dy}}$, where $w(y) = u^{-1}(y)$.

FIGURE 6.2: A strictly increasing transformation function (left panel) and strictly decreasing function (right panel).

Case 2: $y = u(x)$ is a strictly decreasing function of $x$ (Fig. 6.2, right panel). If $a < y < b$ then $w(b) < x < w(a)$ and $\Pr(a < Y < b) = \Pr\big(w(b) < X < w(a)\big)$, so that \[\begin{align*} \int^b_a f_Y(y)\,dy & = \int^{w(a)}_{w(b)}f_X(x)\,dx\\ & = \int^a_bf_X(x)\frac{dx}{dy}\,\,dy\\ & = - \int ^b_a f_X(x)\frac{dx}{dy}\,dy. \end{align*}\] Therefore $f_Y(y) = -f_X\left( w(y) \right)\displaystyle{\frac{dx}{dy}}$. But $dx/dy$ is negative in the case of a decreasing function, so in general \[ f_Y(y) = f_X(x)\left|\frac{dx}{dy} \right|. \]
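The formula in Theorem 6.1 is easy to evaluate numerically once the inverse function $w(\cdot)$ is known. The sketch below is an illustration only: the function name `change_of_variable_pdf`, the central-difference derivative, and the PDF $f_X(x) = 2x$ on $(0, 1)$ are all assumptions made for this example and are not from the text. The decreasing transformation $Y = 1 - X$ is used to show why the absolute value is needed.

```python
def change_of_variable_pdf(f_x, w, y, h=1e-6):
    """Approximate f_Y(y) = f_X(w(y)) |w'(y)|, with w'(y) from a central difference."""
    dw_dy = (w(y + h) - w(y - h)) / (2 * h)
    return f_x(w(y)) * abs(dw_dy)


def f_x(x):
    # Assumed PDF for illustration: f_X(x) = 2x on (0, 1).
    return 2 * x if 0 < x < 1 else 0.0


def w(y):
    # Inverse of the decreasing transformation y = u(x) = 1 - x.
    return 1 - y


# Here dx/dy = -1, so f_Y(y) = f_X(1 - y) * |-1| = 2(1 - y) on (0, 1).
for y in (0.1, 0.25, 0.5, 0.9):
    print(y, change_of_variable_pdf(f_x, w, y), 2 * (1 - y))
```

Without the absolute value, the computed density would be negative for every $y$ in this example.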

Example 6.5 (Transformation) Let the PDF of $X$ be given by \[ f_X(x) = \begin{cases} 1 & \text{for $0 < x < 1$};\\ 0 & \text{elsewhere}. \end{cases} \] Consider the transformation $Y = u(X) = -2\log X$ (where $\log$ refers to logarithms to base $e$, or natural logarithms). The transformation is one-to-one, and the inverse transformation is \[ X = \exp( -Y/2) = u^{-1}(Y) = w(Y). \] The space $\mathcal{R}_X = \{x \mid 0 < x < 1\}$ is mapped to $\mathcal{R}_Y = \{y \mid 0 < y < \infty\}$. Then, \[ w'(y) = \frac{d}{dy} \exp(-y/2) = -\frac{1}{2}\exp(-y/2), \] and so the Jacobian of the transformation is $|w'(y)| = \exp(-y/2)/2$. The PDF of $Y = -2\log X$ is \[\begin{align*} f_Y(y) &= f_X\big(w(y)\big) |w'(y)| \\ &= f_X\big(\exp(-y/2)\big) \exp(-y/2)/2 \\ &= \frac{1}{2}\exp(-y/2)\quad\text{for $y > 0$}. \end{align*}\] That is, $Y$ has an exponential distribution with $\beta = 2$: $Y \sim \text{Exp}(2)$ (Def. 8.8).
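The result of Example 6.5 can be checked by simulation. The sketch below (Python standard library only; the sample size and seed are arbitrary choices) compares the empirical distribution function of simulated values of $Y = -2\log X$ with $1 - \exp(-y/2)$, the distribution function obtained by integrating the PDF found above.

```python
import math
import random

random.seed(1)
n = 100_000

# 1 - random.random() lies in (0, 1], which avoids evaluating log(0).
ys = [-2 * math.log(1.0 - random.random()) for _ in range(n)]


def empirical_cdf(sample, y):
    """Proportion of the sample that is less than or equal to y."""
    return sum(v <= y for v in sample) / len(sample)


def exp_cdf(y, beta=2.0):
    """Integral of (1/beta) exp(-t/beta) from 0 to y."""
    return 1.0 - math.exp(-y / beta)


for y in (0.5, 1.0, 2.0, 4.0):
    print(y, round(empirical_cdf(ys, y), 3), round(exp_cdf(y), 3))
```

The two columns should agree to about two decimal places for a sample of this size.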

Example 6.6 (Square root transformation) Consider the random variable $X$ with PDF $f_X(x) = e^{-x}$ for $x \geq 0$. To find the PDF of $Y = \sqrt{X}$, first see that $y = \sqrt{x}$ is a strictly increasing function for $x \geq 0$ (Fig. 6.3).

The inverse relation is $X = Y^2$, and $|dx/dy| = |2y| = 2y$ for $y \ge 0$. The PDF of $Y$ is \[ f_Y(y) = f_X(x)\left|\frac{dx}{dy}\right| = 2y e^{-y^2}\quad \text{for $y\geq0$}. \]

FIGURE 6.3: The square-root transformation (left panel); the PDF of $X$ (centre panel) and the PDF of $Y$ (right panel).
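A quick numerical check of Example 6.6 is to confirm that the derived function integrates to one, and that it assigns the same probability to $Y \le 1$ as $f_X$ assigns to $X \le 1$ (the same event, since $Y = \sqrt{X}$). The sketch below uses a simple midpoint rule; the helper name `integrate` and the finite upper limit of $10$ standing in for infinity are assumptions of the sketch.

```python
import math


def f_y(y):
    """PDF of Y = sqrt(X) derived in Example 6.6."""
    return 2 * y * math.exp(-y * y) if y >= 0 else 0.0


def integrate(f, a, b, n=20_000):
    """Midpoint rule on [a, b] using n subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


total = integrate(f_y, 0.0, 10.0)       # upper limit 10 stands in for infinity
prob_y_le_1 = integrate(f_y, 0.0, 1.0)  # Pr(Y <= 1)
prob_x_le_1 = 1 - math.exp(-1.0)        # Pr(X <= 1) when f_X(x) = exp(-x)

print(total, prob_y_le_1, prob_x_le_1)
```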

Example 6.7 (Tan transformation) Let random variable $X$ be uniformly distributed on $[-\pi/2, \pi/2]$. Suppose we seek the distribution of $Y = \tan X$ (Fig. 6.4).

For the mapping $Y = \tan X$, we see that $\mathcal{R}_Y = \{ y\mid -\infty <y<\infty\}$. The mapping is one-to-one, and so $X = \tan^{-1}Y$, and $dx/dy = 1/(1 + y^2)$. Hence \[ f_Y(y) = f_X(x)\left|\frac{dx}{dy}\right| = \frac{1}{\pi(1 + y^2)}. \] This is the Cauchy distribution.

FIGURE 6.4: The tan transformation (left panel); the PDF of $X$ (centre panel) and the PDF of $Y$ (right panel).
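Example 6.7 can also be checked by simulation. Integrating the PDF found above gives the distribution function $F_Y(y) = 1/2 + \tan^{-1}(y)/\pi$; the sketch below (seed and sample size are arbitrary) compares this with the empirical distribution function of simulated values of $\tan X$.

```python
import math
import random

random.seed(2)
n = 100_000

# X uniform on [-pi/2, pi/2], then Y = tan(X).
ys = [math.tan(random.uniform(-math.pi / 2, math.pi / 2)) for _ in range(n)]


def empirical_cdf(sample, y):
    """Proportion of the sample that is less than or equal to y."""
    return sum(v <= y for v in sample) / len(sample)


def cauchy_cdf(y):
    """Integral of 1 / (pi (1 + t^2)) from -infinity to y."""
    return 0.5 + math.atan(y) / math.pi


for y in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(y, round(empirical_cdf(ys, y), 3), round(cauchy_cdf(y), 3))
```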

When the function $u(\cdot)$ is not a one-to-one transformation, Theorem 6.1 must be modified; the approach is illustrated using an example.

Example 6.8 (Transformation (not 1:1)) Given a random variable $Z$ which follows a $N(0, 1)$ distribution, suppose we seek the probability distribution of $Y = \frac{1}{2} Z^2$.

The relationship $Y = u(Z) = \frac{1}{2}Z^2$ is not strictly increasing or decreasing on $(-\infty, \infty )$, so Theorem 6.1 cannot be applied directly. Instead, subdivide the range of $Z$ so that in each portion the relationship is monotonic. The PDF of $Z$ is \[ f_Z(z) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2} z^2}\quad\text{for $-\infty < z < \infty$}. \] The inverse relation, $Z = u^{-1}(Y)$, is $Z = \pm \sqrt{2Y}$: for a given value of $Y > 0$, two values of $Z$ are possible. However, in the range $-\infty < z < 0$, $Y$ and $Z$ have a monotonic relationship. Similarly, for $0 < z <\infty$, $Y$ and $Z$ have a monotonic relationship. Thus (see Fig. 6.5), for $0 < a < b$, \[ \Pr(a < Y <b) = \Pr(-\sqrt{2b} < Z < -\sqrt{2a}\,) + \Pr(\sqrt{2a} < Z < \sqrt{2b}\,). \] The two terms on the right are equal because the distribution of $Z$ is symmetrical about $z = 0$. Thus $\Pr(a < Y < b) = 2\Pr(\sqrt{2a} < Z < \sqrt{2b}\,)$. Using $z = \sqrt{2y}$, so that $z^2/2 = y$ and $|dz/dy| = 1/\sqrt{2y}$, \[\begin{align*} f_Y(y) &= 2f_Z(z)\left| \frac{dz}{dy}\right|\\ &= 2\frac{1}{\sqrt{2\pi}}e^{-y}\frac{1}{\sqrt{2y}}; \end{align*}\] that is, \[ f_Y(y) = e^{-y}y^{-\frac{1}{2}} / \sqrt{\pi}\quad\text{for $0 < y < \infty$}. \] This is the PDF of a gamma distribution with parameters $\alpha = 1/2$ and $\beta = 1$. It follows that if $X\sim N(\mu,\sigma^2)$, then $Y = \frac{1}{2} (X - \mu )^2 / \sigma^2$ has a $\text{Gamma}(\alpha = 1/2,\beta = 1)$ distribution, since then $(X - \mu)/\sigma$ is distributed as $N(0, 1)$.

FIGURE 6.5: A transformation not 1:1.
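The PDF found in Example 6.8 can be checked by simulation. Since $\Pr(Y \le y) = \Pr(-\sqrt{2y} \le Z \le \sqrt{2y}\,)$ and $\Pr(|Z| \le c) = \operatorname{erf}(c/\sqrt{2})$ for a standard normal $Z$, the distribution function of $Y$ is $\operatorname{erf}(\sqrt{y}\,)$. The sketch below (seed, sample size and function names are arbitrary choices) compares this with simulated values of $\frac{1}{2}Z^2$.

```python
import math
import random

random.seed(3)
n = 100_000

# Y = Z^2 / 2 with Z ~ N(0, 1).
ys = [0.5 * random.gauss(0.0, 1.0) ** 2 for _ in range(n)]


def empirical_cdf(sample, y):
    """Proportion of the sample that is less than or equal to y."""
    return sum(v <= y for v in sample) / len(sample)


def cdf_y(y):
    """Pr(Y <= y) = Pr(|Z| <= sqrt(2y)) = erf(sqrt(y)) for y > 0."""
    return math.erf(math.sqrt(y))


def f_y(y):
    """PDF derived in Example 6.8: exp(-y) y^(-1/2) / sqrt(pi)."""
    return math.exp(-y) / math.sqrt(math.pi * y)


for y in (0.1, 0.5, 1.0, 2.0):
    print(y, round(empirical_cdf(ys, y), 3), round(cdf_y(y), 3))
```

Differentiating $\operatorname{erf}(\sqrt{y}\,)$ with respect to $y$ recovers `f_y`, which gives a second, non-random check.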

Note that the probability can only be doubled as in Example 6.8 if both $Y = u(Z)$ and the PDF of $Z$ are symmetrical about the same point.

6.2.3 Discrete bivariate case

The bivariate case is similar to the univariate case. Consider a joint probability function $p_{X_1, X_2}(x_1, x_2)$ of two discrete random variables $X_1$ and $X_2$ defined on the two-dimensional set of points $R^2_X$ for which $p(x_1, x_2) > 0$. There are now two one-to-one transformations: \[ y_1 = u_1( x_1, x_2)\qquad\text{and}\qquad y_2 = u_2( x_1, x_2) \] that map $R^2_X$ onto $R^2_Y$ (the two-dimensional set of points for which $p(y_1, y_2) > 0$). The two inverse functions are \[ x_1 = w_1( y_1, y_2)\qquad\text{and}\qquad x_2 = w_2( y_1, y_2). \] Then the joint probability function of the new (transformed) random variables is \[ p_{Y_1, Y_2}(y_1, y_2) = \begin{cases} p_{X_1, X_2}\big( w_1(y_1, y_2), w_2(y_1, y_2)\big) & \text{where $(y_1, y_2)\in R^2_Y$};\\ 0 & \text{elsewhere}. \end{cases} \]

Example 6.9 (Transformation (bivariate)) Let the two discrete random variables $X_1$ and $X_2$ have the joint probability function shown in Table 6.1. Consider the two one-to-one transformations \[ Y_1 = X_1 + X_2 \qquad\text{and}\qquad Y_2 = 2 X_1. \] The joint probability function of $Y_1$ and $Y_2$ can be found by noting where the $(x_1, x_2)$ pairs are mapped to in the $y_1, y_2$ space:

$(x_1, x_2)$	$\mapsto$	$(y_1,y_2)$
$(-1, 0)$	$\mapsto$	$(-1, -2)$
$(-1, 1)$	$\mapsto$	$(0, -2)$
$(-1, 2)$	$\mapsto$	$(1, -2)$
$(1, 0)$	$\mapsto$	$(1, 2)$
$(1, 1)$	$\mapsto$	$(2, 2)$
$(1, 2)$	$\mapsto$	$(3, 2)$

The joint probability function can then be constructed as shown in Table 6.2.

TABLE 6.1: A bivariate probability function
	$x_2 = 0$	$x_2 = 1$	$x_2 = 2$
$x_1 = -1$	$0.3$	$0.1$	$0.1$
$x_1 = +1$	$0.2$	$0.2$	$0.1$

TABLE 6.2: The joint probability function for $Y_1$ and $Y_2$
	$y_1 = -1$	$y_1 = 0$	$y_1 = 1$	$y_1 = 2$	$y_1 = 3$
$y_2 = -2$	$0.3$	$0.1$	$0.1$	$0.0$	$0.0$
$y_2 = +2$	$0.0$	$0.0$	$0.2$	$0.2$	$0.1$
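The construction of Table 6.2 from Table 6.1 amounts to relabelling each $(x_1, x_2)$ point by its image $(y_1, y_2)$. A minimal Python sketch of this relabelling, with the joint probability function stored as a dictionary (a representation chosen for this sketch), is:

```python
# Table 6.1: joint probability function of (X1, X2).
pmf_x = {
    (-1, 0): 0.3, (-1, 1): 0.1, (-1, 2): 0.1,
    (1, 0): 0.2, (1, 1): 0.2, (1, 2): 0.1,
}


def u(x1, x2):
    """The transformations Y1 = X1 + X2 and Y2 = 2 X1."""
    return (x1 + x2, 2 * x1)


pmf_y = {}
for (x1, x2), p in pmf_x.items():
    y = u(x1, x2)
    # Accumulate, so the code stays correct even if two points share an image.
    pmf_y[y] = pmf_y.get(y, 0.0) + p

# Print the rows of Table 6.2 (y2 = -2, then y2 = +2), for y1 = -1, ..., 3.
for y2 in (-2, 2):
    print(y2, [pmf_y.get((y1, y2), 0.0) for y1 in range(-1, 4)])
```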

Sometimes, a joint probability function of two random variables is given, but only one new random variable is required. In this case, a second (dummy) transformation is used, usually a very simple transformation.

Example 6.10 (Transformation (bivariate)) Let $X_1$ and $X_2$ be two independent random variables with the joint probability function \[ p_{X_1, X_2}(x_1, x_2) = \frac{\mu_1^{x_1} \mu_2^{x_2} \exp( -\mu_1 - \mu_2 )}{x_1!\, x_2!} \quad\text{for $x_1$ and $x_2 = 0, 1, 2, \dots$} \] This is the joint probability function of two independent Poisson random variables. Suppose we wish to find the probability function of $Y_1 = X_1 + X_2$.

Consider the two one-to-one transformations, where $Y_2 = X_2$ is just a dummy transformation: \[\begin{align} y_1 &= x_1 + x_2 = u_1(x_1, x_2)\\ y_2 &= x_2\phantom{{} + x_2} = u_2(x_1, x_2) \end{align}\] which map the points in $R^2_X$ onto \[ R^2_Y = \left\{ (y_1, y_2)\mid y_1 = 0, 1, 2, \dots; y_2 = 0, 1, 2, \dots, y_1\right\}. \] Since $Y_2$ is not of direct interest, any second transformation could be chosen, so a simple one is used. The inverse functions are \[\begin{align*} x_1 &= y_1 - y_2 = w_1(y_1, y_2)\\ x_2 &= y_2 \phantom{{} - y_2} = w_2(y_1, y_2) \end{align*}\] by rearranging the original transformations. Then the joint probability function of $Y_1$ and $Y_2$ is \[\begin{align*} p_{Y_1, Y_2}(y_1, y_2) &= p_{X_1, X_2}\big( w_1(y_1, y_2), w_2(y_1, y_2)\big) \\ &= \frac{\mu_1^{y_1 - y_2}\mu_2^{y_2} \exp(-\mu_1 - \mu_2)}{(y_1 - y_2)!\, y_2!}\quad \text{for $(y_1, y_2)\in R^2_Y$}. \end{align*}\] Recall that we seek the probability function of just $Y_1$, so we need the marginal probability function of $Y_1$, found by summing the joint probability function over $y_2$: \[ p_{Y_1}(y_1) = \sum_{y_2 = 0}^{y_1} p_{Y_1, Y_2}(y_1, y_2) = \sum_{y_2 = 0}^{y_1} \frac{\mu_1^{y_1 - y_2}\mu_2^{y_2} \exp(-\mu_1 - \mu_2)}{(y_1 - y_2)!\, y_2!}. \] Multiplying and dividing by $y_1!$ and using the binomial theorem, \[ \sum_{y_2 = 0}^{y_1} \frac{\mu_1^{y_1 - y_2}\mu_2^{y_2}}{(y_1 - y_2)!\, y_2!} = \frac{1}{y_1!}\sum_{y_2 = 0}^{y_1} \binom{y_1}{y_2} \mu_1^{y_1 - y_2}\mu_2^{y_2} = \frac{(\mu_1 + \mu_2)^{y_1}}{y_1!}, \] so that \[ p_{Y_1}(y_1) = \begin{cases} \displaystyle{\frac{(\mu_1 + \mu_2)^{y_1}\exp\big[-(\mu_1 + \mu_2)\big]}{y_1!}} & \text{for $y_1 = 0, 1, 2, \dots$}\\ 0 & \text{otherwise}. \end{cases} \] You should recognise this as the probability function of a Poisson random variable (Def. 7.12) with mean $\mu_1 + \mu_2$. Thus $Y_1 \sim \text{Pois}(\lambda = \mu_1 + \mu_2)$.
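The marginal sum in Example 6.10 can be evaluated numerically and compared with the Poisson probability function with mean $\mu_1 + \mu_2$. In the sketch below, the values $\mu_1 = 1.5$ and $\mu_2 = 2.5$ are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math


def joint_pmf(y1, y2, mu1, mu2):
    """Joint probability function of (Y1, Y2) for 0 <= y2 <= y1."""
    return (mu1 ** (y1 - y2) * mu2 ** y2 * math.exp(-mu1 - mu2)
            / (math.factorial(y1 - y2) * math.factorial(y2)))


def marginal_pmf(y1, mu1, mu2):
    """Marginal probability function of Y1: sum the joint pmf over y2 = 0, ..., y1."""
    return sum(joint_pmf(y1, y2, mu1, mu2) for y2 in range(y1 + 1))


def poisson_pmf(y, lam):
    """Poisson probability function with mean lam."""
    return lam ** y * math.exp(-lam) / math.factorial(y)


mu1, mu2 = 1.5, 2.5  # illustrative values only

for y1 in range(6):
    print(y1, marginal_pmf(y1, mu1, mu2), poisson_pmf(y1, mu1 + mu2))
```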

6.3 The distribution function method

This method only works for continuous random variables.

The distribution function method involves two steps:

Find the distribution function of the transformed variable.
Differentiate this distribution function to find the probability density function.

The procedure is best demonstrated using an example.

Example 6.11 (Distribution function method) Consider the random variable $X$ with PDF \[ f_X(x) = \begin{cases} x/4 & \text{for $1 < x < 3$};\\ 0 & \text{elsewhere}. \end{cases} \] To find the PDF of the random variable $Y$ where $Y = X^2$, first see that $1 < y < 9$ and the transformation is monotonic over this region. The distribution function for $Y$ is \[\begin{align*} F_Y(y) &= \Pr(Y\le y) \qquad\text{(by definition)}\\ &= \Pr(X^2 \le y) \qquad\text{(since $Y = X^2$)}\\ &= \Pr(X\le \sqrt{y}\,). \end{align*}\] This last step is not trivial, but is critical. Sometimes, more care is needed (see Example 6.12). In this case, there is a one-to-one relationship between $X$ and $Y$ over the region on which $X$ is defined (i.e., has a positive probability); see Fig. 6.6.

Then continue as follows: \[\begin{align*} F_Y(y) =\Pr( X\le \sqrt{y}\,) &= F_X\big(\sqrt{y}\,\big) \qquad\text{(definition of $F_X(x)$)} \\ &= \int_1^{\sqrt{y}} (x/4) \,dx = (y - 1)/8 \end{align*}\] for $1 < y < 9$; also, $F_Y(y) = 0$ for $y \le 1$ and $F_Y(y) = 1$ for $y \ge 9$. This is the distribution function of $Y$; differentiating gives the PDF: \[ f_Y(y) = \frac{d}{dy} (y - 1)/8 = \begin{cases} 1/8 & \text{for $1 < y < 9$};\\ 0 & \text{elsewhere}. \end{cases} \] Note the range for which $Y$ is defined; since $1 < x < 3$, then $1 < y < 9$.

FIGURE 6.6: The transformation $Y = X^2$ when $X$ is defined from $1$ to $3$. The thicker line corresponds to the region where the transformation applies. Note that if $Y < y$, then $1 < X < \sqrt{y}$.
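The two steps of the distribution function method can be mirrored numerically for Example 6.11: first compute $F_Y(y) = F_X(\sqrt{y}\,)$, then differentiate. The sketch below is an illustration only; the midpoint-rule integration and the central-difference derivative are choices made for this sketch.

```python
import math


def f_x(x):
    """PDF of X in Example 6.11."""
    return x / 4 if 1 < x < 3 else 0.0


def F_x(x, n=2_000):
    """Distribution function of X, by the midpoint rule on [1, min(x, 3)]."""
    if x <= 1:
        return 0.0
    upper = min(x, 3.0)
    h = (upper - 1.0) / n
    return h * sum(f_x(1.0 + (i + 0.5) * h) for i in range(n))


def F_y(y):
    """Step 1: F_Y(y) = Pr(X^2 <= y) = Pr(X <= sqrt(y)), valid because X > 0."""
    return F_x(math.sqrt(y)) if y > 0 else 0.0


def f_y(y, h=1e-4):
    """Step 2: differentiate F_Y numerically (central difference)."""
    return (F_y(y + h) - F_y(y - h)) / (2 * h)


for y in (2.0, 5.0, 8.0):
    print(y, F_y(y), (y - 1) / 8, f_y(y))
```

The printed values of `f_y` should all be close to $1/8$, in agreement with the PDF found analytically.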

Example 6.12 (Transformation) Consider the same random variable $X$ as in the previous example, but the transformation $Y = (X - 2)^2 + 1$ (Fig. 6.7).

In this case, the transformation is not a one-to-one transform. Proceed as before to find the distribution function of $Y$: \[\begin{align*} F_Y(y) &= \Pr(Y\le y) \qquad\text{(by definition)}\\ &= \Pr\big( (X - 2)^2 + 1 \le y\big) \end{align*}\] since $Y = (X - 2)^2 + 1$. From Fig. 6.7, whenever $(X - 2)^2 + 1 < y$ for some value $y$, then $X$ must be in the range $2 - \sqrt{y - 1}$ to $2 + \sqrt{y - 1}$. So: \[\begin{align*} F_Y(y) &= \Pr\big( (X - 2)^2 + 1 \le y\big) \\ &= \Pr\left( 2 - \sqrt{y - 1} < X < 2 + \sqrt{y - 1} \right)\\ &= \int_{2-\sqrt{y - 1}}^{2 + \sqrt{y - 1}} x/4\,dx \\ &= \left.\frac{1}{8} x^2\right|_{2 - \sqrt{y - 1}}^{2 + \sqrt{y - 1}} \\ &= \frac{1}{8} \left[ \left(2 + \sqrt{y - 1}\right)^2 - \left(2 - \sqrt{y - 1}\right)^2\right] \\ &= \sqrt{y - 1}. \end{align*}\] Again, this is the distribution function; so differentiating: \[ f_Y(y) = \begin{cases} \frac{1}{2\sqrt{y - 1}} & \text{for $1 < y < 2$};\\ 0 & \text{elsewhere}. \end{cases} \]

FIGURE 6.7: The transformation $Y = (X - 2)^2 + 1$ when $X$ is defined from $1$ to $3$. The thicker line corresponds to the region where the transformation applies. Note that if $Y < y$, then $2 - \sqrt{y - 1} < X < 2 + \sqrt{y - 1}$.
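Example 6.12 can be checked by simulation. Because $F_X(x) = (x^2 - 1)/8$ for $1 < x < 3$, values of $X$ can be generated as $X = \sqrt{1 + 8U}$ for $U$ uniform on $(0, 1)$ (the inverse distribution function method, a technique brought in for this sketch and not used in the text). The empirical distribution function of $Y = (X - 2)^2 + 1$ is then compared with $\sqrt{y - 1}$.

```python
import math
import random

random.seed(4)
n = 100_000

# Inverse-CDF sampling: F_X(x) = (x^2 - 1)/8 on (1, 3), so X = sqrt(1 + 8U).
xs = [math.sqrt(1.0 + 8.0 * random.random()) for _ in range(n)]
ys = [(x - 2.0) ** 2 + 1.0 for x in xs]


def empirical_cdf(sample, y):
    """Proportion of the sample that is less than or equal to y."""
    return sum(v <= y for v in sample) / len(sample)


def F_y(y):
    """Distribution function derived in Example 6.12, for 1 < y < 2."""
    return math.sqrt(y - 1.0)


for y in (1.1, 1.25, 1.5, 1.9):
    print(y, round(empirical_cdf(ys, y), 3), round(F_y(y), 3))
```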

Example 6.13 (Transformation) Example 6.8 is repeated here using the distribution function method. Given that $Z$ is distributed $N(0, 1)$, we seek the probability distribution of $Y = \frac{1}{2} Z^2$. First, \[ f_Z(z) = (2\pi )^{-\frac 12}\,e^{-z^2/2}\quad\text{for $z\in (-\infty ,\,\infty )$}. \] Let $Y$ have PDF $f_Y(y)$ and distribution function (df) $F_Y(y)$. Then, for $y > 0$, \[\begin{align*} F_Y(y) = \Pr(Y\leq y) &= \Pr\left(\frac{1}{2}Z^2\leq y\right)\\ &= \Pr(Z^2\leq 2y)\\ & = \Pr(-\sqrt{2y}\leq Z\leq \sqrt{2y}\,)\\ & = F_Z(\sqrt{2y}\,) - F_Z(-\sqrt{2y}\,) \end{align*}\] where $F_Z$ is the distribution function of $Z$. Hence, using the chain rule, \[\begin{align*} f_Y(y) = F_Y'(y) &= \frac{d}{dy}F_Z(\sqrt{2y}\,) - \frac{d}{dy}F_Z(-\sqrt{2y}\,)\\ &= \frac{\sqrt{2}}{2\sqrt{y}}f_Z(\sqrt{2y}\,) - \left(-\frac{\sqrt{2}}{2\sqrt{y}}\right)f_Z(-\sqrt{2y}\,)\\[2mm] &= \frac{1}{\sqrt{2y}}[f_Z(\sqrt{2y}\,) + f_Z(-\sqrt{2y}\,)]\\ &= \frac{1}{\sqrt{2y}} \left[ \frac{1}{\sqrt{2\pi}}\,e^{-y}+\frac{1}{\sqrt{2\pi}}\,e^{-y}\right]\\ &= \frac{e^{-y}y^{-\frac{1}{2}}}{\sqrt{\pi}} \end{align*}\] as before.

Care is needed to ensure the steps are followed logically. Diagrams like Figs. 6.6 and 6.7 are encouraged.

The functions that are produced should be PDFs; check that this is the case.

This method can also be used when there is more than one variable of interest, but we do not cover this.

6.4 The moment-generating function method

The moment-generating function (MGF) method is useful for finding the distribution of a linear combination of $n$ independent random variables. The method essentially involves the computation of the MGF of the transformed variable $Y = u(X_1, X_2, \dots, X_n)$ when the joint distribution of independent $X_1, X_2, \dots, X_n$ is given.

The MGF method relies on this observation: since the MGF of a random variable (if it exists) completely specifies the distribution of the random variable, then if two random variables have the same MGF they must have identical distributions. Below, the transformation $Y = X_1 + X_2 + \cdots + X_n$ is demonstrated, but the same principles can be applied for other linear combinations also.

Consider $n$ independent random variables $X_1, X_2, \dots, X_n$ with MGFs $M_{X_1}(t)$, $M_{X_2}(t)$, $\dots$, $M_{X_n}(t)$, and consider the transformation $Y = X_1 + X_2 + \cdots + X_n$. Since the $X_i$ are independent, $f_{X_1, X_2, \dots, X_n}(x_1, x_2, \dots, x_n) = f_{X_1}(x_1)\, f_{X_2}(x_2) \cdots f_{X_n}(x_n)$. So, by definition of the MGF, \[\begin{align*} M_Y(t) &= \operatorname{E}(\exp(tY)) \\ &= \operatorname{E}(\exp[t(X_1 + X_2 + \cdots + X_n)]) \\ &= \int\!\!\!\int\!\!\!\cdots\!\!\!\int \exp[t(x_1 + x_2 + \cdots + x_n)] f_{X_1, X_2, \dots, X_n}(x_1, x_2, \dots, x_n)\,dx_n\dots dx_2\, dx_1 \\ &= \int\!\!\!\int\!\!\!\cdots\!\!\!\int \exp(tx_1) f_{X_1}(x_1) \exp(t{x_2}) f_{X_2}(x_2)\dots \exp(t{x_n})f_{X_n}(x_n) \,dx_n\dots dx_2\, dx_1 \\ &= \int \exp(t x_1) f_{X_1}(x_1)\,dx_1 \int \exp(t{x_2}) f_{X_2}(x_2)\,dx_2 \dots \int \exp(t{x_n})f_{X_n}(x_n)\,dx_n \\ &= M_{X_1}(t) M_{X_2}(t)\dots M_{X_n}(t) \\ &= \prod_{i = 1}^n M_{X_i}(t). \end{align*}\] ($\prod$ is the symbol for a product of terms, in the same way that $\sum$ is the symbol for a summation of terms.) The above result also holds for discrete variables, where summations replace integrations.

This result follows: if $X_1, X_2, \dots, X_n$ are independent random variables and $Y = X_1 + X_2 + \dots + X_n$, then the MGF of $Y$ is \[ M_Y(t) = \prod_{i = 1}^n M_{X_i}(t) \] where $M_{X_i}(t)$ is the MGF of $X_i$ at $t$ for $i = 1, 2, \dots, n$.

Example 6.14 (MGF method for transformations) Suppose that $X_1, X_2, \dots, X_n$ are independent with $X_i \sim \text{Pois}(\lambda_i)$ for $i = 1, 2, \dots, n$, and we wish to find the distribution of $Y = X_1 + X_2 + \dots + X_n$.

Since $X_i$ has a Poisson distribution with parameter $\lambda_i$ for $i = 1, 2, \dots, n$, the MGF of $X_i$ is \[ M_{X_i}(t) = \exp[ \lambda_i(e^t - 1)]. \] The MGF of $Y = X_1 + X_2 + \cdots + X_n$ is \[\begin{align*} M_Y(t) &= \prod_{i = 1}^n \exp[ \lambda_i(e^t - 1)] \\ &= \exp[ \lambda_1(e^t - 1)] \exp[ \lambda_2(e^t - 1)] \dots \exp[ \lambda_n(e^t - 1)] \\ &= \exp\left[ (e^t - 1)\sum_{i = 1}^n \lambda_i\right]. \end{align*}\] Using $\Lambda = \sum_{i = 1}^n \lambda_i$, the MGF of $Y$ is \[ M_Y(t) = \exp\left[ (e^t - 1)\Lambda \right], \] which is the MGF of a Poisson distribution with mean $\Lambda = \sum_{i = 1}^n \lambda_i$. This means that the sum of $n$ independent Poisson random variables also has a Poisson distribution, whose mean is the sum of the individual Poisson means.
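The product rule for MGFs can be checked numerically for Example 6.14. The sketch below builds the probability function of the sum directly by discrete convolution (a technique brought in for this check, not used in the text), computes $\operatorname{E}(\exp(tY))$ from it, and compares the result with the product of the individual Poisson MGFs. The rates, the value of $t$, and the truncation point `K` are arbitrary illustrative choices.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability function with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def poisson_mgf(t, lam):
    """MGF of a Poisson distribution: exp[lam (e^t - 1)]."""
    return math.exp(lam * (math.exp(t) - 1))


def convolve(p, q):
    """pmf of the sum of two independent variables supported on 0, 1, 2, ..."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


lams = [0.5, 1.0, 2.0]  # illustrative rates
K = 60                  # truncation point; the tail mass beyond K is negligible here

pmf_sum = [1.0]         # pmf of the constant 0
for lam in lams:
    pmf_sum = convolve(pmf_sum, [poisson_pmf(k, lam) for k in range(K)])

t = 0.3
mgf_from_pmf = sum(math.exp(t * y) * p for y, p in enumerate(pmf_sum))
mgf_product = math.prod(poisson_mgf(t, lam) for lam in lams)
mgf_poisson_total = poisson_mgf(t, sum(lams))

print(mgf_from_pmf, mgf_product, mgf_poisson_total)
```

All three printed values should agree closely: the first comes from the distribution of the sum, the second from the product of MGFs, and the third from the Poisson MGF with mean $\Lambda$.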

6.5 Exercises

Selected answers appear in Sect. E.6.

Exercise 6.1 Suppose the PDF of $X$ is given by \[ f_X(x) = \begin{cases} x/2 & \text{$0 < x < 2$};\\ 0 & \text{otherwise}. \end{cases} \]

Find the PDF of $Y = X^3$ using the change of variable method.
Find the PDF of $Y = X^3$ using the distribution function method.

Exercise 6.2 The discrete bivariate random vector $(X_1, X_2)$ has the joint probability function \[ f_{X_1, X_2}(x_1, x_2) = \begin{cases} (2x_1+ x _2)/6 & \text{for $x_1 = 0, 1$ and $x_2 = 0, 1$};\\ 0 & \text{elsewhere}. \end{cases} \] Consider the transformations \[\begin{align*} Y_1 &= X_1 + X_2 \\ Y_2 &= \phantom{X_1+{}} X_2 \end{align*}\]

Determine the joint probability function of $(Y_1, Y_2)$.
Deduce the distribution of $Y_1$.

Exercise 6.3 Consider $n$ independent random variables $X_i$ such that $X_i \sim \text{Gam}(\alpha_i, \beta)$. Determine the distribution of $Y = \sum_{i = 1}^n X_i$ using MGFs.

Exercise 6.4 The random variable $X$ has PDF \[ f_X(x) = \frac{1}{\pi(1 + x^2)} \] for $-\infty < x < \infty$. Find the PDF of $Y$ where $Y = X^2$.

Exercise 6.5 A random variable $X$ has distribution function \[ F_X(x) = \begin{cases} 0 & \text{for $x \le -0.5$};\\ \frac{2x + 1}{2} & \text{for $-0.5 < x < 0.5$};\\ 1 & \text{for $x \ge 0.5$}. \end{cases} \]

Find, and plot, the PDF of $X$.
Find the distribution function, $F_Y(y)$, of the random variable $Y = 4 - X^2$.
Hence find, and plot, the PDF of $Y$, $f_Y(y)$.

Exercise 6.6 Suppose a projectile is fired at an angle $\theta$ from the horizontal with velocity $v$. The horizontal distance that the projectile travels $D$ is \[ D = \frac{v^2}{g} \sin 2\theta, \] where $g$ is the acceleration due to gravity ($g\approx 9.8\,\text{m.s}^{-2}$).

If $\theta$ is uniformly distributed over the range $(0, \pi/4)$, find the probability density function of $D$.
Sketch the PDF of $D$ over a suitable range for $v = 12$ and using $g\approx 9.8\,\text{m.s}^{-2}$.

Exercise 6.7 Most computers have facilities to generate continuous uniform (pseudo-)random numbers between zero and one, say $X$. When needed, exponential random numbers are obtained from $X$ using the transformation $Y = -\alpha\ln X$.

Show that $Y$ has an exponential distribution and determine its parameters.
Deduce the mean and variance of $Y$.

Exercise 6.8 Consider a random variable $W$ for which $\Pr(W = 2) = 1/6$, $\Pr(W = -2) = 1/3$ and $\Pr(W = 0) = 1/2$.

Plot the probability function of $W$.
Find the mean and variance of $W$.
Determine the distribution of $V = W^2$.
Find the distribution function of $W$.

Exercise 6.9 In a study to model the load on bridges (Lu, Ma, and Liu 2019), the researchers modelled the Gross Vehicle Weight (GVW, in kilonewtons) of smaller trucks $S$ using $S\sim N(390, 740)$, and the weight of bigger trucks $B$ using $B\sim N(865, 142)$. The total load distribution $L$ was then modelled as $L = 0.24S + 0.76B$, reflecting the expected proportion of smaller and bigger trucks using the bridge.

Plot the distribution of $L$.
Compute the mean and standard deviation of $L$.

Exercise 6.10 Suppose the random variable $X$ has a normal distribution with mean $\mu$ and variance $\sigma^2$. The random variable $Y = \exp X$ is said to have a log-normal distribution.

Determine the distribution function of $Y$ in terms of the function $\Phi(\cdot)$ (see Def. 8.7).
Differentiate to find the PDF of $Y$.
Plot the log-normal distribution for various parameter values.
Determine $\Pr(Y > 2 | Y < 1)$ when $\mu = 2$ and $\sigma^2 = 2$.

(Hint: Use the dlnorm() and plnorm() functions in R, where $\mu = {}$meanlog and $\sigma = {}$sdlog.)

Exercise 6.11 If $X$ is a random variable with probability function \[ \Pr(X = x) = \binom{4}{x} (0.2)^x (0.8)^{4 - x} \quad \text{for $x = 0, 1, 2, 3, 4$}, \] find the probability function of the random variable defined by $Y = \sqrt{X}$.

Exercise 6.12 Given the random variable $X$ with probability function \[ \Pr(X = x) = \frac{x^2}{30} \quad \text{for $x = 1, 2, 3, 4$}, \] find the probability function of $Y= (X - 3)^2$.

Exercise 6.13 A random variable $X$ has distribution function \[ F_X(x) = \begin{cases} 0 & \text{for $x < -0.5$};\\ \frac{2x + 1}{2}, & \text{for $-0.5 < x < 0.5$}; \\ 1 & \text{for $x > 0.5$}. \end{cases} \]

Find the distribution function, $F_Y(y)$, of the random variable $Y = 4 - X^2$.
Hence find the PDF of $Y$.

Exercise 6.14 If the random variable $X$ has an exponential distribution with mean $1$, show that $-\log(X)$ has a Gumbel distribution (Eq. (5.7)) with $\mu = 0$ and $\sigma = 1$.

Exercise 6.15 Let $X$ have a gamma distribution with parameters $\alpha > 2$ and $\beta > 0$.

Prove that the mean of $1/X$ is $\beta/(\alpha - 1)$.
Prove that the variance of $1/X$ is $\beta^2/[(\alpha - 1)^2(\alpha - 2)]$.

Exercise 6.16 In a study modelling waiting times at a hospital (Khadem et al. 2008), patients are classified into one of three categories:

Red: Critically ill or injured patients.
Yellow: Moderately ill or injured patients.
Green: Minimally injured or uninjured patients.

For ‘Green’ patients, the service time $S$ was modelled as $S = 4.5 + 11V$, where $V \sim \text{Beta}(0.287, 0.926)$.

Produce well-labelled plots of the PDF and df of $S$, showing important features.
What proportion of patients have a service time exceeding $15\,\text{mins}$?
The quickest $20$% of patients are serviced within what time?

Exercise 6.17 In a study modelling waiting times at a hospital (Khadem et al. 2008), patients are classified into one of three categories:

Red: Critically ill or injured patients.
Yellow: Moderately ill or injured patients.
Green: Minimally injured or uninjured patients.

The time (in minutes) spent in the reception area for ‘Yellow’ patients, say $T$, is modelled as $T = 0.5 + W$, where $W\sim \text{Exp}(16.5)$.

Plot the PDF and df of $T$.
What proportion of patients wait more than $20\,\text{mins}$, if they have already been waiting for $10\,\text{mins}$?
How long do the slowest $10$% of patients need to wait?

Exercise 6.18 Suppose the random variable $Z$ has the PDF \[ f_Z(z) = \begin{cases} \frac{1}{3} & \text{for $-1 < z < 2$};\\ 0 & \text{elsewhere}. \end{cases} \]

Find the probability density function of $Y$, where $Y = Z^2$, using the distribution function method.
Confirm that your final PDF of $Y$ is a valid PDF.
Produce a well-labelled plot of the PDF of $Y$. Ensure all important features and points are clearly labelled.

Exercise 6.19 Show that the chi-squared distribution is a special case of the gamma distribution, with $\alpha = \nu/2$ and $\beta = 2$.

Exercise 6.20 Suppose the random variable $X$ is defined as shown in Fig. 6.8.

Determine the distribution function for $X$.
Find the probability density function for the random variable $Y$, where $Y = 6 - 2X$.
Confirm that your probability density function for $Y$ is a valid PDF.
Plot the probability density function of $Y$.

FIGURE 6.8: The probability density function for the random variable $X$.

Exercise 6.21 Suppose the random variable $X$ is defined as shown in Fig. 6.8.

Determine the distribution function for $X$ (this was done in Exercise 6.20).
Find the probability density function for the random variable $Z$, where $Z = (X - 2)^2$.
Confirm that your probability density function for $Z$ is a valid PDF.
Plot the probability density function of $Z$.

Exercise 6.22 The time taken to run a distance $D$ (in metres) by a professional athlete, say $T$ (in seconds), varies with the distribution shown in Fig. 6.9 (left panel). The average velocity of the runner, say $V$, is related to the time by $V = D/T$.

Determine the probability density function for the runner’s velocity.
Suppose $D = 100$, $\mu = 12$ and $\Delta = 0.25$. Plot the probability density function for $V$.

FIGURE 6.9: The probability density function for the random variable $T$, the time for the run.

Exercise 6.23 Suppose the instantaneous voltage $V$ (in volts) in a circuit varies over time such that \[ f_V(v) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{ -\frac{v^2}{2\sigma^2}\right\}, \] as shown in Fig. 6.9 (right panel). (Later, we will identify this as the normal distribution.)

Determine the probability density function of the instantaneous power in the circuit $P$, where $P = V^2/R$ for some circuit resistance $R$ (in ohms).
Suppose $\sigma = 1$, and $R = 10$. Plot the probability density function for $P$.
