Functions of a Random Variable

Functions of Random Variables

If $X$ is a random variable and $Y = g(X)$, then $Y$ itself is a random variable. Consequently, we can discuss its PMF, CDF, and expected value. The range of $Y$ can be written as:

$$R_Y = \{g(x) \mid x \in R_X\}, \quad \text{where $R_X$ is the range of $X$}$$

To find the PMF of $Y = g(X)$ given the PMF of $X$, we can write:

$$P_Y(y) = P(Y = y) = P(g(X) = y) = \sum_{x : g(x) = y} P_X(x)$$

Let's look at an example.

Example

Let $X$ be a discrete random variable with $P_X(k) = \frac{1}{5}$ for $k = -1, 0, 1, 2, 3$. Let $Y = 2|X|$. Determine the range and PMF of $Y$.

Solution

First, note that the range of $Y$ is:

$$R_Y = \{2|x| \text{ where } x \in R_X\} = \{0, 2, 4, 6\}$$

To find $P_Y(y)$, we need to determine $P(Y = y)$ for $y = 0, 2, 4, 6$. We have:

$$P_Y(0) = P(Y = 0) = P(2|X| = 0) = P(X = 0) = \frac{1}{5}$$

$$P_Y(2) = P(Y = 2) = P(2|X| = 2) = P(X = -1 \text{ or } X = 1) = P_X(-1) + P_X(1) = \frac{1}{5} + \frac{1}{5} = \frac{2}{5}$$

$$P_Y(4) = P(Y = 4) = P(2|X| = 4) = P(X = 2) = \frac{1}{5}$$

$$P_Y(6) = P(Y = 6) = P(2|X| = 6) = P(X = 3) = \frac{1}{5}$$

So, in summary,

$$P_Y(k) = \begin{cases} \frac{1}{5} & \text{for } k = 0, 4, 6 \\ \frac{2}{5} & \text{for } k = 2 \\ 0 & \text{otherwise} \end{cases}$$
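
As a sanity check, here is a minimal Python sketch (an illustration, not part of the original text) that builds the PMF of $Y = g(X)$ by summing $P_X(x)$ over all $x$ with $g(x) = y$; the helper name `pmf_of_g` is introduced here for convenience.

```python
from collections import defaultdict
from fractions import Fraction

# PMF of X: P_X(k) = 1/5 for k = -1, 0, 1, 2, 3
p_X = {k: Fraction(1, 5) for k in (-1, 0, 1, 2, 3)}

def pmf_of_g(p_X, g):
    """PMF of Y = g(X): sum P_X(x) over all x with g(x) = y."""
    p_Y = defaultdict(Fraction)
    for x, p in p_X.items():
        p_Y[g(x)] += p
    return dict(p_Y)

p_Y = pmf_of_g(p_X, lambda x: 2 * abs(x))
print(sorted(p_Y.items()))  # Y takes values 0, 2, 4, 6 with probabilities 1/5, 2/5, 1/5, 1/5
```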

Expected Value of a Function of a Random Variable (LOTUS)

Let $X$ be a discrete random variable with PMF $P_X(x)$, and let $Y = g(X)$. Suppose we want to find $E[Y]$. One approach is to first find the PMF of $Y$ and then use the expectation formula $E[Y] = E[g(X)] = \sum_{y \in R_Y} y P_Y(y)$. However, a more convenient method is the law of the unconscious statistician (LOTUS).

Law of the Unconscious Statistician (LOTUS) for Discrete Random Variables:

$$E[g(X)] = \sum_{x_k \in R_X} g(x_k) P_X(x_k)$$

This can be proved by expressing $E[Y] = E[g(X)] = \sum_{y \in R_Y} y P_Y(y)$ in terms of $P_X(x)$. Typically, using LOTUS is easier than the direct definition when we need $E[g(X)]$.

Let's prove $E[aX + b] = aE[X] + b$ (linearity of expectation), where $g(X) = aX + b$:

$$\begin{aligned} E[aX + b] &= \sum_{x_k \in R_X} (a x_k + b) P_X(x_k) = \sum_{x_k \in R_X} a x_k P_X(x_k) + \sum_{x_k \in R_X} b P_X(x_k) \\ &= a \sum_{x_k \in R_X} x_k P_X(x_k) + b \sum_{x_k \in R_X} P_X(x_k) = a E[X] + b \end{aligned}$$
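
A short sketch applying LOTUS to the PMF from the example above; the constants $a = 3$, $b = 7$ are arbitrary choices for illustration.

```python
from fractions import Fraction

p_X = {k: Fraction(1, 5) for k in (-1, 0, 1, 2, 3)}

def lotus(p_X, g):
    """E[g(X)] = sum of g(x) * P_X(x) over the range of X."""
    return sum(g(x) * p for x, p in p_X.items())

a, b = 3, 7
E_X = lotus(p_X, lambda x: x)           # E[X] = 1
lhs = lotus(p_X, lambda x: a * x + b)   # E[aX + b] computed via LOTUS
print(lhs, a * E_X + b)                 # both equal 10
```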

Transformations of Random Variables

For a random variable $Y$, whether discrete or continuous, and a function $g: \mathbb{R} \to \mathbb{R}$, $W = g(Y)$ is also a random variable. Its distribution, mean, variance, etc., will in general differ from those of $Y$. Transformations of random variables are crucial in statistics.

Theorem 4.1.1

Suppose $Y$ is a random variable, $g$ is a transformation, and $W = g(Y)$. Then:

  1. If $Y$ is discrete, with pmf $p_Y$, we have:

$$E[W] = \sum_{y \in S_Y} g(y) p_Y(y)$$

  2. If $Y$ is continuous, with pdf $f_Y$, we have:

$$E[W] = \int_{-\infty}^{\infty} g(y) f_Y(y) \, dy$$
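
A small SciPy sketch of the continuous case; the Exponential(1) distribution and the transformation $g(y) = y^2$ are illustrative choices, not part of the text.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import expon

# Y ~ Exponential(1) and W = g(Y) = Y**2 (illustrative choices).
# Continuous case of Theorem 4.1.1: E[W] = integral of g(y) * f_Y(y) dy.
g = lambda y: y**2
E_W, _ = quad(lambda y: g(y) * expon.pdf(y), 0, np.inf)
print(E_W)              # ~2.0
print(expon.moment(2))  # SciPy's second moment of Exp(1), also 2.0
```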

The CDF Method

The fundamental formula of Theorem 4.1.1 helps compute expectations, but it doesn't provide the distribution of $W = g(Y)$. To find the cdf $F_W$ of $W$, given the cdf $F_Y$ of $Y$, we can write:

$$F_W(w) = P[W \leq w] = P[g(Y) \leq w]$$

The probability on the right needs to be expressed in terms of $Y$. If $g$ is strictly increasing, it admits an inverse function $g^{-1}$ and we can write:

$$F_W(w) = P[g(Y) \leq w] = P[Y \leq g^{-1}(w)] = F_Y(g^{-1}(w))$$

For strictly decreasing $g$:

$$P[g(Y) \leq w] = P[Y \geq g^{-1}(w)]$$

In continuous cases, $P[Y \geq y] = 1 - F_Y(y)$, so:

$$F_W(w) = P[g(Y) \leq w] = P[Y \geq g^{-1}(w)] = 1 - F_Y(g^{-1}(w))$$
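
A Monte Carlo check of the strictly increasing case (the choices $Y \sim N(0, 1)$ and $g(y) = e^y$ are illustrative, not from the text): the empirical CDF of $W = g(Y)$ should match $F_Y(g^{-1}(w)) = \Phi(\ln w)$.

```python
import numpy as np
from scipy.stats import norm

# Strictly increasing g: W = g(Y) = exp(Y) with Y ~ N(0, 1),
# so F_W(w) = F_Y(g^{-1}(w)) = Phi(log(w)).
rng = np.random.default_rng(0)
w = np.exp(rng.standard_normal(200_000))

for w0 in (0.5, 1.0, 2.0):
    empirical = np.mean(w <= w0)      # P[W <= w0] estimated from simulation
    formula = norm.cdf(np.log(w0))    # F_Y(g^{-1}(w0))
    print(w0, round(empirical, 3), round(formula, 3))
```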

Functions of Two Random Variables

For two discrete random variables $X$ and $Y$, and $Z = g(X, Y)$, we can determine the PMF of $Z$ as:

$$P_Z(z) = P(g(X, Y) = z) = \sum_{(x_i, y_j) \in A_z} P_{XY}(x_i, y_j), \quad \text{where } A_z = \{(x_i, y_j) \in R_{XY} : g(x_i, y_j) = z\}$$

For $E[g(X, Y)]$, we can use LOTUS:

LOTUS for two discrete random variables:

$$E[g(X, Y)] = \sum_{(x_i, y_j) \in R_{XY}} g(x_i, y_j) P_{XY}(x_i, y_j)$$

Linearity of Expectation: For two discrete random variables $X$ and $Y$, $E[X + Y] = E[X] + E[Y]$.

Let $g(X, Y) = X + Y$. Using LOTUS, we have:

$$E[X + Y] = \sum_{(x_i, y_j) \in R_{XY}} (x_i + y_j) P_{XY}(x_i, y_j)$$

$$= \sum_{(x_i, y_j) \in R_{XY}} x_i P_{XY}(x_i, y_j) + \sum_{(x_i, y_j) \in R_{XY}} y_j P_{XY}(x_i, y_j)$$

$$= \sum_{x_i \in R_X} \sum_{y_j \in R_Y} x_i P_{XY}(x_i, y_j) + \sum_{x_i \in R_X} \sum_{y_j \in R_Y} y_j P_{XY}(x_i, y_j)$$

$$= \sum_{x_i \in R_X} x_i \sum_{y_j \in R_Y} P_{XY}(x_i, y_j) + \sum_{y_j \in R_Y} y_j \sum_{x_i \in R_X} P_{XY}(x_i, y_j) = \sum_{x_i \in R_X} x_i P_X(x_i) + \sum_{y_j \in R_Y} y_j P_Y(y_j) \quad \text{(marginal PMFs)}$$

$$= E[X] + E[Y]$$
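
The same cancellation can be checked numerically on any joint PMF; the small table below is a made-up example (not from the text), and note that independence is not required.

```python
from fractions import Fraction as F

# A made-up joint PMF P_XY on {1, 2} x {0, 1} (probabilities sum to 1)
p_XY = {(1, 0): F(1, 4), (1, 1): F(1, 8),
        (2, 0): F(1, 8), (2, 1): F(1, 2)}

# LOTUS with g(X, Y) = X + Y
E_sum = sum((x + y) * p for (x, y), p in p_XY.items())

# Marginal expectations E[X] and E[Y]
E_X = sum(x * p for (x, y), p in p_XY.items())
E_Y = sum(y * p for (x, y), p in p_XY.items())

print(E_sum, E_X + E_Y)   # both equal 9/4
```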

Functions of Two Continuous Random Variables

For a function $g(X, Y)$ of two continuous random variables $X$ and $Y$, the concepts are similar. For $E[g(X, Y)]$, we use LOTUS:

LOTUS for two continuous random variables:

$$E[g(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y) f_{XY}(x, y) \, dx \, dy$$

If $Z = g(X, Y)$ and we are interested in its distribution, we can start by writing:

$$F_Z(z) = P(Z \leq z) = P(g(X, Y) \leq z) = \iint\limits_D f_{XY}(x, y) \, dx \, dy$$

where $D = \{(x, y) \mid g(x, y) \leq z\}$. To find the PDF of $Z$, we differentiate $F_Z(z)$.
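
A sketch of this CDF approach via numerical double integration with SciPy's `dblquad`; the example $Z = X + Y$ with $X, Y$ independent Exp(1), for which $F_Z(z) = 1 - (1 + z)e^{-z}$, is an illustrative choice, not from the text.

```python
import numpy as np
from scipy.integrate import dblquad

# Z = X + Y with X, Y independent Exp(1): f_XY(x, y) = exp(-x - y) for x, y >= 0,
# and the closed form is F_Z(z) = 1 - (1 + z) * exp(-z).
def F_Z(z):
    # Integrate f_XY over D = {(x, y): x >= 0, y >= 0, x + y <= z};
    # dblquad integrates func(y, x) with x in [0, z] and y in [0, z - x].
    val, _ = dblquad(lambda y, x: np.exp(-x - y), 0, z, 0, lambda x: z - x)
    return val

for z in (0.5, 1.0, 2.0):
    print(z, round(F_Z(z), 4), round(1 - (1 + z) * np.exp(-z), 4))  # the two columns agree
```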

The Method of Transformations

When we have functions of two or more jointly continuous random variables, we may use a method similar to the previous theorems to find the resulting PDFs. Here's the theorem:

Theorem

Let $X$ and $Y$ be two jointly continuous random variables. Let $(Z, W) = g(X, Y) = (g_1(X, Y), g_2(X, Y))$, where $g: \mathbb{R}^2 \mapsto \mathbb{R}^2$ is a continuous one-to-one (invertible) function with continuous partial derivatives. Let $h = g^{-1}$, i.e., $(X, Y) = h(Z, W) = (h_1(Z, W), h_2(Z, W))$. Then $Z$ and $W$ are jointly continuous and their joint PDF, $f_{ZW}(z, w)$, for $(z, w) \in R_{ZW}$ is given by:

$$f_{ZW}(z, w) = f_{XY}(h_1(z, w), h_2(z, w)) \, |J|$$

where $J$ is the Jacobian of $h$ defined by:

$$J = \det \begin{bmatrix} \frac{\partial h_1}{\partial z} & \frac{\partial h_1}{\partial w} \\ \frac{\partial h_2}{\partial z} & \frac{\partial h_2}{\partial w} \end{bmatrix} = \frac{\partial h_1}{\partial z} \cdot \frac{\partial h_2}{\partial w} - \frac{\partial h_2}{\partial z} \cdot \frac{\partial h_1}{\partial w}$$
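
A symbolic SymPy sketch of the theorem for the illustrative linear map $(Z, W) = (X + Y, X - Y)$ applied to independent standard normals (this example is not from the text); here $h_1(z, w) = (z + w)/2$ and $h_2(z, w) = (z - w)/2$.

```python
import sympy as sp

z, w = sp.symbols('z w', real=True)

# Inverse map h = g^{-1} for the linear transformation (Z, W) = (X + Y, X - Y)
h1 = (z + w) / 2   # x = h1(z, w)
h2 = (z - w) / 2   # y = h2(z, w)

# Jacobian of h
J = sp.Matrix([[sp.diff(h1, z), sp.diff(h1, w)],
               [sp.diff(h2, z), sp.diff(h2, w)]]).det()

# Joint PDF of independent standard normals, evaluated at (h1, h2), times |J|
x, y = sp.symbols('x y', real=True)
f_XY = sp.exp(-(x**2 + y**2) / 2) / (2 * sp.pi)
f_ZW = sp.simplify(f_XY.subs({x: h1, y: h2}) * sp.Abs(J))

print(J)      # -1/2
print(f_ZW)   # exp(-(z**2 + w**2)/4) / (4*pi): Z and W are independent N(0, 2)
```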

Note: If $X$ and $Y$ are two jointly continuous random variables and $Z = X + Y$, then:

$$f_Z(z) = \int_{-\infty}^{\infty} f_{XY}(w, z - w) \, dw = \int_{-\infty}^{\infty} f_{XY}(z - w, w) \, dw$$

If $X$ and $Y$ are also independent, then:

$$f_Z(z) = f_X(z) \ast f_Y(z) = \int_{-\infty}^{\infty} f_X(w) f_Y(z - w) \, dw = \int_{-\infty}^{\infty} f_Y(w) f_X(z - w) \, dw$$

Moment Generating Functions

Moment generating functions (MGFs) are useful for several reasons, particularly for analyzing sums of random variables. Before discussing MGFs, let's define moments.

Definition: The $n$-th moment of a random variable $X$ is defined as $E[X^n]$. The $n$-th central moment of $X$ is defined as $E[(X - EX)^n]$.

For example, the first moment is the expected value $E[X]$. The second central moment is the variance of $X$. Other moments provide additional useful information about random variables.

The moment generating function (MGF) of a random variable $X$ is a function $M_X(s)$ defined as:

$$M_X(s) = E\left[e^{sX}\right]$$

The MGF of $X$ exists if there is a positive constant $a$ such that $M_X(s)$ is finite for all $s \in [-a, a]$.

MGFs are useful for two main reasons. First, the MGF of $X$ provides all moments of $X$, hence its name. Second, the MGF (if it exists) uniquely determines the distribution. That is, if two random variables have the same MGF, they must have the same distribution. This method is particularly useful when working with sums of independent random variables.

Finding Moments from MGF

Remember the Taylor series for $e^x$: for all $x \in \mathbb{R}$, we have:

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \ldots = \sum_{k=0}^{\infty} \frac{x^k}{k!}$$

Now, we can write:

$$e^{sX} = \sum_{k=0}^{\infty} \frac{(sX)^k}{k!} = \sum_{k=0}^{\infty} \frac{X^k s^k}{k!}$$

Thus, we have:

$$M_X(s) = E[e^{sX}] = \sum_{k=0}^{\infty} E[X^k] \frac{s^k}{k!}$$

We conclude that the $k$-th moment of $X$ is the coefficient of $\frac{s^k}{k!}$ in the Taylor series of $M_X(s)$. Thus, if we have the Taylor series of $M_X(s)$, we can obtain all moments of $X$.
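
Equivalently, $E[X^k]$ is the $k$-th derivative of $M_X(s)$ evaluated at $s = 0$. A SymPy sketch, using the Exponential($\lambda$) MGF $M_X(s) = \lambda/(\lambda - s)$ purely as an illustration (this example is not in the text):

```python
import sympy as sp

s = sp.symbols('s', real=True)
lam = sp.symbols('lambda', positive=True)

# MGF of an Exponential(lambda) random variable: M_X(s) = lambda / (lambda - s)
M = lam / (lam - s)

# E[X^k] is the coefficient of s**k / k! in the series, i.e. the k-th derivative at s = 0
for k in range(1, 4):
    moment = sp.simplify(sp.diff(M, s, k).subs(s, 0))
    print(k, moment)   # 1/lambda, 2/lambda**2, 6/lambda**3
```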

Theorem

Consider two random variables $X$ and $Y$. Suppose there exists a positive constant $c$ such that the MGFs of $X$ and $Y$ are finite and identical for all values of $s$ in $[-c, c]$. Then,

$$F_X(t) = F_Y(t), \quad \text{for all } t \in \mathbb{R}$$

Sum of Independent Random Variables

Suppose $X_1, X_2, \ldots, X_n$ are $n$ independent random variables, and the random variable $Y$ is defined as:

$$Y = X_1 + X_2 + \cdots + X_n$$

Then,

$$M_Y(s) = E[e^{sY}] = E[e^{s(X_1 + X_2 + \cdots + X_n)}] = E[e^{sX_1} e^{sX_2} \cdots e^{sX_n}]$$

$$= E[e^{sX_1}] E[e^{sX_2}] \cdots E[e^{sX_n}] \quad \text{(since the $X_i$'s are independent)}$$

$$= M_{X_1}(s) M_{X_2}(s) \cdots M_{X_n}(s)$$
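
A symbolic illustration (the normal MGFs below are an example chosen here, not taken from the text): the product of the MGFs of two independent normals is the MGF of a normal whose mean and variance are the respective sums, consistent with the product rule above.

```python
import sympy as sp

s = sp.symbols('s', real=True)
mu1, mu2, v1, v2 = sp.symbols('mu1 mu2 v1 v2', positive=True)

# MGFs of independent X1 ~ N(mu1, v1) and X2 ~ N(mu2, v2)
M1 = sp.exp(mu1 * s + v1 * s**2 / 2)
M2 = sp.exp(mu2 * s + v2 * s**2 / 2)

# M_Y(s) = M_X1(s) * M_X2(s); compare with the MGF of N(mu1 + mu2, v1 + v2)
M_target = sp.exp((mu1 + mu2) * s + (v1 + v2) * s**2 / 2)
print((M1 * M2).equals(M_target))   # True, so X1 + X2 ~ N(mu1 + mu2, v1 + v2)
```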

Different Views of a Function of a Random Variable (FRV)

There are several different but essentially equivalent views of a function of a random variable (FRV). We will present two of them, highlighting their differences in emphasis.

Assume we have an underlying probability space $(\Omega, F, P)$ and a random variable $X$ defined on it. Recall that $X$ is a rule that assigns a number $X(\zeta)$ to every $\zeta \in \Omega$. $X$ transforms the $\sigma$-field of events $F$ into the Borel $\sigma$-field $B$ of sets of numbers on the real line. If $R_X$ denotes the subset of the real line reached by $X$ as $\zeta$ ranges over $\Omega$, we can regard $X$ as an ordinary function with domain $\Omega$ and range $R_X$. Now, consider a measurable real function $g(x)$ of the real variable $x$.

First View ($Y: \Omega \to R_Y$)

For every $\zeta \in \Omega$, we generate a number $g(X(\zeta)) = Y(\zeta)$. The rule $Y$, which generates the numbers $\{Y(\zeta)\}$ for random outcomes $\{\zeta \in \Omega\}$, is an RV with domain $\Omega$ and range $R_Y \subset \mathbb{R}$. For every Borel set of real numbers $B_Y$, the set $\{\zeta : Y(\zeta) \in B_Y\}$ is an event. Specifically, the event $\{\zeta : Y(\zeta) \leq y\}$ is equal to the event $\{\zeta : g(X(\zeta)) \leq y\}$.

In this view, the emphasis is on $Y$ as a mapping from $\Omega$ to $R_Y$, with the intermediate role of $X$ being suppressed.

Second View (Input/Output Systems View)

For every value of $X(\zeta)$ in the range $R_X$, we generate a new number $Y = g(X)$ whose range is $R_Y$. The rule $Y$, whose domain is $R_X$ and range is $R_Y$, is a function of the random variable $X$. Here, the focus is on viewing $Y$ as a mapping from one set of real numbers to another. A model for this view is to regard $X$ as the input to a system with transformation function $g(\cdot)$. For such a system, an input $x$ gets transformed to an output $y = g(x)$, and an input function $X$ gets transformed to an output function $Y = g(X)$.

In general, we will write $\{Y \leq y\} = \{X \in C_y\}$ in the sequel. For $C_y$ so determined, it follows that:

$$P[Y \leq y] = P[X \in C_y]$$

If $C_y$ is empty, then the probability of $\{Y \leq y\}$ is zero.

Input–Output Model

When dealing with the input–output model, it is convenient to omit references to an abstract underlying experiment and deal directly with the RVs $X$ and $Y$. In this approach, the observations on $X$ are the underlying experiments, events are Borel subsets of the real line $\mathbb{R}$, and the set function $P[\cdot]$ is replaced by the distribution function $F_X(\cdot)$. Then $Y$ is a mapping (an RV) whose domain is the range $R_X$ of $X$, and whose range $R_Y$ is a subset of $\mathbb{R}$. The functional properties of $X$ are ignored in favor of viewing $X$ as a mechanism that gives rise to numerically valued random phenomena. In this view, the domain of $X$ is irrelevant.

Additional discussion on the various views of an FRV is available in the literature.

Solving Problems of the Type $Y = g(X)$

As an illustration, consider the linear transformation $Y = aX + b$. For $a < 0$, the event $\{Y \leq y\}$ equals $\{X \geq \frac{y-b}{a}\}$. Since the events $\{X < \frac{y-b}{a}\}$ and $\{X \geq \frac{y-b}{a}\}$ are disjoint and their union is the certain event, we obtain from Axiom 3:

$$P\left[X < \frac{y - b}{a}\right] + P\left[X \geq \frac{y - b}{a}\right] = 1$$

For a continuous RV:

$$P\left[X < \frac{y - b}{a}\right] = P\left[X \leq \frac{y - b}{a}\right] \quad \text{and} \quad P\left[X \geq \frac{y - b}{a}\right] = P\left[X > \frac{y - b}{a}\right]$$

Thus, for $a < 0$:

$$F_Y(y) = 1 - F_X\left(\frac{y - b}{a}\right)$$

and

$$f_Y(y) = \frac{1}{|a|} f_X\left(\frac{y - b}{a}\right), \quad a \neq 0$$

When $X$ is not necessarily continuous, we modify the development for $a < 0$ because it may no longer be true that $P\left[X < \frac{y - b}{a}\right] = P\left[X \leq \frac{y - b}{a}\right]$, due to the possibility that the event $\{X = \frac{y - b}{a}\}$ has positive probability. The modified statement becomes $P\left[X < \frac{y - b}{a}\right] = P\left[X \leq \frac{y - b}{a}\right] - P\left[X = \frac{y - b}{a}\right] = F_X\left(\frac{y - b}{a}\right) - P_X\left(\frac{y - b}{a}\right)$.
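
A quick numerical check of $f_Y(y) = \frac{1}{|a|} f_X\left(\frac{y - b}{a}\right)$ for a continuous $X$ with $a < 0$; the standard normal and the values $a = -2$, $b = 3$ are illustrative choices, not from the text.

```python
import numpy as np
from scipy.stats import norm

# Y = aX + b with X ~ N(0, 1) and a < 0, so Y ~ N(b, a**2)
a, b = -2.0, 3.0
y = np.linspace(-5.0, 10.0, 7)

f_Y_formula = norm.pdf((y - b) / a) / abs(a)       # (1/|a|) f_X((y - b)/a)
f_Y_known = norm.pdf(y, loc=b, scale=abs(a))       # density of N(3, 2**2)

print(np.allclose(f_Y_formula, f_Y_known))         # True
```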

Solving Problems of the Type $Z = g(X, Y)$

In many science and engineering problems, a random variable $Z$ is functionally related to two (or more) random variables $X$ and $Y$. For example:

  1. The signal $Z$ at the input of an amplifier consists of a signal $X$ to which independent random noise $Y$ is added. Thus, $Z = X + Y$. If $X$ is also an RV, what is the pdf of $Z$?

Problems of the type $Z = g(X, Y)$ are similar to those of $Y = g(X)$. For $Y = g(X)$, the basic problem was to find the point set $C_y$ such that the events $\{\zeta : Y(\zeta) \leq y\}$ and $\{\zeta : X(\zeta) \in C_y\}$ were equal. The same applies here: find the point set $C_z$ in the $(x, y)$ plane such that the events $\{\zeta : Z(\zeta) \leq z\}$ and $\{\zeta : (X(\zeta), Y(\zeta)) \in C_z\}$ are equal, indicated by:

$$\{Z \leq z\} = \{(X, Y) \in C_z\}$$

and

$$F_Z(z) = \iint_{(x, y) \in C_z} f_{XY}(x, y) \, dx \, dy$$

The point set $C_z$ is determined from the functional relation $g(x, y) \leq z$. Problems of the type $Z = g(X, Y)$ involve joint densities or distributions and double integrals (or summations) instead of single ones. Hence, computing $f_Z(z)$ is generally more complex than computing $f_Y(y)$ in $Y = g(X)$. However, we can use two labor-saving methods:

  1. Solve many $Z = g(X, Y)$-type problems using a "turn-the-crank" formula, an extension of Equation 3.2-23, through the use of auxiliary variables (Section 3.4).
  2. Solve problems of the type $Z = X + Y$ using characteristic functions (Chapter 4).

Sum of Two Independent Random Variables

The situation modeled by $Z = X + Y$ (and its extension $Z = \sum_{i=1}^N X_i$) occurs frequently in engineering and science. Computing $f_Z(z)$ is perhaps the most important problem of the type $Z = g(X, Y)$. We must find the set of points $C_z$ such that the event $\{Z \leq z\}$ is equal to $\{X + Y \leq z\}$, and thus to $\{(X, Y) \in C_z\}$. The set of points $C_z$ is the half-plane $\{(x, y) : x + y \leq z\}$, i.e., the region on and below the line $x + y = z$.

Using Equation 3.3-2, specialized for this case, we obtain:

$$F_Z(z) = \iint_{x + y \leq z} f_{XY}(x, y) \, dx \, dy = \int_{-\infty}^{\infty} \left( \int_{-\infty}^{z - y} f_{XY}(x, y) \, dx \right) dy = \int_{-\infty}^{\infty} \left[ G_{XY}(z - y, y) - G_{XY}(-\infty, y) \right] dy$$

where $G_{XY}(x, y)$ is the indefinite integral:

$$G_{XY}(x, y) = \int f_{XY}(x, y) \, dx$$

The pdf is obtained by differentiating $F_Z(z)$:

$$f_Z(z) = \frac{dF_Z(z)}{dz} = \int_{-\infty}^{\infty} \frac{d}{dz} \left[ G_{XY}(z - y, y) \right] dy = \int_{-\infty}^{\infty} f_{XY}(z - y, y) \, dy$$

This result is significant, and when $X$ and $Y$ are independent RVs such that $f_{XY}(x, y) = f_X(x) f_Y(y)$, it simplifies to the convolution integral:

$$f_Z(z) = \int_{-\infty}^{\infty} f_X(z - y) f_Y(y) \, dy$$

This convolution can also be written as:

$$f_Z(z) = \int_{-\infty}^{\infty} f_X(x) f_Y(z - x) \, dx$$

by using the variable transformation $x = z - y$.
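
The convolution integral can be approximated on a grid; the NumPy sketch below uses the Uniform(0, 1) example (giving the triangular density on $[0, 2]$) as an illustration, not as part of the original development.

```python
import numpy as np

# X, Y independent Uniform(0, 1); f_Z = f_X * f_Y is the triangular density on [0, 2]
dx = 0.001
x = np.arange(0.0, 1.0, dx)
f_X = np.ones_like(x)               # Uniform(0, 1) density sampled on the grid
f_Y = np.ones_like(x)

f_Z = np.convolve(f_X, f_Y) * dx    # Riemann-sum approximation of the convolution integral
z = np.arange(len(f_Z)) * dx

exact = np.where(z <= 1.0, z, 2.0 - z)   # triangular density
print(np.max(np.abs(f_Z - exact)))       # ~1e-3, of the order of the grid spacing
```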