Functions of a Random Variable

If $X$ is a random variable and $Y = g(X)$, then $Y$ itself is a random variable. Consequently, we can discuss its PMF, CDF, and expected value. The range of $Y$ can be written as:

\begin{equation} R_Y = \{g(x) \mid x \in R_X\} \quad \text{where $R_X$ is the range of $X$} \end{equation}

To find the PMF of $Y = g(X)$ given the PMF of $X$, we can write:

\begin{equation} P_Y(y) = P(Y = y) = P(g(X) = y) = \sum_{x : g(x) = y} P_X(x) \end{equation}
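The sum above can be carried out mechanically. As an illustrative sketch (the PMF and $g$ below are hypothetical values chosen for the example, not taken from the text), suppose $X$ is uniform on $\{-2, -1, 0, 1, 2\}$ and $Y = X^2$:

```python
from collections import defaultdict

# Hypothetical example: X uniform on {-2, -1, 0, 1, 2}, Y = g(X) = X^2.
p_X = {x: 0.2 for x in [-2, -1, 0, 1, 2]}
g = lambda x: x * x

# P_Y(y) = sum of P_X(x) over all x with g(x) = y.
p_Y = defaultdict(float)
for x, p in p_X.items():
    p_Y[g(x)] += p

print(dict(p_Y))  # {4: 0.4, 1: 0.4, 0: 0.2}
```

Note how the values $x = -2$ and $x = 2$ both map to $y = 4$, so their probabilities accumulate, exactly as the sum over $\{x : g(x) = y\}$ prescribes.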

1 Expected Value of a Function of a Random Variable (LOTUS)

Let $X$ be a discrete random variable with PMF $P_X(x)$, and let $Y = g(X)$. Suppose we want to find $E[Y]$. One approach is to first find the PMF of $Y$ and then use the expectation formula $E[Y] = E[g(X)] = \sum_{y \in R_Y} y P_Y(y)$. However, a more convenient method is the law of the unconscious statistician (LOTUS).

Law of the Unconscious Statistician (LOTUS) for Discrete Random Variables:

\begin{equation} E[g(X)] = \sum_{x_k \in R_X} g(x_k) P_X(x_k) \end{equation}

This can be proved by expressing $E[Y] = E[g(X)] = \sum_{y \in R_Y} y P_Y(y)$ in terms of $P_X(x)$. Typically, using LOTUS is easier than the direct definition when we need $E[g(X)]$.
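A small numeric check makes the equivalence concrete. The PMF and $g$ below are hypothetical values chosen for illustration; the sketch computes $E[g(X)]$ both via LOTUS and via the PMF of $Y$, and confirms they agree:

```python
# Hypothetical PMF: X takes values 0, 1, 2; g(x) = (x - 1)^2.
p_X = {0: 0.25, 1: 0.5, 2: 0.25}
g = lambda x: (x - 1) ** 2

# LOTUS: sum g(x) P_X(x) directly over the range of X.
lotus = sum(g(x) * p for x, p in p_X.items())

# Direct definition: build P_Y first, then sum y P_Y(y).
p_Y = {}
for x, p in p_X.items():
    p_Y[g(x)] = p_Y.get(g(x), 0.0) + p
direct = sum(y * p for y, p in p_Y.items())

print(lotus, direct)  # both 0.5
```

LOTUS skips the intermediate bookkeeping of constructing $P_Y$, which is why it is usually the more convenient route.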

2 Transformations of Random Variables

For a random variable $Y$, whether discrete or continuous, and a function $g: \mathbb{R} \to \mathbb{R}$, $W = g(Y)$ is also a random variable. Its distribution (pdf), mean, variance, etc., will differ from $Y$'s. Transformations of random variables are crucial in statistics.

Theorem:

Suppose $Y$ is a random variable, $g$ is a transformation, and $W = g(Y)$. Then:

  1. If $Y$ is discrete, with pmf $p_Y$, we have:

\begin{equation} E[W] = \sum_{y \in S_Y} g(y) p_Y(y) \end{equation}

  2. If $Y$ is continuous, with pdf $f_Y$, we have:

\begin{equation} E[W] = \int_{-\infty}^{\infty} g(y) f_Y(y) \, dy \end{equation}
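The continuous case can be checked numerically. As a sketch under assumed values (the pdf and $g$ are hypothetical, not from the text): take $Y \sim \text{Exponential}(1)$ and $g(y) = y^2$, for which $E[W] = E[Y^2] = 2$, and approximate the integral with a midpoint Riemann sum:

```python
import math

# Hypothetical check: Y ~ Exponential(1), g(y) = y^2, so E[W] = E[Y^2] = 2.
f_Y = lambda y: math.exp(-y)   # pdf of Exponential(1) on [0, inf)
g = lambda y: y * y

# Midpoint Riemann sum over [0, 50]; the tail beyond 50 is negligible.
dy = 1e-3
E_W = sum(g((k + 0.5) * dy) * f_Y((k + 0.5) * dy) * dy for k in range(50_000))
print(round(E_W, 4))  # ≈ 2.0
```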

The cdf-method

The fundamental formula of this theorem helps compute expectations, but it doesn't provide the distribution of $W = g(Y)$. To find the cdf $F_W$ of $W$, given the cdf $F_Y$ of $Y$, we can write:

\begin{equation} F_W(w) = P[W \leq w] = P[g(Y) \leq w] \end{equation}

The probability on the right needs to be expressed in terms of $Y$. If $g$ is strictly increasing, it admits an inverse function $g^{-1}$ and we can write:

\begin{equation} F_W(w) = P[g(Y) \leq w] = P[Y \leq g^{-1}(w)] = F_Y(g^{-1}(w)) \end{equation}

For strictly decreasing $g$:

\begin{equation} P[g(Y) \leq w] = P[Y \geq g^{-1}(w)] \end{equation}

In continuous cases, $P[Y \geq y] = 1 - F_Y(y)$, so:

\begin{equation} F_W(w) = P[g(Y) \leq w] = P[Y \geq g^{-1}(w)] = 1 - F_Y(g^{-1}(w)) \end{equation}
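To see the decreasing case in action, here is a sketch with a hypothetical choice of $Y$ and $g$ (not from the text): let $Y \sim \text{Uniform}(0, 1)$ and $W = g(Y) = -\ln Y$, which is strictly decreasing. Then $g^{-1}(w) = e^{-w}$ and, since $F_Y(y) = y$ on $(0, 1)$, the formula gives $F_W(w) = 1 - F_Y(e^{-w}) = 1 - e^{-w}$, i.e. $W$ is $\text{Exponential}(1)$. A Monte Carlo check:

```python
import math
import random

# Hypothetical example: Y ~ Uniform(0, 1), W = -ln(Y) (strictly decreasing g),
# so the cdf-method predicts F_W(w) = 1 - e^{-w} (Exponential(1)).
random.seed(0)
n = 200_000
samples = [-math.log(1.0 - random.random()) for _ in range(n)]

w = 1.0
empirical = sum(s <= w for s in samples) / n
theoretical = 1 - math.exp(-w)
print(empirical, theoretical)  # both ≈ 0.632
```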

3 Functions of Two Random Variables

For two discrete random variables $X$ and $Y$, and $Z = g(X, Y)$, we can determine the PMF of $Z$ as:

\begin{equation} P_Z(z) = P(g(X, Y) = z) = \sum_{(x_i, y_j) \in A_z} P_{XY}(x_i, y_j), \quad \text{where } A_z = \{(x_i, y_j) \in R_{XY} : g(x_i, y_j) = z\} \end{equation}
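A concrete sketch of this sum, using a hypothetical joint PMF (two independent fair dice, not an example from the text) and $Z = X + Y$:

```python
from collections import defaultdict

# Hypothetical example: X and Y independent fair dice, Z = g(X, Y) = X + Y.
p_XY = {(x, y): 1 / 36 for x in range(1, 7) for y in range(1, 7)}

# P_Z(z) = sum of P_XY(x, y) over the set A_z = {(x, y) : x + y = z}.
p_Z = defaultdict(float)
for (x, y), p in p_XY.items():
    p_Z[x + y] += p

print(round(p_Z[7], 4))  # 6/36 ≈ 0.1667
```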

For $E[g(X, Y)]$, we can use LOTUS:

LOTUS for two discrete random variables:

\begin{equation} E[g(X, Y)] = \sum_{(x_i, y_j) \in R_{XY}} g(x_i, y_j) P_{XY}(x_i, y_j) \end{equation}
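As with the one-variable case, this is a single pass over the joint range. A sketch with a hypothetical joint PMF (two independent fair dice) and $g(x, y) = xy$:

```python
# Hypothetical example: two independent fair dice and g(x, y) = x * y.
p_XY = {(x, y): 1 / 36 for x in range(1, 7) for y in range(1, 7)}

# Two-variable LOTUS: E[g(X, Y)] = sum of g(x, y) P_XY(x, y) over R_XY.
E_XY = sum(x * y * p for (x, y), p in p_XY.items())
print(E_XY)  # 12.25
```

Here the answer matches $E[X]\,E[Y] = 3.5 \times 3.5$, as expected for independent variables, but the LOTUS sum itself does not require independence.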

Linearity of Expectation: For two discrete random variables $X$ and $Y$, $E[X + Y] = E[X] + E[Y]$.

Let $g(X, Y) = X + Y$. Using LOTUS, we have:

\begin{equation} E[X + Y] = \sum_{(x_i, y_j) \in R_{XY}} (x_i + y_j) P_{XY}(x_i, y_j) \end{equation}

\begin{equation} = \sum_{(x_i, y_j) \in R_{XY}} x_i P_{XY}(x_i, y_j) + \sum_{(x_i, y_j) \in R_{XY}} y_j P_{XY}(x_i, y_j) \end{equation}

\begin{equation} = \sum_{x_i \in R_X} \sum_{y_j \in R_Y} x_i P_{XY}(x_i, y_j) + \sum_{x_i \in R_X} \sum_{y_j \in R_Y} y_j P_{XY}(x_i, y_j) \end{equation}

\begin{equation} = \sum_{x_i \in R_X} x_i \sum_{y_j \in R_Y} P_{XY}(x_i, y_j) + \sum_{y_j \in R_Y} y_j \sum_{x_i \in R_X} P_{XY}(x_i, y_j) = \sum_{x_i \in R_X} x_i P_X(x_i) + \sum_{y_j \in R_Y} y_j P_Y(y_j) \quad \text{(marginal PMF)} \end{equation}

\begin{equation} = E[X] + E[Y] \end{equation}
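Notably, the derivation never assumes $X$ and $Y$ are independent. The following sketch verifies this with a hypothetical *dependent* joint PMF (values chosen for illustration):

```python
# Hypothetical dependent pair: P(X = Y) = 0.8, so X and Y are far from
# independent, yet linearity of expectation still holds.
p_XY = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Left side: E[X + Y] via two-variable LOTUS.
E_sum = sum((x + y) * p for (x, y), p in p_XY.items())

# Right side: E[X] and E[Y] via the marginal sums from the derivation above.
E_X = sum(x * p for (x, y), p in p_XY.items())
E_Y = sum(y * p for (x, y), p in p_XY.items())

print(E_sum, E_X + E_Y)  # both 1.0
```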

Functions of Two Continuous Random Variables

For two continuous random variables $X$ and $Y$, and $Z = g(X, Y)$, the concepts are similar. For $E[g(X, Y)]$, we use LOTUS:

LOTUS for two continuous random variables:

\begin{equation} E[g(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y) f_{XY}(x, y) \, dx \, dy \end{equation}
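The double integral can be checked numerically on a simple case. As a sketch under assumed values (the joint pdf is hypothetical, not from the text): take $X, Y$ independent $\text{Uniform}(0, 1)$, so $f_{XY}(x, y) = 1$ on the unit square, and $g(x, y) = xy$, giving $E[XY] = \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{4}$:

```python
# Hypothetical check: X, Y independent Uniform(0, 1), f_XY = 1 on [0,1]^2,
# g(x, y) = x * y, so the double integral should give E[XY] = 0.25.
n = 400
h = 1.0 / n
# Midpoint rule on the double integral of g(x, y) * f_XY(x, y).
E = sum((i + 0.5) * h * (j + 0.5) * h * h * h
        for i in range(n) for j in range(n))
print(round(E, 4))  # ≈ 0.25
```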

If $Z = g(X, Y)$ and we are interested in its distribution, we can start by writing:

\begin{equation} F_Z(z) = P(Z \leq z) = P(g(X, Y) \leq z) = \iint\limits_D f_{XY}(x, y) \, dx \, dy \end{equation}

where $D = \{(x, y) \mid g(x, y) \leq z\}$. To find the PDF of $Z$, we differentiate $F_Z(z)$ with respect to $z$.
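For a case where the region $D$ is easy to describe, here is a sketch with hypothetical variables (not from the text): let $X, Y$ be independent $\text{Uniform}(0, 1)$ and $Z = \max(X, Y)$. Then $D = \{(x, y) : \max(x, y) \leq z\}$ is the square $[0, z]^2$, so $F_Z(z) = z^2$ and, differentiating, $f_Z(z) = 2z$ on $(0, 1)$. A Monte Carlo check of the cdf:

```python
import random

# Hypothetical example: X, Y independent Uniform(0, 1), Z = max(X, Y).
# The region D is the square [0, z]^2, so F_Z(z) = z^2.
random.seed(1)
n = 200_000
z = 0.7
hits = sum(max(random.random(), random.random()) <= z for _ in range(n))
print(hits / n, z ** 2)  # both ≈ 0.49
```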

4 Different Views of a Function of a Random Variable (FRV)

There are several different but essentially equivalent views of a function of a random variable (FRV). We will present two of them, highlighting their differences in emphasis.

Assume we have an underlying probability space $P = (\Omega, F, P)$ and a random variable $X$ defined on it. Recall that $X$ is a rule that assigns a number $X(\zeta)$ to every $\zeta \in \Omega$. $X$ transforms the $\sigma$-field of events $F$ into the Borel $\sigma$-field $B$ of sets of numbers on the real line. If $R_X$ denotes the subset of the real line reached by $X$ as $\zeta$ ranges over $\Omega$, we can regard $X$ as an ordinary function with domain $\Omega$ and range $R_X$. Now, consider a measurable real function $g(x)$ of the real variable $x$.

i) First View ($Y: \Omega \to R_Y$)

For every $\zeta \in \Omega$, we generate a number $g(X(\zeta)) = Y(\zeta)$. The rule $Y$, which generates the numbers $\{Y(\zeta)\}$ for random outcomes $\{\zeta \in \Omega\}$, is an RV with domain $\Omega$ and range $R_Y \subset \mathbb{R}$. For every Borel set of real numbers $B_Y$, the set $\{\zeta : Y(\zeta) \in B_Y\}$ is an event. Specifically, the event $\{\zeta : Y(\zeta) \leq y\}$ is equal to the event $\{\zeta : g(X(\zeta)) \leq y\}$.

In this view, the emphasis is on $Y$ as a mapping from $\Omega$ to $R_Y$, with the intermediate role of $X$ being suppressed.

ii) Second View (Input/Output Systems View)

For every value of $X(\zeta)$ in the range $R_X$, we generate a new number $Y = g(X)$ whose range is $R_Y$. The rule $Y$, whose domain is $R_X$ and range is $R_Y$, is a function of the random variable $X$. Here, the focus is on viewing $Y$ as a mapping from one set of real numbers to another. A model for this view is to regard $X$ as the input to a system with transformation function $g(\cdot)$. For such a system, an input $x$ gets transformed to an output $y = g(x)$, and an input function $X$ gets transformed to an output function $Y = g(X)$.

In general, we will write $\{Y \leq y\} = \{X \in C_y\}$ in the sequel. For $C_y$ so determined, it follows that:

\begin{equation} P[Y \leq y] = P[X \in C_y] \end{equation}

If $C_y$ is empty, then the probability of $\{Y \leq y\}$ is zero.

Input–Output Model

When dealing with the input–output model, it is convenient to omit references to an abstract underlying experiment and deal directly with the RVs $X$ and $Y$. In this approach, the observations on $X$ are the underlying experiments, events are Borel subsets of the real line $\mathbb{R}$, and the set function $P[\cdot]$ is replaced by the distribution function $F_X(\cdot)$. Then $Y$ is a mapping (an RV) whose domain is the range $R_X$ of $X$, and whose range $R_Y$ is a subset of $\mathbb{R}$. The functional properties of $X$ are ignored in favor of viewing $X$ as a mechanism that gives rise to numerically valued random phenomena. In this view, the domain of $X$ is irrelevant.

Additional discussion on the various views of an FRV is available in the literature.