If X is a random variable and Y=g(X), then Y itself is a random variable. Consequently, we can discuss its PMF, CDF, and expected value. The range of Y can be written as:
R_Y = {g(x) ∣ x ∈ R_X}, where R_X is the range of X.
To find the PMF of Y=g(X) given the PMF of X, we can write:
P_Y(y) = P(Y = y) = P(g(X) = y) = ∑_{x: g(x)=y} P_X(x)
Let's look at an example.
Example
Let X be a discrete random variable with P_X(k) = 1/5 for k = −1, 0, 1, 2, 3. Let Y = 2|X|. Determine the range and PMF of Y.
Solution
First, note that the range of Y is:
R_Y = {2|x| ∣ x ∈ R_X} = {0, 2, 4, 6}
To find P_Y(y), we need to determine P(Y = y) for y = 0, 2, 4, 6. We have:
P_Y(0) = P(Y = 0) = P(2|X| = 0) = P(X = 0) = 1/5
P_Y(2) = P(Y = 2) = P(2|X| = 2) = P(X = −1 or X = 1) = P_X(−1) + P_X(1) = 1/5 + 1/5 = 2/5
P_Y(4) = P(Y = 4) = P(2|X| = 4) = P(X = 2) = 1/5
P_Y(6) = P(Y = 6) = P(2|X| = 6) = P(X = 3) = 1/5
So, in summary,
P_Y(k) =
  1/5  for k = 0, 4, 6
  2/5  for k = 2
  0    otherwise
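To make the summation formula concrete, here is a minimal Python sketch (the helper name pmf_of_function is our own) that applies it to the example's PMF and reproduces the PMF of Y = 2|X|:

```python
from collections import defaultdict
from fractions import Fraction

# PMF of X from the example: P_X(k) = 1/5 for k = -1, 0, 1, 2, 3.
pmf_X = {k: Fraction(1, 5) for k in [-1, 0, 1, 2, 3]}

def pmf_of_function(pmf_x, g):
    """P_Y(y) = sum of P_X(x) over all x with g(x) = y."""
    pmf_y = defaultdict(Fraction)
    for x, p in pmf_x.items():
        pmf_y[g(x)] += p
    return dict(pmf_y)

print(pmf_of_function(pmf_X, lambda x: 2 * abs(x)))
# {2: Fraction(2, 5), 0: Fraction(1, 5), 4: Fraction(1, 5), 6: Fraction(1, 5)}
```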
Expected Value of a Function of a Random Variable (LOTUS)
Let X be a discrete random variable with PMF P_X(x), and let Y = g(X). Suppose we want to find E[Y]. One approach is to first find the PMF of Y and then use the expectation formula E[Y] = E[g(X)] = ∑_{y ∈ R_Y} y P_Y(y). However, a more convenient method is the law of the unconscious statistician (LOTUS).
Law of the Unconscious Statistician (LOTUS) for Discrete Random Variables:
E[g(X)] = ∑_{x_k ∈ R_X} g(x_k) P_X(x_k)
This can be proved by expressing E[Y] = E[g(X)] = ∑_{y ∈ R_Y} y P_Y(y) in terms of P_X(x). Typically, using LOTUS is easier than the direct definition when we need E[g(X)].
Let's prove E[aX + b] = aE[X] + b (linearity of expectation). With g(X) = aX + b, LOTUS gives:
E[aX + b] = ∑_{x_k ∈ R_X} (a x_k + b) P_X(x_k) = a ∑_{x_k ∈ R_X} x_k P_X(x_k) + b ∑_{x_k ∈ R_X} P_X(x_k) = a E[X] + b,
where the last step uses ∑_{x_k ∈ R_X} x_k P_X(x_k) = E[X] and ∑_{x_k ∈ R_X} P_X(x_k) = 1.
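As a quick numerical sanity check (the PMF and the constants a, b are our own choices), LOTUS applied to g(x) = ax + b agrees with aE[X] + b:

```python
# PMF of X from the earlier example, now with float probabilities.
pmf_X = {-1: 0.2, 0: 0.2, 1: 0.2, 2: 0.2, 3: 0.2}
a, b = 3.0, -2.0

E_X = sum(x * p for x, p in pmf_X.items())            # E[X] = 1.0
E_g = sum((a * x + b) * p for x, p in pmf_X.items())  # LOTUS for E[aX + b]

print(E_g, a * E_X + b)  # 1.0 1.0
```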
For a random variable Y, whether discrete or continuous, and a function g:R→R, W=g(Y) is also a random variable. Its distribution (pdf), mean, variance, etc., will differ from Y's. Transformations of random variables are crucial in statistics.
Theorem 4.1.1
Suppose Y is a random variable, g is a transformation, and W=g(Y). Then:
If Y is discrete, with pmf p_Y, we have:
E[W] = ∑_{y ∈ S_Y} g(y) p_Y(y)
If Y is continuous, with pdf f_Y, we have:
E[W] = ∫_{−∞}^{∞} g(y) f_Y(y) dy
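For the continuous case, here is a minimal sketch assuming SciPy is available (the example is our choice): take Y ~ Exponential(1) and g(y) = y², for which the exact value is E[Y²] = Var(Y) + (E[Y])² = 2.

```python
import numpy as np
from scipy.integrate import quad

f_Y = lambda y: np.exp(-y)  # pdf of Exponential(1), supported on [0, inf)
g = lambda y: y ** 2

# E[g(Y)] = integral of g(y) f_Y(y) dy, here over [0, inf).
E_W, _ = quad(lambda y: g(y) * f_Y(y), 0, np.inf)
print(E_W)  # ~2.0
```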
The cdf-method
The fundamental formula of Theorem 4.1.1 helps compute expectations, but it doesn't provide the distribution of W = g(Y). To find the cdf F_W of W, given the cdf F_Y of Y, we can write:
F_W(w) = P[W ≤ w] = P[g(Y) ≤ w]
The probability on the right needs to be expressed in terms of Y. If g is strictly increasing, it admits an inverse function g^{−1} and we can write:
F_W(w) = P[g(Y) ≤ w] = P[Y ≤ g^{−1}(w)] = F_Y(g^{−1}(w))
For strictly decreasing g:
P[g(Y) ≤ w] = P[Y ≥ g^{−1}(w)]
In continuous cases, P[Y ≥ y] = 1 − F_Y(y), so:
F_W(w) = P[g(Y) ≤ w] = P[Y ≥ g^{−1}(w)] = 1 − F_Y(g^{−1}(w))
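A simulation sketch of the cdf-method, with an example of our choosing: Y ~ Uniform(0, 1) and the strictly decreasing map W = g(Y) = −ln Y, so g^{−1}(w) = e^{−w} and F_W(w) = 1 − F_Y(e^{−w}) = 1 − e^{−w}, the Exponential(1) cdf.

```python
import numpy as np

rng = np.random.default_rng(0)
w_samples = -np.log(rng.uniform(size=100_000))  # W = -ln(Y), Y ~ Uniform(0, 1)

# The empirical cdf of W should match 1 - e^{-w}.
for w in [0.5, 1.0, 2.0]:
    print(w, np.mean(w_samples <= w), 1 - np.exp(-w))
```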
Functions of Two Random Variables
For two discrete random variables X and Y, and Z = g(X, Y), we can determine the PMF of Z as:
P_Z(z) = P(g(X, Y) = z) = ∑_{(x_i, y_j): g(x_i, y_j) = z} P_{XY}(x_i, y_j)
For two continuous random variables X and Y, and Z = g(X, Y), the concepts are similar. For E[g(X, Y)], we use LOTUS:
LOTUS for two continuous random variables:
E[g(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f_{XY}(x, y) dx dy
If Z=g(X,Y) and we are interested in its distribution, we can start by writing:
F_Z(z) = P(Z ≤ z) = P(g(X, Y) ≤ z) = ∬_D f_{XY}(x, y) dx dy
where D = {(x, y) ∣ g(x, y) ≤ z}. To find the PDF of Z, we differentiate F_Z(z).
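As an illustration (our example): with X, Y independent Uniform(0, 1) and Z = max(X, Y), the region D = {(x, y) : max(x, y) ≤ z} is the square [0, z] × [0, z], so F_Z(z) = z² and, differentiating, f_Z(z) = 2z on [0, 1]. A Monte Carlo sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(size=200_000)
y = rng.uniform(size=200_000)
z_samples = np.maximum(x, y)  # Z = max(X, Y)

# The empirical F_Z(z) should match z**2, the area of D = [0, z] x [0, z].
for z in [0.3, 0.6, 0.9]:
    print(z, np.mean(z_samples <= z), z ** 2)
```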
The Method of Transformations
When we have functions of two or more jointly continuous random variables, we may use a method similar to the previous theorems to find the resulting PDFs. Here's the theorem:
Theorem
Let X and Y be two jointly continuous random variables. Let (Z, W) = g(X, Y) = (g_1(X, Y), g_2(X, Y)), where g: R² ↦ R² is a continuous one-to-one (invertible) function with continuous partial derivatives. Let h = g^{−1}, i.e., (X, Y) = h(Z, W) = (h_1(Z, W), h_2(Z, W)). Then Z and W are jointly continuous and their joint PDF, f_{ZW}(z, w), for (z, w) ∈ R_{ZW} is given by:
f_{ZW}(z, w) = f_{XY}(h_1(z, w), h_2(z, w)) |J|,
where J is the Jacobian determinant of the inverse map h, evaluated at (z, w):
J = det [ ∂h_1/∂z  ∂h_1/∂w ; ∂h_2/∂z  ∂h_2/∂w ]
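A SymPy sketch of this theorem, with an invertible map of our choosing: Z = X + Y, W = X − Y, whose inverse is X = (Z + W)/2, Y = (Z − W)/2, giving |J| = 1/2.

```python
import sympy as sp

z, w = sp.symbols('z w')
h1 = (z + w) / 2  # x = h1(z, w)
h2 = (z - w) / 2  # y = h2(z, w)

# Jacobian determinant of the inverse map h = (h1, h2).
J = sp.Matrix([[sp.diff(h1, z), sp.diff(h1, w)],
               [sp.diff(h2, z), sp.diff(h2, w)]]).det()
print(abs(J))  # 1/2, so f_ZW(z, w) = (1/2) f_XY((z+w)/2, (z-w)/2)
```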
Moment generating functions (MGFs) are useful for several reasons, particularly for analyzing sums of random variables. Before discussing MGFs, let's define moments.
Definition: The n-th moment of a random variable X is defined as E[X^n]. The n-th central moment of X is defined as E[(X − E[X])^n].
For example, the first moment is the expected value E[X]. The second central moment is the variance of X. Other moments provide additional useful information about random variables.
The moment generating function (MGF) of a random variable X is a function M_X(s) defined as:
M_X(s) = E[e^{sX}]
The MGF of X exists if there is a positive constant a such that M_X(s) is finite for all s ∈ [−a, a].
MGFs are useful for two main reasons. First, the MGF of X provides all moments of X, hence its name. Second, the MGF (if it exists) uniquely determines the distribution. That is, if two random variables have the same MGF, they must have the same distribution. This method is particularly useful when working with sums of independent random variables.
Finding Moments from MGF
Remember the Taylor series for e^x: for all x ∈ R, we have:
e^x = 1 + x + x^2/2! + x^3/3! + … = ∑_{k=0}^{∞} x^k/k!
Now, we can write:
e^{sX} = ∑_{k=0}^{∞} (sX)^k/k! = ∑_{k=0}^{∞} X^k s^k/k!
Thus, taking expectations term by term, we have:
M_X(s) = E[e^{sX}] = ∑_{k=0}^{∞} E[X^k] s^k/k!
We conclude that the k-th moment of X is the coefficient of s^k/k! in the Taylor series of M_X(s). Thus, if we have the Taylor series of M_X(s), we can obtain all moments of X. Equivalently, E[X^k] = d^k/ds^k M_X(s), evaluated at s = 0.
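A SymPy sketch (the exponential MGF is our example): for X ~ Exponential(λ), M_X(s) = λ/(λ − s) for s < λ, and differentiating at s = 0 yields E[X^k] = k!/λ^k.

```python
import sympy as sp

s, lam = sp.symbols('s lam', positive=True)
M = lam / (lam - s)  # MGF of Exponential(lam), valid for s < lam

# k-th moment = k-th derivative of M at s = 0.
for k in [1, 2, 3]:
    print(k, sp.simplify(sp.diff(M, s, k).subs(s, 0)))  # 1/lam, 2/lam**2, 6/lam**3
```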
Theorem
Consider two random variables X and Y. Suppose there exists a positive constant c such that the MGFs of X and Y are finite and identical for all values of s in [−c,c]. Then,
F_X(t) = F_Y(t), for all t ∈ R
Sum of Independent Random Variables
Suppose X_1, X_2, …, X_n are n independent random variables, and the random variable Y is defined as:
Y = X_1 + X_2 + ⋯ + X_n
Then:
M_Y(s) = E[e^{sY}] = E[e^{s(X_1 + X_2 + ⋯ + X_n)}] = E[e^{sX_1} e^{sX_2} ⋯ e^{sX_n}]
= E[e^{sX_1}] E[e^{sX_2}] ⋯ E[e^{sX_n}]  (since the X_i's are independent)
= M_{X_1}(s) M_{X_2}(s) ⋯ M_{X_n}(s)
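A quick numeric illustration (the Poisson example is our choice): a Poisson(λ) variable has MGF exp(λ(e^s − 1)), and multiplying the MGFs of independent Poisson(λ₁) and Poisson(λ₂) variables gives exactly the MGF of Poisson(λ₁ + λ₂).

```python
import numpy as np

# MGF of Poisson(lam): M(s) = exp(lam * (e^s - 1)).
mgf = lambda lam, s: np.exp(lam * (np.exp(s) - 1))

l1, l2 = 2.0, 3.0
for s in [0.1, 0.5, 1.0]:
    print(s, mgf(l1, s) * mgf(l2, s), mgf(l1 + l2, s))  # the last two columns agree
```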
Different Views of a Function of a Random Variable (FRV)
There are several different but essentially equivalent views of a function of a random variable (FRV). We will present two of them, highlighting their differences in emphasis.
Assume we have an underlying probability space (Ω, F, P) and a random variable X defined on it. Recall that X is a rule that assigns a number X(ζ) to every ζ ∈ Ω. X transforms the σ-field of events F into the Borel σ-field B of sets of numbers on the real line. If R_X denotes the subset of the real line reached by X as ζ ranges over Ω, we can regard X as an ordinary function with domain Ω and range R_X. Now, consider a measurable real function g(x) of the real variable x.
First View (Y: Ω → R_Y)
For every ζ ∈ Ω, we generate a number g(X(ζ)) = Y(ζ). The rule Y, which generates the numbers {Y(ζ)} for random outcomes {ζ ∈ Ω}, is an RV with domain Ω and range R_Y ⊂ R. For every Borel set of real numbers B_Y, the set {ζ : Y(ζ) ∈ B_Y} is an event. Specifically, the event {ζ : Y(ζ) ≤ y} is equal to the event {ζ : g(X(ζ)) ≤ y}.
In this view, the emphasis is on Y as a mapping from Ω to RY, with the intermediate role of X being suppressed.
Second View (Input/Output Systems View)
For every value of X(ζ) in the range R_X, we generate a new number Y = g(X) whose range is R_Y. The rule Y, whose domain is R_X and range is R_Y, is a function of the random variable X. Here, the focus is on viewing Y as a mapping from one set of real numbers to another. A model for this view is to regard X as the input to a system with transformation function g(⋅). For such a system, an input x gets transformed to an output y = g(x), and an input function X gets transformed to an output function Y = g(X).
In general, we will write {Y ≤ y} = {X ∈ C_y} in the sequel. For C_y so determined, it follows that:
P[Y ≤ y] = P[X ∈ C_y]
If C_y is empty, then the probability of {Y ≤ y} is zero.
Input–Output Model
When dealing with the input–output model, it is convenient to omit references to an abstract underlying experiment and deal directly with the RVs X and Y. In this approach, the observations on X are the underlying experiments, events are Borel subsets of the real line R, and the set function P[⋅] is replaced by the distribution function F_X(⋅). Then Y is a mapping (an RV) whose domain is the range R_X of X, and whose range R_Y is a subset of R. The functional properties of X are ignored in favor of viewing X as a mechanism that gives rise to numerically valued random phenomena. In this view, the domain of X is irrelevant.
Additional discussion on the various views of an FRV is available in the literature.
Solving Problems of the Type Y=g(X)
Consider the linear transformation Y = aX + b. For a < 0, {Y ≤ y} = {aX + b ≤ y} = {X ≥ (y − b)/a}. Since the events {X < (y − b)/a} and {X ≥ (y − b)/a} are disjoint and their union is the certain event, we obtain from Axiom 3:
P[X < (y − b)/a] + P[X ≥ (y − b)/a] = 1
For a continuous RV:
P[X < (y − b)/a] = P[X ≤ (y − b)/a]  and  P[X ≥ (y − b)/a] = P[X > (y − b)/a]
Thus, for a<0:
F_Y(y) = 1 − F_X((y − b)/a)
and
f_Y(y) = (1/|a|) f_X((y − b)/a),  a ≠ 0
When X is not necessarily continuous, we modify the development for a < 0 because it may no longer be true that P[X < (y − b)/a] = P[X ≤ (y − b)/a], due to the possibility that the event {X = (y − b)/a} has a positive probability. The modified statement becomes P[X < (y − b)/a] = P[X ≤ (y − b)/a] − P[X = (y − b)/a] = F_X((y − b)/a) − P[X = (y − b)/a].
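A numerical check of the density formula above (assuming SciPy; the choice X ~ N(0, 1) with a = −2, b = 1 is ours): then Y = aX + b ~ N(1, 4), so (1/|a|) f_X((y − b)/a) should match the N(1, 4) density.

```python
import numpy as np
from scipy.stats import norm

a, b = -2.0, 1.0  # note a < 0
y = np.linspace(-5.0, 7.0, 5)

# f_Y(y) = (1/|a|) f_X((y - b)/a) with X ~ N(0, 1) ...
f_Y = norm.pdf((y - b) / a) / abs(a)
# ... which should equal the N(b, a^2) density directly.
print(np.max(np.abs(f_Y - norm.pdf(y, loc=b, scale=abs(a)))))  # ~0.0
```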
Solving Problems of the Type Z=g(X,Y)
In many science and engineering problems, a random variable Z is functionally related to two (or more) random variables X and Y. For example:
The signal Z at the input of an amplifier consists of a signal X to which independent random noise Y is added, so Z = X + Y. If X is itself an RV, what is the pdf of Z?
Problems of the type Z = g(X, Y) are similar to those of Y = g(X). For Y = g(X), the basic problem was to find the point set C_y such that the events {ζ : Y(ζ) ≤ y} and {ζ : X(ζ) ∈ C_y} were equal. The same applies here: find the point set C_z in the (x, y) plane such that the events {ζ : Z(ζ) ≤ z} and {ζ : (X(ζ), Y(ζ)) ∈ C_z} are equal, indicated by:
{Z ≤ z} = {(X, Y) ∈ C_z}
and
F_Z(z) = ∬_{(x,y) ∈ C_z} f_{XY}(x, y) dx dy
The point set C_z is determined from the functional relation g(x, y) ≤ z. Problems of the type Z = g(X, Y) involve joint densities or distributions and double integrals (or summations) instead of single ones. Hence, computing f_Z(z) is generally more complex than computing f_Y(y) in Y = g(X). However, we can use two labor-saving methods:
1. Solve many Z = g(X, Y)-type problems using a "turn-the-crank" formula, an extension of Equation 3.2-23, through the use of auxiliary variables (Section 3.4).
2. Solve problems of the type Z = X + Y using characteristic functions (Chapter 4).
Sum of Two Independent Random Variables
The situation modeled by Z = X + Y (and its extension Z = ∑_{i=1}^{N} X_i) occurs frequently in engineering and science. Computing f_Z(z) is perhaps the most important problem of the type Z = g(X, Y). We must find the set of points C_z such that the event {Z ≤ z} is equal to {X + Y ≤ z}, and thus to {(X, Y) ∈ C_z}. The set C_z is the half-plane {(x, y) : x + y ≤ z}, the region on and below the line x + y = z.
Using Equation 3.3-2, specialized for this case, we obtain:
F_Z(z) = ∬_{x+y≤z} f_{XY}(x, y) dx dy = ∫_{−∞}^{∞} ( ∫_{−∞}^{z−y} f_{XY}(x, y) dx ) dy
Differentiating with respect to z gives f_Z(z) = ∫_{−∞}^{∞} f_{XY}(z − y, y) dy; when X and Y are independent, this becomes the convolution f_Z(z) = ∫_{−∞}^{∞} f_X(z − y) f_Y(y) dy.
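A numerical sketch of the independent case (the uniform example is our choice): convolving two Uniform(0, 1) densities on a grid yields the triangular density on [0, 2], f_Z(z) = z for 0 ≤ z ≤ 1 and 2 − z for 1 ≤ z ≤ 2.

```python
import numpy as np

dx = 0.001
x = np.arange(0.0, 2.0, dx)
f_X = np.where(x <= 1.0, 1.0, 0.0)  # Uniform(0, 1) pdf sampled on the grid
f_Y = f_X.copy()

# Discrete approximation of f_Z(z) = integral of f_X(z - y) f_Y(y) dy.
f_Z = np.convolve(f_X, f_Y) * dx

for zi in [0.5, 1.0, 1.5]:
    i = int(round(zi / dx))
    print(zi, f_Z[i])  # ~0.5, ~1.0, ~0.5 (triangular density)
```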