Discrete and Continuous Random Variables

Random Variables

There are two important classes of random variables: discrete random variables and continuous random variables.

Discrete Random Variables

A set $A$ is countable if either:

  • $A$ is finite, e.g., $\{1, 2, 3, 4\}$, or
  • $A$ can be put in one-to-one correspondence with the natural numbers (countably infinite).

Sets such as $\mathbb{N}$, $\mathbb{Z}$, $\mathbb{Q}$ and their subsets are countable, while nonempty intervals $[a, b]$ in $\mathbb{R}$ are uncountable. A random variable is discrete if its range is a countable set. If $X$ is a discrete random variable, its range $R_X$ is countable, so we can list its elements:

$$R_X = \{x_1, x_2, x_3, \ldots\}$$

Here, $x_1, x_2, x_3, \ldots$ are the possible values of $X$. The event $A = \{X = x_k\}$ is defined as the set of outcomes $s$ in the sample space $S$ for which $X(s) = x_k$:

$$A = \{s \in S \mid X(s) = x_k\}$$

The probabilities of events $\{X = x_k\}$ are given by the probability mass function (PMF) of $X$.

Definition (PMF)

Let $X$ be a discrete random variable with range $R_X = \{x_1, x_2, x_3, \ldots\}$ (finite or countably infinite). The function

$$P_X(x_k) = P(X = x_k), \quad \text{for } k = 1, 2, 3, \ldots,$$

is called the probability mass function (PMF) of $X$.

The PMF can be extended to all real numbers:

$$P_X(x) = \begin{cases} P(X = x) & \text{if } x \in R_X \\ 0 & \text{otherwise} \end{cases}$$

The PMF satisfies the following properties (verified numerically in the sketch after the list):

  • $0 \leq P_X(x) \leq 1$ for all $x$,
  • $\sum_{x \in R_X} P_X(x) = 1$,
  • For any set $A \subset R_X$, $P(X \in A) = \sum_{x \in A} P_X(x)$.

Independence of Random Variables

Two random variables $X$ and $Y$ are independent if:

$$P(X = x, Y = y) = P(X = x) P(Y = y) \quad \text{for all } x, y.$$

More generally, $n$ discrete random variables $X_1, X_2, \ldots, X_n$ are independent if:

$$P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = P(X_1 = x_1) P(X_2 = x_2) \cdots P(X_n = x_n) \quad \text{for all } x_1, x_2, \ldots, x_n.$$
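
The product rule is easy to check empirically. The sketch below (an illustrative simulation assuming two fair coins) estimates the joint and marginal probabilities and confirms that they factor:

```python
# Empirical check of independence for two fair coin flips.
import random

random.seed(0)
N = 100_000
flips = [(random.randint(0, 1), random.randint(0, 1)) for _ in range(N)]

# Estimate P(X = 1, Y = 1) and compare with P(X = 1) * P(Y = 1).
p_joint = sum(1 for x, y in flips if x == 1 and y == 1) / N
p_x = sum(1 for x, _ in flips if x == 1) / N
p_y = sum(1 for _, y in flips if y == 1) / N

print(p_joint, p_x * p_y)  # both close to 0.25
```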

Types of Discrete Random Variables

Bernoulli Distribution

A Bernoulli random variable can take two values, usually 0 and 1, modeling a success/failure experiment.

Definition (Bernoulli Distribution)

A random variable $X$ is Bernoulli with parameter $p$, denoted $X \sim \text{Bernoulli}(p)$, if:

$$P_X(x) = \begin{cases} p & \text{for } x = 1 \\ 1 - p & \text{for } x = 0 \\ 0 & \text{otherwise} \end{cases}$$
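
A Bernoulli($p$) draw can be simulated by comparing a uniform random number against $p$; this is a minimal sketch, not a prescribed implementation:

```python
import random

def bernoulli(p):
    """Return 1 with probability p and 0 with probability 1 - p."""
    return 1 if random.random() < p else 0

random.seed(1)
samples = [bernoulli(0.3) for _ in range(100_000)]
print(sum(samples) / len(samples))  # close to 0.3
```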

Geometric Distribution

This models the number of trials until the first success in a series of independent Bernoulli trials.

Definition (Geometric Distribution)

A random variable $X$ is geometric with parameter $p$, denoted $X \sim \text{Geometric}(p)$, if:

$$P_X(k) = \begin{cases} p(1-p)^{k-1} & \text{for } k = 1, 2, 3, \ldots \\ 0 & \text{otherwise} \end{cases}$$
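
The formula reflects that a first success on trial $k$ requires $k - 1$ failures followed by one success. Assuming SciPy is available, the manual formula can be checked against scipy.stats.geom, which uses the same support starting at $k = 1$:

```python
from scipy.stats import geom

p = 0.25
for k in range(1, 6):
    manual = p * (1 - p) ** (k - 1)   # P(first success on trial k)
    print(k, manual, geom.pmf(k, p))  # the two columns agree
```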

Binomial Distribution

Models the number of successes in $n$ independent Bernoulli trials.

Definition (Binomial Distribution)

A random variable $X$ is binomial with parameters $n$ and $p$, denoted $X \sim \text{Binomial}(n, p)$, if:

$$P_X(k) = \begin{cases} \binom{n}{k} p^k (1-p)^{n-k} & \text{for } k = 0, 1, 2, \ldots, n \\ 0 & \text{otherwise} \end{cases}$$
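
A direct implementation with math.comb makes the formula concrete; the sketch (an illustrative example) also confirms that the PMF sums to 1 over $k = 0, \ldots, n$:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.4
print(sum(binom_pmf(k, n, p) for k in range(n + 1)))  # 1.0 up to float error
print(binom_pmf(4, n, p))  # ≈ 0.251, the most likely value near n*p = 4
```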

Poisson Distribution

Models the number of events in a fixed interval of time or space.

Definition (Poisson Distribution)

A random variable $X$ is Poisson with parameter $\lambda$, denoted $X \sim \text{Poisson}(\lambda)$, if:

$$P_X(k) = \begin{cases} \dfrac{e^{-\lambda} \lambda^k}{k!} & \text{for } k \in \{0, 1, 2, \ldots\} \\ 0 & \text{otherwise} \end{cases}$$
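
A standard fact (not stated in the original text) is that Poisson($\lambda$) approximates Binomial($n, \lambda/n$) when $n$ is large; the sketch below evaluates the PMF and checks this approximation numerically:

```python
from math import comb, exp, factorial

lam = 3.0

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

# Poisson(lam) approximates Binomial(n, lam/n) for large n.
n = 10_000
for k in range(5):
    binom = comb(n, k) * (lam / n) ** k * (1 - lam / n) ** (n - k)
    print(k, poisson_pmf(k, lam), binom)  # nearly equal
```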

Cumulative Distribution Function (CDF)

The CDF of a random variable $X$ is defined as:

$$F_X(x) = P(X \leq x) \quad \text{for all } x \in \mathbb{R}.$$

For a discrete random variable $X$ with range $R_X = \{x_1, x_2, x_3, \ldots\}$ (with $x_1 < x_2 < x_3 < \ldots$):

$$F_X(x) = \sum_{x_k \leq x} P_X(x_k).$$

For all $a \leq b$:

$$P(a < X \leq b) = F_X(b) - F_X(a)$$
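
For a fair die, for example, the CDF is the step function $F_X(x) = \lfloor x \rfloor / 6$ for $0 \leq x \leq 6$, and interval probabilities follow by differencing. A minimal sketch (the helper `die_cdf` is hypothetical):

```python
from math import floor

def die_cdf(x):
    """CDF of a fair six-sided die: F(x) = P(X <= x), a step function."""
    return min(max(floor(x), 0), 6) / 6

# P(2 < X <= 5) = F(5) - F(2) = 5/6 - 2/6 = 1/2
print(die_cdf(5) - die_cdf(2))  # 0.5
```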

Expected Value (Mean)

The expected value of a discrete random variable $X$ with range $R_X = \{x_1, x_2, x_3, \ldots\}$ is:

$$E[X] = \sum_{x_k \in R_X} x_k P_X(x_k).$$

Linearity of Expectation

  • $E[aX + b] = aE[X] + b$
  • $E[X_1 + X_2 + \cdots + X_n] = E[X_1] + E[X_2] + \cdots + E[X_n]$

Expected Value of a Function (LOTUS)

For a function $g(X)$:

$$E[g(X)] = \sum_{x_k \in R_X} g(x_k) P_X(x_k)$$
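
LOTUS means the distribution of $g(X)$ itself is never needed. Continuing the fair-die example (an illustrative sketch), $E[X]$ and $E[X^2]$ come directly from the same PMF:

```python
from fractions import Fraction

# Fair die: P_X(x) = 1/6 for x = 1, ..., 6.
support = range(1, 7)
p = Fraction(1, 6)

EX = sum(x * p for x in support)      # E[X] = 7/2
EX2 = sum(x**2 * p for x in support)  # E[X^2] via LOTUS with g(x) = x^2: 91/6

print(EX, EX2)
```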

Variance

Variance measures the spread of a random variable around its mean. Writing $\mu_X = E[X]$:

$$\text{Var}(X) = E[(X - \mu_X)^2] = \sum_{x_k \in R_X} (x_k - \mu_X)^2 P_X(x_k)$$

Standard Deviation

$$\text{SD}(X) = \sigma_X = \sqrt{\text{Var}(X)}$$

Computational Formula for Variance

$$\text{Var}(X) = E[X^2] - (E[X])^2$$
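
Continuing the die example, the definitional and computational formulas agree exactly; both give $35/12$ (an illustrative check):

```python
from fractions import Fraction

support = range(1, 7)
p = Fraction(1, 6)

mu = sum(x * p for x in support)                   # E[X] = 7/2
var_def = sum((x - mu) ** 2 * p for x in support)  # E[(X - mu)^2]
var_comp = sum(x**2 * p for x in support) - mu**2  # E[X^2] - (E[X])^2

assert var_def == var_comp == Fraction(35, 12)
```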

Variance of a Linear Transformation

For $a, b \in \mathbb{R}$:

$$\text{Var}(aX + b) = a^2 \text{Var}(X)$$

Variance of the Sum of Independent Variables

For independent $X_1, X_2, \ldots, X_n$:

$$\text{Var}(X_1 + X_2 + \cdots + X_n) = \text{Var}(X_1) + \text{Var}(X_2) + \cdots + \text{Var}(X_n)$$
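
Additivity is easy to confirm by simulation. The sketch below sums two independent fair dice, whose total variance should be $2 \times 35/12 = 35/6 \approx 5.83$:

```python
import random
import statistics

random.seed(2)
N = 200_000
sums = [random.randint(1, 6) + random.randint(1, 6) for _ in range(N)]

# Each die has variance 35/12, so the independent sum should be near 35/6.
print(statistics.pvariance(sums))  # ≈ 5.83
```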

Continuous Random Variables

Random variables with a continuous range of possible values are common. For example, the exact velocity of a vehicle on a highway is a continuous random variable. The CDF of a continuous random variable is a continuous function, meaning it has no jumps. This aligns with the fact that $P(X = x) = 0$ for all $x$.

Definition (Continuous Random Variable)

A random variable $X$ with CDF $F_X(x)$ is continuous if $F_X(x)$ is a continuous function for all $x \in \mathbb{R}$. We also assume that the CDF is differentiable almost everywhere in $\mathbb{R}$.

Probability Density Function (PDF)

For continuous random variables, the PMF does not apply, as $P(X = x) = 0$ for all $x \in \mathbb{R}$. Instead, we use the PDF, which gives the density of probability at a point.

$$f_X(x) = \lim_{\Delta \rightarrow 0^+} \frac{P(x < X \leq x + \Delta)}{\Delta}$$

The function $f_X(x)$ gives the probability density at the point $x$. It is defined as:

$$f_X(x) = \frac{dF_X(x)}{dx} = F'_X(x), \quad \text{if } F_X(x) \text{ is differentiable at } x$$

A random variable $X$ is continuous if there is a non-negative function $f_X$, called the probability density function (PDF), such that:

$$P(X \in B) = \int_B f_X(x) \, dx$$

for every subset $B$ of the real line. In particular, the probability that $X$ falls within an interval $[a, b]$ is:

$$P(a \leq X \leq b) = \int_a^b f_X(x) \, dx$$

This can be interpreted as the area under the graph of the PDF. For any single value $a$:

$$P(X = a) = 0$$

Thus:

$$P(a \leq X \leq b) = P(a < X < b) = P(a \leq X < b) = P(a < X \leq b)$$

A PDF $f_X$ must be non-negative and satisfy:

$$\int_{-\infty}^{\infty} f_X(x) \, dx = 1$$

Definition (PDF)

Consider a continuous random variable $X$ with an absolutely continuous CDF $F_X(x)$. The function $f_X(x)$ defined by:

$$f_X(x) = \frac{dF_X(x)}{dx} = F'_X(x), \quad \text{if } F_X(x) \text{ is differentiable at } x$$

is the probability density function (PDF) of $X$. For small values of $\delta$:

$$P(x < X \leq x + \delta) \approx f_X(x) \delta$$

If $f_X(x_1) > f_X(x_2)$, then:

$$P(x_1 < X \leq x_1 + \delta) > P(x_2 < X \leq x_2 + \delta)$$

Thus, $X$ is more likely to be around $x_1$ than around $x_2$.

The CDF can be obtained from the PDF by integration:

$$F_X(x) = \int_{-\infty}^{x} f_X(u) \, du$$

And:

$$P(a < X \leq b) = F_X(b) - F_X(a) = \int_{a}^{b} f_X(u) \, du$$

Properties of the PDF

  • $f_X(x) \geq 0$ for all $x \in \mathbb{R}$
  • $\int_{-\infty}^{\infty} f_X(u) \, du = 1$
  • $P(a < X \leq b) = \int_{a}^{b} f_X(u) \, du$
  • For any set $A$, $P(X \in A) = \int_A f_X(u) \, du$
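
These properties can be verified numerically for any specific density. The sketch below applies SciPy's quad to an Exponential(2) density (an illustrative choice) to check normalization and one interval probability:

```python
import numpy as np
from scipy.integrate import quad

lam = 2.0
pdf = lambda x: lam * np.exp(-lam * x)  # Exponential(2) density for x > 0

total, _ = quad(pdf, 0, np.inf)  # should integrate to 1
prob, _ = quad(pdf, 0.5, 1.5)    # P(0.5 < X <= 1.5)

print(total)                                          # 1.0
print(prob, np.exp(-lam * 0.5) - np.exp(-lam * 1.5))  # the two agree
```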

Range of a Continuous Random Variable

The range $R_X$ of a continuous random variable $X$ is:

$$R_X = \{ x \mid f_X(x) > 0 \}$$

Expected Value

The expected value of a continuous random variable $X$ is:

$$E[X] = \int_{-\infty}^{\infty} x f_X(x) \, dx$$

Expected Value of a Function (LOTUS)

For a function $g(X)$:

$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x) \, dx$$

Linearity of Expectation

  • $E[aX + b] = aE[X] + b$
  • $E[X_1 + X_2 + \cdots + X_n] = E[X_1] + E[X_2] + \cdots + E[X_n]$

Variance

The variance of a continuous random variable $X$ is:

$$\text{Var}(X) = E[(X - \mu_X)^2] = E[X^2] - (E[X])^2$$

So:

$$\text{Var}(X) = \int_{-\infty}^{\infty} (x - \mu_X)^2 f_X(x) \, dx = E[X^2] - (E[X])^2$$

For $a, b \in \mathbb{R}$:

$$\text{Var}(aX + b) = a^2 \text{Var}(X)$$

If $X$ is continuous and $Y = g(X)$, then $Y$ is also a random variable. To find the distribution of $Y$, first find its CDF and then differentiate to obtain its PDF.

Uniform Random Variable

A continuous random variable $X$ is uniformly distributed over $[a, b]$, denoted $X \sim \text{Uniform}(a, b)$, if:

$$f_X(x) = \begin{cases} \frac{1}{b-a} & a < x < b \\ 0 & \text{otherwise} \end{cases}$$

The CDF and mean are:

$$F_X(x) = \begin{cases} 0 & x < a \\ \frac{x - a}{b - a} & a \leq x < b \\ 1 & x \geq b \end{cases}$$

$$E[X] = \frac{a + b}{2}$$

The variance is:

$$\text{Var}(X) = \frac{(b - a)^2}{12}$$
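
As a numerical check of these formulas, the sketch below integrates a Uniform(0, 2) density (an illustrative choice) with SciPy, recovering mean $1$ and variance $1/3$:

```python
from scipy.integrate import quad

a, b = 0.0, 2.0
pdf = lambda x: 1.0 / (b - a)  # constant density on (a, b)

mean, _ = quad(lambda x: x * pdf(x), a, b)               # (a + b) / 2 = 1
var, _ = quad(lambda x: (x - mean) ** 2 * pdf(x), a, b)  # (b - a)^2 / 12 = 1/3

print(mean, var)
```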

Exponential Random Variable

The exponential distribution models the time between events. A continuous random variable $X$ is exponentially distributed with parameter $\lambda > 0$, denoted $X \sim \text{Exponential}(\lambda)$, if:

$$f_X(x) = \begin{cases} \lambda e^{-\lambda x} & x > 0 \\ 0 & \text{otherwise} \end{cases}$$

The CDF, mean, and variance are:

$$F_X(x) = 1 - e^{-\lambda x}, \quad x \geq 0$$

$$E[X] = \frac{1}{\lambda}$$

$$\text{Var}(X) = \frac{1}{\lambda^2}$$

The exponential distribution is memoryless:

$$P(X > x + a \mid X > a) = P(X > x)$$
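
The identity follows from $P(X > x + a \mid X > a) = e^{-\lambda(x + a)} / e^{-\lambda a} = e^{-\lambda x}$. The simulation sketch below (with illustrative parameters) confirms it empirically:

```python
import random

random.seed(3)
lam, a, x = 1.5, 0.4, 0.8
N = 500_000
samples = [random.expovariate(lam) for _ in range(N)]

survivors = [s for s in samples if s > a]
lhs = sum(1 for s in survivors if s > a + x) / len(survivors)  # P(X > x+a | X > a)
rhs = sum(1 for s in samples if s > x) / N                     # P(X > x)

print(lhs, rhs)  # both close to exp(-lam * x) ≈ 0.301
```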

Normal Distribution

The Central Limit Theorem (CLT) states that the sum of a large number of independent random variables is approximately normal. A standard normal random variable $Z$ is denoted $Z \sim N(0, 1)$ and has PDF:

$$f_Z(z) = \frac{1}{\sqrt{2\pi}} \exp\left\{ -\frac{z^2}{2} \right\}$$

The mean and variance are:

$$E[Z] = 0, \qquad \text{Var}(Z) = 1$$

The CDF is:

$$F_Z(z) = \frac{1}{\sqrt{2 \pi}} \int_{-\infty}^{z} \exp\left\{ -\frac{u^2}{2} \right\} \, du$$

The CDF of any normal random variable can be written in terms of the standard normal CDF, denoted $\Phi$:

$$\Phi(x) = \frac{1}{\sqrt{2 \pi}} \int_{-\infty}^{x} \exp\left\{ -\frac{u^2}{2} \right\} \, du$$

Properties of $\Phi$ (the symmetry property is checked numerically after the list):

  • $\lim_{x \rightarrow \infty} \Phi(x) = 1$
  • $\lim_{x \rightarrow -\infty} \Phi(x) = 0$
  • $\Phi(0) = \frac{1}{2}$
  • $\Phi(-x) = 1 - \Phi(x)$
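
$\Phi$ has no closed form and is evaluated numerically; SciPy exposes it as norm.cdf. A minimal check of $\Phi(0) = 1/2$ and $\Phi(-x) = 1 - \Phi(x)$:

```python
from scipy.stats import norm

print(norm.cdf(0))  # 0.5

for x in (0.5, 1.0, 1.96):
    print(x, norm.cdf(-x), 1 - norm.cdf(x))  # Phi(-x) == 1 - Phi(x)
```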

A normal random variable $X$ with mean $\mu$ and variance $\sigma^2$ is denoted $X \sim N(\mu, \sigma^2)$. If $Z$ is standard normal and $X = \sigma Z + \mu$, then:

$$X \sim N(\mu, \sigma^2)$$

The CDF and PDF of $X$ are:

$$F_X(x) = \Phi\left( \frac{x - \mu}{\sigma} \right)$$

$$f_X(x) = \frac{1}{\sigma \sqrt{2 \pi}} \exp\left\{ -\frac{(x - \mu)^2}{2\sigma^2} \right\}$$

For a linear transformation $Y = aX + b$ (with $a \neq 0$):

$$Y \sim N(a\mu_X + b, a^2 \sigma_X^2)$$
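
Tying the pieces together, the sketch below (illustrative parameters) standardizes a normal probability via $\Phi$ and confirms the linear transformation rule by simulation:

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 5.0, 2.0
x = 7.0

# F_X(x) = Phi((x - mu) / sigma): both expressions print the same value.
print(norm.cdf((x - mu) / sigma), norm.cdf(x, loc=mu, scale=sigma))

# Y = aX + b is normal with mean a*mu + b and variance a^2 * sigma^2.
rng = np.random.default_rng(0)
a, b = -3.0, 1.0
y = a * rng.normal(mu, sigma, 200_000) + b
print(y.mean(), a * mu + b)      # ≈ -14
print(y.var(), a**2 * sigma**2)  # ≈ 36
```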