Maximum Likelihood Decoding of Linear Codes on Binary-Input Memoryless Channels

Theory

As described in the theory of the previous experiment, a memoryless channel is specified by the input alphabet ${\cal X}$ (the set of possible values taken by the input random variable $X$), the output alphabet ${\cal Y}$ (the set of possible values taken by the output random variable $Y$), and the transition probabilities $p_{Y|X}(y|x)$, $\forall x\in{\cal X}, y\in{\cal Y}$.

Consider that we use the channel $n$ times, i.e., our input sequence is a vector $\bm{x}=(x_1,\ldots,x_n)\in {\cal X}^n$ (recall that ${\cal X}^n$ denotes the set of all $n$-length tuples with entries from ${\cal X}$). Because of the noisy nature of the channel (governed by the transition probabilities $p_{Y|X}$), the transmitted input vector $\bm{x}$ is transformed into a random output vector $\bm{Y}=(Y_1,\ldots,Y_n)$, governed by the transition probabilities $p_{\bm{Y}|\bm{X}}(\bm{y}|\bm{x})=\prod_{i=1}^n p(y_i|x_i)$.

The receiver observes the output sequence $\bm{Y}=\bm{y}$. The goal of the receiver is then to decode the transmitted vector $\bm{x}$. The estimate of the transmitted vector is denoted by $\hat{\bm{x}}$. However, because the channel noise is random, there could be several possible vectors in ${\cal X}^n$ which result in the same received vector $\bm{y}$. Therefore, the decoder at the receiver needs some metric based on which it can decide the value of the estimate $\hat{\bm{x}}$.

In this virtual lab, we specifically consider the likelihood as the decoding metric. That is, the decoder seeks the $\bm{x}$ which maximizes the probability $p(\bm{y}|\bm{x})$: $$\hat{\bm{x}}=\arg\max_{\bm{x}} p(\bm{y}|\bm{x}).$$ The probability $p(\bm{y}|\bm{x})$ is called the likelihood of $\bm{x}$ with respect to the received vector $\bm{y}$.

However, the problem is that there could be several candidates $\bm{x}\in{\cal X}^n$ which maximize the likelihood. This problem can be resolved by choosing the transmit sequences from a code. Specifically, we are interested in choosing $n$-length linear codes for binary-input channels.

Using Linear Codes on Memoryless Binary-Input Channels

As defined before, an $n$-length linear code ${\cal C}$ over $\mathbb{F}_2$ of rate $\frac{k}{n}$ is a $k$-dimensional subspace of $\mathbb{F}_2^n$.


The ML Decoding Rule for Linear Codes

Assuming that the receiver receives the vector $\bm{y}$, the Maximum Likelihood decoding rule (called the ML decoding rule) when using such a code on a binary-input channel is written as follows.

$$\hat{\bm{x}}=\arg\max_{\bm{x}\in{\cal C}} p(\bm{y}|\bm{x}).$$

If there are multiple codewords in ${\cal C}$ which maximize the likelihood, we will assume that the decoder breaks the ties arbitrarily, i.e., the decoder declares any one of these codewords (that maximize the likelihood) as the estimate. This estimate $\hat{\bm{x}}$ is called the ML estimate for the input codeword $\bm{x}$.
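To make the rule concrete, the brute-force version of ML decoding (feasible here because ${\cal C}$ has only a few codewords) can be sketched in a few lines of Python. This is only an illustrative sketch; the function names and the idea of passing the channel likelihood as an argument are our own choices, not part of any particular library.

```python
def ml_decode(y, codewords, likelihood):
    """Brute-force ML decoding: return a codeword x maximizing p(y|x).

    y          : the received channel output
    codewords  : an iterable over all codewords of the code C
    likelihood : a function (y, x) -> p(y|x) for the channel at hand
    Ties are broken arbitrarily; here, the first maximizer found is kept.
    """
    best_x, best_p = None, float("-inf")
    for x in codewords:
        p = likelihood(y, x)
        if p > best_p:
            best_x, best_p = x, p
    return best_x
```

The channel-specific examples below can all be run through this single routine by supplying the appropriate likelihood function.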


The figure below depicts the use of a linear code with such a decoder.




We now explore via an example how ML decoding works for linear codes on the three channel models we consider in this work. For all three channels, we consider the code ${\cal C}$ which is the rowspace of the matrix $$G= \begin{pmatrix} 1&0&1&0&1\\ 0&1&0&1&1 \end{pmatrix}.$$ This code ${\cal C}$ contains $4$ codewords, which are all the $\mathbb{F}_2$-linear combinations of the two rows of $G$. Thus we have

$${\cal C}=\{\bm{x}_1=(0,0,0,0,0),\ \bm{x}_2=(1,0,1,0,1),\ \bm{x}_3=(0,1,0,1,1),\ \bm{x}_4=(1,1,1,1,0)\}.$$
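As a quick sanity check, the four codewords can also be generated programmatically as all $\mathbb{F}_2$-linear combinations of the rows of $G$. Here is a minimal sketch in plain Python (the variable names are ours):

```python
from itertools import product

# Rows of the generator matrix G of the code C over F_2
G = [(1, 0, 1, 0, 1),
     (0, 1, 0, 1, 1)]

# All codewords: u1*g1 + u2*g2 (mod 2) for every message (u1, u2) in F_2^2
codewords = [tuple((u1 * a + u2 * b) % 2 for a, b in zip(*G))
             for u1, u2 in product([0, 1], repeat=2)]

print(codewords)
# [(0, 0, 0, 0, 0), (0, 1, 0, 1, 1), (1, 0, 1, 0, 1), (1, 1, 1, 1, 0)]
```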

We now see how the ML decoder decodes, when the codeword $\bm{x}_1=(0,0,0,0,0)$ is transmitted, in each of the three channels we consider in this virtual lab.

1. Binary Erasure Channel $BEC(\epsilon)$:

Assume that the received vector is $\bm{y}=(?,0,?,0,0)$. In this case, we see that $p(\bm{y}|\bm{x}_i)=0,\ \forall i=2,3,4$, as $\bm{x}_i$ and $\bm{y}$ are not compatible vectors, for any $i=2,3,4$. At the same time, we have $p(\bm{y}|\bm{x}_1)=\epsilon^2(1-\epsilon)^3$. Since $\epsilon\in(0,1)$, clearly, we see that $p(\bm{y}|\bm{x}_1)>p(\bm{y}|\bm{x}_i),\ \forall i=2,3,4$. Thus, the ML decoder outputs the estimate $\hat{\bm{x}}$ as the codeword $\bm{x}_1=(0,0,0,0,0)$. Hence, the ML decoder decodes the transmitted codeword correctly.

It is easy to present a scenario where the decoding can be incorrect. Consider the received vector with three erasures, $\bm{y}=(?,0,?,0,?)$. In this case, we see that $p(\bm{y}|\bm{x}_1)=p(\bm{y}|\bm{x}_2)=\epsilon^3(1-\epsilon)^2$, while $p(\bm{y}|\bm{x}_i)=0,\ \forall i=3,4$. Therefore, in this case, the decoder can declare either $\bm{x}_1$ or $\bm{x}_2$ as the estimate. In case the decoder chooses $\bm{x}_2$, clearly the decoder makes a decoding error.
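The compatibility argument above translates directly into a likelihood function for the $BEC(\epsilon)$, which can be plugged into a brute-force decoder such as the sketch given earlier. Again, this is an illustrative sketch; representing an erasure by the string '?' is our own convention.

```python
def bec_likelihood(y, x, eps=0.1):
    """p(y|x) on BEC(eps); erasures in y are marked with '?'.

    Each erased position contributes eps, each unerased position that
    agrees with x contributes (1 - eps), and any disagreement in an
    unerased position makes the vectors incompatible, so p(y|x) = 0.
    """
    p = 1.0
    for yi, xi in zip(y, x):
        if yi == '?':
            p *= eps
        elif yi == xi:
            p *= 1.0 - eps
        else:
            return 0.0
    return p

# First example from the text: y = (?, 0, ?, 0, 0)
y = ('?', 0, '?', 0, 0)
print(bec_likelihood(y, (0, 0, 0, 0, 0)))  # eps^2 (1-eps)^3, the maximum
print(bec_likelihood(y, (1, 0, 1, 0, 1)))  # 0.0, incompatible with y
```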

2. Binary Symmetric Channel $BSC(p)$:

Recollect, from the previous experiment, that in the channel $BSC(p)$, for any $\bm{x},\bm{y}\in\mathbb{F}_2^n$, we have the following: $$p(\bm{y}|\bm{x})=p^{d(\bm{x},\bm{y})}(1-p)^{n-d(\bm{x},\bm{y})},$$ where $d(\bm{x},\bm{y})$ denotes the Hamming distance (the number of positions where $\bm{x}$ and $\bm{y}$ have distinct values). We assume that $p<0.5$.
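The assumption $p<0.5$ is what makes ML decoding on the $BSC(p)$ equivalent to picking the codeword closest to $\bm{y}$ in Hamming distance. To see this (a small step worth spelling out), rewrite the likelihood as $$p(\bm{y}|\bm{x})=p^{d(\bm{x},\bm{y})}(1-p)^{n-d(\bm{x},\bm{y})}=(1-p)^n\left(\frac{p}{1-p}\right)^{d(\bm{x},\bm{y})}.$$ Since $\frac{p}{1-p}<1$ when $p<0.5$, the likelihood strictly decreases as $d(\bm{x},\bm{y})$ increases, so maximizing $p(\bm{y}|\bm{x})$ over $\bm{x}\in{\cal C}$ is the same as minimizing $d(\bm{x},\bm{y})$.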

Assume that the received vector is $\bm{y}=(1,0,0,0,0)$. With respect to the four codewords in ${\cal C}$, we then have $$\begin{align} p(\bm{y}|\bm{x}_1)&=p(1-p)^4,\\ p(\bm{y}|\bm{x}_2)&=p^2(1-p)^3,\\ p(\bm{y}|\bm{x}_3)&=p^4(1-p),\\ p(\bm{y}|\bm{x}_4)&=p^3(1-p)^2. \end{align}$$ Clearly, when $p<0.5$, we see that $p(\bm{y}|\bm{x}_1)$ is the largest among all the $p(\bm{y}|\bm{x}_i),\ i=1,\ldots,4$. Thus, we can say that the ML estimate is $\bm{x}_1$. Note that the decoder indeed makes no error, in case the true transmitted codeword was $\bm{x}_1$.

Now, we explain a scenario where the decoder's estimate involves choosing one among many possible codewords, each of which has the same likelihood value. For instance, assume that the received word is $\bm{y}=(1,1,0,0,0)$. In this case, $$\begin{align} p(\bm{y}|\bm{x}_1)&=p^2(1-p)^3,\\ p(\bm{y}|\bm{x}_2)&=p^3(1-p)^2,\\ p(\bm{y}|\bm{x}_3)&=p^3(1-p)^2,\\ p(\bm{y}|\bm{x}_4)&=p^2(1-p)^3. \end{align}$$ Thus, in this case, we find that both $\bm{x}_1$ and $\bm{x}_4$ can be valid ML estimates for the true transmitted codeword, since both of them have the same (and largest) likelihood among all codewords. Observe that, in this case, if the transmitted codeword was $\bm{x}_1$ and the decoder declared $\bm{x}_4$ as the estimate, then we have an error.
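Both BSC examples above can be reproduced with a short sketch in Python (illustrative names, reusing the codeword list from before):

```python
def bsc_likelihood(y, x, p=0.1):
    """p(y|x) on BSC(p): p^d (1-p)^(n-d), where d is the Hamming distance."""
    d = sum(yi != xi for yi, xi in zip(y, x))
    return (p ** d) * ((1.0 - p) ** (len(y) - d))

codewords = [(0, 0, 0, 0, 0), (1, 0, 1, 0, 1), (0, 1, 0, 1, 1), (1, 1, 1, 1, 0)]

# First example: y = (1, 0, 0, 0, 0); the unique ML estimate is x1
y = (1, 0, 0, 0, 0)
print(max(codewords, key=lambda x: bsc_likelihood(y, x)))  # (0, 0, 0, 0, 0)

# Second example: y = (1, 1, 0, 0, 0); x1 and x4 tie for the largest likelihood
y = (1, 1, 0, 0, 0)
best = max(bsc_likelihood(y, x) for x in codewords)
print([x for x in codewords if bsc_likelihood(y, x) == best])
# [(0, 0, 0, 0, 0), (1, 1, 1, 1, 0)]
```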

From the above observations, it is clear that if there are many bit flips when the codeword is transmitted through the channel, the probability of decoding error increases.

3. AWGN Channel with Noise ${\cal N}(0,N_0/2)$:

Recall that, for the memoryless AWGN channel, for any two vectors $\bm{x},\bm{y}\in\mathbb{R}^n$, the probability of receiving vector $\bm{y}$ given that vector $\bm{x}$ was transmitted through the channel is given by $$p(\bm{y}|\bm{x})=\frac{1}{(\pi N_0)^{n/2}}\,e^{-\frac{||\bm{y}-\bm{x}||^2}{N_0}}.$$

Now, what does it mean to use a binary code on this channel? Observe that the channel accepts only real-valued input symbols; however, we have binary digits (bits) in the codeword. Therefore, we need a modulation scheme which maps the bits to real-valued symbols that we can transmit.

The simplest modulation scheme when we use a binary code (for example, the code ${\cal C}$ above) on the AWGN channel is to map the binary symbols $\{0,1\}$ to $\{+1,-1\}\subset\mathbb{R}$, respectively. This essentially translates to using a BPSK (binary phase shift keying) modulation scheme on the AWGN channel to transmit the bits of the codeword. The fact that we map bit $0$ to $+1\in\mathbb{R}$ and bit $1$ to $-1\in\mathbb{R}$ translates to using an SNR (signal-to-noise ratio) per codeword-bit of $1/(N_0/2)$. To mathematically model a larger SNR value, we can then assume that the channel has a smaller noise power $N_0/2$, without changing the mapping of the BPSK modulation itself. Under this modulation scheme, the code ${\cal C}$ is effectively written as follows. $${\cal C}=\{\bm{x}_1=(1,1,1,1,1),\ \bm{x}_2=(-1,1,-1,1,-1),\ \bm{x}_3=(1,-1,1,-1,-1),\ \bm{x}_4=(-1,-1,-1,-1,1)\}.$$ We call this the bipolar representation of the code ${\cal C}$.

Now, with the code ${\cal C}$ written in its bipolar representation as above, let us understand the ML decoding when the received vector is $\bm{y}=(0.5,0.5,-1.2,-1.5,-1.1)$. In this case, observe that we have the following equations, which simplify the process of finding the ML estimate: $$\begin{align} \hat{\bm{x}}&=\arg\max_{\bm{x}\in{\cal C}}\frac{1}{(\pi N_0)^{n/2}}\,e^{-\frac{||\bm{y}-\bm{x}||^2}{N_0}}\\ &=\arg\min_{\bm{x}\in{\cal C}}||\bm{y}-\bm{x}||^2\\ &=\arg\min_{\bm{x}\in{\cal C}}\left(||\bm{y}||^2+||\bm{x}||^2-2\langle\bm{y},\bm{x}\rangle\right), \end{align}$$ where $\langle\bm{y},\bm{x}\rangle$ is the scalar product (also known as the correlation) between the vectors $\bm{y}=(y_1,\ldots,y_n)$ and $\bm{x}=(x_1,\ldots,x_n)$, which is precisely $\sum_{i=1}^n x_iy_i$. Note that $n=5$ is the blocklength of the code ${\cal C}$ in our example. Now, the minimization above is with respect to all $\bm{x}\in{\cal C}$. Observe that the value of $||\bm{y}||^2$ does not change in this minimization. Further, for any $\bm{x}\in{\cal C}$ in bipolar representation, it is easy to see that $||\bm{x}||^2=n$ (where $n=5$ in our example), and is therefore constant too. Therefore, the minimization above can be re-written as $$\hat{\bm{x}}=\arg\max_{\bm{x}\in{\cal C}}\langle\bm{y},\bm{x}\rangle.$$ Thus, the ML estimate for the true transmitted codeword is the codeword which has the largest scalar product, or correlation, with the received vector $\bm{y}$.

We finally apply this observation to the example at hand. We said that the received vector is $\bm{y}=(0.5,0.5,-1.2,-1.5,-1.1)$. To find the ML estimate on the AWGN channel, we compute the correlation between each $\bm{x}_i\in{\cal C}$ and $\bm{y}$, and find which one is the largest. We urge the reader to check the following calculations. $$\begin{align*} \langle\bm{x}_1,\bm{y}\rangle&=0.5 + 0.5 + (-1.2) + (-1.5) + (-1.1) = -2.8\\ \langle\bm{x}_2,\bm{y}\rangle&=-0.5 + 0.5 + 1.2 + (-1.5) + 1.1 = 0.8\\ \langle\bm{x}_3,\bm{y}\rangle&=0.5 + (-0.5) + (-1.2) + 1.5 + 1.1 = 1.4\\ \langle\bm{x}_4,\bm{y}\rangle&= -0.5 + (-0.5) + 1.2 + 1.5 + (-1.1) = 0.6. \end{align*}$$ Therefore, the ML estimate for the transmitted codeword is precisely $\bm{x}_3$.
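The whole chain (BPSK mapping of the codewords followed by correlation decoding) can be checked with a few lines of Python. This is only a sketch with names of our own choosing, reusing the binary codeword list from earlier:

```python
codewords_f2 = [(0, 0, 0, 0, 0), (1, 0, 1, 0, 1), (0, 1, 0, 1, 1), (1, 1, 1, 1, 0)]

# BPSK mapping: bit 0 -> +1, bit 1 -> -1 (bipolar representation of C)
bipolar = [tuple(1 - 2 * b for b in x) for x in codewords_f2]

def correlation(y, x):
    """Scalar product <y, x> = sum_i y_i x_i."""
    return sum(yi * xi for yi, xi in zip(y, x))

y = (0.5, 0.5, -1.2, -1.5, -1.1)
for i, x in enumerate(bipolar, start=1):
    print(f"<x{i}, y> = {correlation(y, x):+.1f}")
# <x1, y> = -2.8
# <x2, y> = +0.8
# <x3, y> = +1.4
# <x4, y> = +0.6

# The ML estimate is the codeword with the largest correlation, i.e. x3
print(max(bipolar, key=lambda x: correlation(y, x)))  # (1, -1, 1, -1, -1)
```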