LDPC Codes: Introduction

Theory

LDPC Codes

Recall that a linear code C{\cal C} can be described as the set of all vectors which are orthogonal to its parity check matrix HH, i.e., C={cF2n:HcT=0}{\cal C}=\{\boldsymbol{c}\in {\mathbb F}_2^n: H\boldsymbol{c}^T={\boldsymbol 0}\}.

A matrix is said to be sparse when the number of ones is smaller than the number of zeros in the matrix. A Low-Density Parity Check (LDPC) code is one which has a very sparse parity check matrix, i.e., in which the number of ones is much smaller than the number of zeros. For example, consider the matrix HH,

H=[11100000001100000100110000000111]. \begin{aligned} H=\begin{bmatrix} 1&1&1&0&0&0&0&0\\ 0&0&1&1&0&0&0&0\\ 0&1&0&0&1&1&0&0\\ 0&0&0&0&0&1&1&1 \end{bmatrix}. \end{aligned} The total number of 1s in the above matrix is 1111 while the number of zeros is 2121. Clearly this is a sparse matrix. Note also that the all rows of the HH matrix are linearly independent. We see that nk=4n-k=4 and n=8n=8 for this code, which means that the rate is 12\frac{1}{2}. Further, also observe that the first parity check equation as per HH is c1+c2+c3=0c_1+c_2+c_3=0. We therefore say that the coordinates c1,c2,c3c_1,c_2, c_3 of each codeword are participating in the first parity check equation. Similarly, c3,c4c_3,c_4 participate in the second parity check equation, and so on.

In a more general sense than the example above, the sparse HH matrix of an LDPC code may have some linearly redundant rows. Thus, the redundancy of the code can be lower than the number of rows of HH (i.e., the dimension of the code can be greater than nthe number of rows of Hn-\text{the number of rows of}~H). However, throughout our lab here, we will only consider HH matrices of codes which have full rank.

LDPC codes are one of the most widely used codes in practice. Most practical LDPC codes have blocklengths 5000 and above, where each row could have only a few entries which are 11. Carefully designed LDPC codes have extremely good performance (very low probability of error, and operational rates close to the channel capacity, which is the maximum rate at which a channel with certain noise level can operate) as well as reasonable (though not small) encoding and decoding complexity, which explains their widespread adoption in applications.

Regular and Irregular LDPC codes

An LDPC code is called a Regular LDPC code if the number of 1s in each row of the sparse HH matrix is identical (say, equal to wrw_r) and the number of 1s in each column of HH is also identical (say, equal to wcw_c). Since the number of rows is nkn-k and the number of columns is nn, it should therefore be true that nwc=(nk)wrnw_c=(n-k)w_r. The design rate of such a regular LDPC code is defined as 1nkn=1wcwr1-\frac{n-k}{n}=1-\frac{w_c}{w_r}. Note that this corresponds to the actual rate of the code, provided the rows of the HH matrix are linearly independent, which is our assumption here.

A LDPC code is called an irregular LDPC code if the number of 11s in the rows (or columns) of the HH matrix are not identical. That is, different rows could have different number of 1s, and so can columns, in an irregular code. Observe that, the code with the example HH matrix shown above is an irregular LDPC code. Indeed, the second row has only 22 ones, while the others have 33 ones each.

In practice, regular LDPC codes are easier to analyze theoretically, but carefully designed irregular LDPC codes generally have better performance than regular LDPC codes.

Tanner graphs of LDPC codes

A graph is a collection of vertices (drawn as points on a plane, typically) and edges (drawn as lines or curves between pairs of vertices, typically). A bipartite graph is a graph whose vertex set can be partitioned into two subsets, such that any edge of the graph exists only between pairs of vertices, exactly one from each subset. Thus, no edge of the bipartite graph exists between pairs of vertices in the same subset of the partition.

Any HH matrix can be represented using a bipartite graph, called a Tanner graph, as follows. Corresponding to the nn coordinates in the codeword, we construct nn vertices {c1,...,cn}\{c_1,...,c_n\}. We call these vertices as the variable nodes of the Tanner graph. Corresponding to the nkn-k rows (each one representing one parity equation or parity check equation), we construct nkn-k check nodes, denoted as {z1,z2,..,znk}\{z_1,z_2,..,z_{n-k}\}. A variable node cic_i is connected to a check node zjz_j, if and only if the (j,i)th(j,i)^{th} entry of the HH matrix is 11. In other words, if {ci,zj}\{c_i,z_j\} is an edge of the Tanner graph, then the ithi^{th} coordinate cic_i participates in the jthj^{th} parity check equation.

The figure below shows the Tanner graph of the matrix HH given in the example above.

Tanner Graph

Observe that the degree of each variable node, which is the number of check nodes with which it has an edge, represents the number of parity check equations in which the corresponding coordinate participates in. For instance, the coordinate c2c_2 participates in two check equations in the given example. So, the degree of variable node c2c_2 is precisely two in the graph. Similarly, the degree of each check node is the number of coordinates that participate in the specific parity check equation. For instance, the first check node has degree 33, as the first parity check equation involves three variables.

In the upcoming experiments, we will see how the Tanner graph plays a crucial role in visualizing the decoding of LDPC codes on various channels.