# CSE5313 Coding and information theory for data science (Lecture 8)

## Review on Linear codes

| The code | Dimension $k$ (effective message length) | Minimum distance $d$ | Dual code | Minimum distance of dual code |
|---------|--------------|----------------------|-----------|-----------------------------|
| $\mathbb{F}^n$ | $n$ | $1$ | $\{0\}$ | $0$ |
| Parity code | $n-1$ | $2$ | Repetition code | $n$ |
| Hamming code | $2^m-m-1$ | $3$ | Punctured Hadamard code | $2^{m-1}$ |

## More on linear codes

### Extended Hamming code

Consider the Hamming code $[2^m-1,2^m-m-1,3]_{\mathbb{F}_2}$.

Extend it to a code of length $2^m$ by adding a parity bit.

Recall the Hamming code $[7,4,3]_{2}$, with parity-check matrix

$$
H_{HAM}=
\begin{pmatrix}
1 & 0 & 0 & 1 & 1 & 0 & 1\\
0 & 1 & 0 & 1 & 0 & 1 & 1\\
0 & 0 & 1 & 0 & 1 & 1 & 1\\
\end{pmatrix}
$$

A parity-check matrix of the extended Hamming code is:

$$
H_{EXT}=
\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\
1 & 0 & 0 & 1 & 1 & 0 & 1 & 0\\
0 & 1 & 0 & 1 & 0 & 1 & 1 & 0\\
0 & 0 & 1 & 0 & 1 & 1 & 1 & 0\\
\end{pmatrix}
$$

The minimum distance of the extended Hamming code is 4.
Proof for minimum distance

By the [lemma for minimum distance](./CSE5313_L7#lemma-for-minimum-distance), it is sufficient to show that every 3 columns of $H_{EXT}$ are linearly independent and that some 4 columns are linearly dependent.

Notice that in $\mathbb{F}_2$, multiplication is equivalent to the AND operation.

The columns of $H_{EXT}$ are nonzero and pairwise distinct, so every 2 columns are linearly independent.

Consider the following linear combination of three columns:

$$
a_1\begin{pmatrix}
1\\
\vdots\\
\end{pmatrix}+a_2\begin{pmatrix}
1\\
\vdots\\
\end{pmatrix}+a_3\begin{pmatrix}
1\\
\vdots\\
\end{pmatrix}=0
$$

Since every 2 columns are linearly independent, a nontrivial solution would need $a_1=a_2=a_3=1$. But then the top row gives $1+1+1=1\neq 0$ in $\mathbb{F}_2$. Therefore, every 3 columns are linearly independent.

Some 4 columns are linearly dependent: for example, in the $8$-column matrix $H_{EXT}$ above, columns 1, 2, 4, and 8 sum to zero.

Therefore, the minimum distance is 4.
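The claim can also be checked by brute force for the $m=3$ case. The following is a minimal sketch, assuming the $4\times 8$ matrix $H_{EXT}$ from these notes; the helper names `codewords` and `minimum_distance` are made up for illustration.

```python
from itertools import product

# Parity-check matrix of the extended Hamming code of length 8, copied from the notes.
H_EXT = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 1, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1, 1, 0],
    [0, 0, 1, 0, 1, 1, 1, 0],
]


def codewords(H):
    """Return all binary vectors c with H c^T = 0 over F_2 (exhaustive search)."""
    n = len(H[0])
    return [
        c
        for c in product((0, 1), repeat=n)
        if all(sum(h * x for h, x in zip(row, c)) % 2 == 0 for row in H)
    ]


def minimum_distance(H):
    """For a linear code, the minimum distance is the minimum nonzero weight."""
    return min(sum(c) for c in codewords(H) if any(c))


code = codewords(H_EXT)
print(len(code), minimum_distance(H_EXT))  # prints: 16 4
```

The search enumerates all $2^8$ vectors, so it is only practical for small lengths.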
For the dimension of this code, we have $k=2^m-m-1$, with total code length $2^m$.

### Augmented Hadamard code

Consider the code generated by the parity-check matrix of the extended Hamming code.

Let $x=(x_1,x_2,\ldots,x_{m+1})\in\mathbb{F}_2^{m+1}$ and consider the codeword $xH_{EXT}$.

- If $x_1=0$, then $xH_{EXT}=(y,0)$, where $y=(x_2,\ldots,x_{m+1})H_{HAM}$ is a punctured Hadamard codeword.
- If $x_1=1$, then $xH_{EXT}=(1,1,\ldots,1)+(y,0)$, where $y=(x_2,\ldots,x_{m+1})H_{HAM}$ is a punctured Hadamard codeword.

Therefore, the code generated by $H_{EXT}$ can be described as follows.

Let $\mathcal{C}$ be the Hadamard code, whose codewords are the vectors $(y,0)$ above.

Let $\mathcal{C}+\mathbb{I}$ be its shift by the all 1's vector (flip all bits of all words).

- Augmented Hadamard code = $\mathcal{C} \cup (\mathcal{C} + \mathbb{I})$.

The length of the code is $2^m$.

The dimension of the code is $m+1$.

Since the code is still linear, the minimum distance is the minimum weight of the nonzero codewords. Every nonzero word of $\mathcal{C}$ has weight $2^{m-1}$, so every word of $\mathcal{C}+\mathbb{I}$ has weight $2^m-2^{m-1}=2^{m-1}$, except the all 1's word, which has weight $2^m$.

So the minimum distance of the augmented Hadamard code is $2^{m-1}$.

### Summary for simple linear codes

| The code | Dimension $k$ (effective message length) | Minimum distance $d$ | Dual code | Minimum distance of dual code |
|---------|--------------|----------------------|-----------|-----------------------------|
| $\mathbb{F}^n$ | $n$ | $1$ | $\{0\}$ | $0$ |
| Parity code | $n-1$ | $2$ | Repetition code | $n$ |
| Hamming code | $2^m-m-1$ (length $2^m-1$) | $3$ | Punctured Hadamard code | $2^{m-1}$ |
| Extended Hamming code | $2^m-m-1$ (length $2^m$) | $4$ | Augmented Hadamard code | $2^{m-1}$ |

## Boundary of linear codes

Natural questions:

- Can we extend the table of linear codes infinitely?
- Which parameter sets $[n,k,d]_q$ are impossible?
- Which parameter sets $[n,k,d]_q$ are possible, even if we don't know how to construct them?

### Boundary I: Singleton bound

> The bound is named after Singleton, the person who discovered it.

Theorem: For any linear code $\mathcal{C}\subseteq \mathbb{F}^n_q$ of dimension $k$ and minimum distance $d$, we have $d\leq n-k+1$.
Proof

Idea: Use the pigeonhole principle.

Assume an $[n,k,d]_q$ code $\mathcal{C}$ exists.

Pigeons: All $q^k$ codewords of $\mathcal{C}$.

Holes: All $q^\ell$ values of the first $\ell$ entries of a codeword (for some $\ell<k$).

Take $\ell=k-1$. Since $q^k>q^{k-1}$, two distinct codewords agree on their first $k-1$ entries, so they differ in at most $n-(k-1)$ positions. Hence $d\leq n-k+1$.

#### Definition of Maximum Distance Separable (MDS) code

A code $\mathcal{C} = [n,k,d]_q$ with $d = n - k + 1$ is called a Maximum Distance Separable (MDS) code.
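Examples for singleton bound

$\mathbb{F}^n$: any $n, k = n, d = 1$.

- $n - k + 1 = 1 = d$. Attained with equality!

Parity: any $n, k = n - 1, d = 2$.

- $n - k + 1 = 2 = d$. Attained with equality!

Hamming: $n = 2^m - 1, k = 2^m - m - 1, d = 3$.

- $n - k + 1 = m + 1 > 3$ for $m\geq 3$. NOT attained with equality.

The Singleton bound describes a trade-off between $k$ and $d$: for a fixed length $n$, a larger $k$ lowers the largest possible $d$.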
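The arithmetic in these examples can be reproduced with a short script. This is only a sketch; the function name `singleton_gap` and the choice $n=7$ for the first two codes are assumptions made for illustration.

```python
def singleton_gap(n, k, d):
    """Return (n - k + 1) - d; a gap of zero means the parameters are MDS."""
    gap = (n - k + 1) - d
    if gap < 0:
        raise ValueError("parameters violate the Singleton bound")
    return gap


n = 7  # illustrative length for the first two examples
print("full space:", singleton_gap(n, n, 1))   # gap 0, MDS
print("parity:", singleton_gap(n, n - 1, 2))   # gap 0, MDS

# Hamming codes: the gap is m - 2, so only m = 2 meets the bound.
for m in (2, 3, 4):
    print("Hamming, m =", m, "gap =", singleton_gap(2**m - 1, 2**m - m - 1, 3))
```

The function only compares parameters; it does not check that a code with those parameters exists.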
### Boundary II: The Sphere Packing Bound

Theorem: For any $[n,k,d]_q$ code, let $r=\lfloor \frac{d-1}{2}\rfloor$. Then $\sum_{i=0}^{r}\binom{n}{i}(q-1)^i\leq q^{n-k}$.
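Proof

Let $c=(c_1,c_2,\ldots,c_n)\in \mathbb{F}^n_q$, and let $B(c,r)=\{y\in \mathbb{F}^n_q: d_H(c,y)\leq r\}$ for some $r\leq n$.

Compute $|B(c,r)|$.

- $|B(c,0)|=1$.
- $|\{y\in \mathbb{F}^n_q: d_H(c,y)=1\}|=n(q-1)$.
- $|\{y\in \mathbb{F}^n_q: d_H(c,y)=2\}|=\binom{n}{2}(q-1)^2=\frac{n(n-1)}{2}(q-1)^2$.

In general, a word at distance exactly $i$ from $c$ is determined by choosing $i$ positions and one of the $q-1$ other symbols in each of them, which gives $\binom{n}{i}(q-1)^i$ words.

So, $|B(c,r)|=\sum_{i=0}^{r}\binom{n}{i}(q-1)^i$.

Recall that if $\mathcal{C}$ has minimum distance $d$, then for all distinct $c_1,c_2\in \mathcal{C}$, $B(c_1,\lfloor \frac{d-1}{2}\rfloor)\cap B(c_2,\lfloor \frac{d-1}{2}\rfloor)=\emptyset$.

Therefore, with $r=\lfloor \frac{d-1}{2}\rfloor$, the $q^k$ balls around the codewords are pairwise disjoint subsets of $\mathbb{F}^n_q$, so $q^k\sum_{i=0}^{r}\binom{n}{i}(q-1)^i\leq q^n$. Dividing by $q^k$, we have $\sum_{i=0}^{r}\binom{n}{i}(q-1)^i\leq q^{n-k}$.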
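The ball-size formula from the proof can be sanity-checked against direct enumeration for small parameters. The sketch below assumes the alphabet is represented by the integers $0,\ldots,q-1$ and centers the ball at the all-zero word, which loses no generality because the count does not depend on the center; both function names are hypothetical.

```python
from itertools import product
from math import comb


def ball_volume(n, r, q):
    """Closed form from the proof: sum_{i=0}^{r} C(n, i) (q - 1)^i."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))


def ball_volume_brute_force(n, r, q):
    """Count the words of length n over {0, ..., q-1} with at most r nonzero entries."""
    return sum(
        1
        for y in product(range(q), repeat=n)
        if sum(1 for symbol in y if symbol != 0) <= r
    )


# Compare the two counts on a few small (n, r, q) triples.
for n, r, q in [(5, 2, 2), (4, 1, 3), (3, 3, 3)]:
    print((n, r, q), ball_volume(n, r, q), ball_volume_brute_force(n, r, q))
```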
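#### Definition for perfect code

A code $\mathcal{C}$ is called a perfect code if it attains the sphere packing bound with equality, that is, $\sum_{i=0}^{r}\binom{n}{i}(q-1)^i=q^{n-k}$ for $r=\lfloor \frac{d-1}{2}\rfloor$.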
Examples for sphere packing bound

Let $q=2$.

$\mathbb{F}_2^n$: any $n, k = n, d = 1$.

- $r = \lfloor\frac{d-1}{2}\rfloor = 0$.
- $\Rightarrow \sum_{i=0}^{0}\binom{n}{i}(q-1)^i = 1 \leq q^{n-k} = 2^{n-n} = 1$.
- Attained with equality!

Parity: any $n, k = n - 1, d = 2$.

- $r = \lfloor\frac{d-1}{2}\rfloor = 0$.
- $\Rightarrow \sum_{i=0}^{0}\binom{n}{i}(q-1)^i = 1 \leq q^{n-k} = 2^{n-(n-1)} = 2$.
- $1 < 2 \Rightarrow$ NOT attained with equality.

Exercise: Equality is attained for the repetition code (dual of parity) for odd $n$.

Hamming: $n = 2^m - 1, k = 2^m - m - 1, d = 3$.

- $r = \lfloor\frac{d-1}{2}\rfloor = 1$.
- $\Rightarrow \sum_{i=0}^{1}\binom{n}{i}(q-1)^i = 1 + (2^{m}-1) = 2^{m}$.
- $\Rightarrow q^{n-k} = 2^{m}$.
- Attained with equality!
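These parameter checks are easy to automate. The following sketch tests whether a parameter set meets the sphere packing bound with equality; `is_perfect` is a made-up name, and the function only compares numbers, so a `True` result does not by itself show that such a code exists.

```python
from math import comb


def is_perfect(n, k, d, q=2):
    """True if sum_{i=0}^{r} C(n, i)(q - 1)^i equals q^(n - k), with r = floor((d - 1) / 2)."""
    r = (d - 1) // 2
    volume = sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))
    return volume == q ** (n - k)


print(is_perfect(7, 7, 1))    # full space F_2^7
print(is_perfect(7, 6, 2))    # parity code of length 7
print(is_perfect(7, 1, 7))    # repetition code, odd length
print(is_perfect(6, 1, 6))    # repetition code, even length
print(is_perfect(23, 12, 7))  # parameters of the binary Golay code

# Hamming codes for several values of m.
print(all(is_perfect(2**m - 1, 2**m - m - 1, 3) for m in range(2, 8)))
```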
Fortunately, there are only **4** types of **binary linear perfect codes**:

- $\mathbb{F}^n$
- Repetition code (odd $n$)
- Hamming code
- $[23,12,7]_2$ Golay code

### Boundary III: The Gilbert-Varshamov Bound

Theorem: Let $n,k,d,q$ be such that $V_q(n-1, d-2)< q^{n-k}$, where $V_q(N,t)=\sum_{i=0}^{t}\binom{N}{i}(q-1)^i$. Then there exists an $[n,k]_q$ code with minimum distance at least $d$.

> Singleton, sphere-packing provide **necessary** conditions for existence of codes.
>
> Are there **sufficient** conditions?
>
> Recall:
>
> - Lemma: The minimum distance of $\mathcal{C}$ is the maximum integer $d$ such that every $d-1$ columns of the parity-check matrix $H$ are linearly independent.
>
> Idea:
>
> - Construct $H$ column by column, ensuring that no dependencies occur.

Algorithm:

- Begin with the $(n-k)\times (n-k)$ identity matrix.
- Assume we have chosen columns $h_1,h_2,\ldots,h_{\ell-1}$ (each $h_i$ is in $\mathbb{F}^{n-k}_q$).
- Then the next column $h_{\ell}$ must not be in the span of any $d-2$ previous columns.
  - $h_{\ell}$ cannot be written as $[h_1,h_2,\ldots,h_{\ell-1}]x^\top$ for $x$ of Hamming weight at most $d-2$.
- So the ineligible candidates for $h_{\ell}$ are indexed by:
  - $B_{\ell-1}(0,d-2)=\{x\in \mathbb{F}^{\ell-1}_q: d_H(0,x)\leq d-2\}$.
  - $|B_{\ell-1}(0,d-2)|=\sum_{i=0}^{d-2}\binom{\ell-1}{i}(q-1)^i$, denoted by $V_q(\ell-1, d-2)$.
- There are $q^{n-k}$ candidates for $h_{\ell}$:
  - Out of which at most $V_q(\ell-1, d-2)$ are ineligible.
  - So an eligible column exists whenever $V_q(\ell-1, d-2)< q^{n-k}$.
- We need $n$ columns overall, and $V_q(\ell-1, d-2)$ is largest at the last step $\ell=n$, so we need $V_q(n-1, d-2)< q^{n-k}$.
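The column-by-column construction can be sketched for $q=2$ as follows. This is a minimal illustration under stated assumptions, not the lecture's own implementation: columns of $H$ are stored as integer bitmasks, addition in $\mathbb{F}_2^{n-k}$ is XOR, $d\geq 2$ is assumed, the greedy rule of taking the smallest eligible vector is an arbitrary choice, and all function names are hypothetical.

```python
from itertools import combinations


def sums_of_few(columns, t):
    """Return all XOR-sums of at most t of the given columns (the empty sum 0 included)."""
    reachable = {0}
    for size in range(1, t + 1):
        for subset in combinations(columns, size):
            acc = 0
            for col in subset:
                acc ^= col
            reachable.add(acc)
    return reachable


def greedy_parity_check(n, k, d):
    """Greedily build n columns in F_2^(n-k) so that every d-1 of them are independent.

    Returns the list of columns as bitmasks, or None if the greedy search gets stuck.
    """
    r = n - k
    columns = [1 << i for i in range(r)]  # start from the identity matrix
    while len(columns) < n:
        forbidden = sums_of_few(columns, d - 2)
        candidate = next((v for v in range(1 << r) if v not in forbidden), None)
        if candidate is None:
            return None
        columns.append(candidate)
    return columns


def every_few_independent(columns, t):
    """Over F_2: True if no nonempty set of at most t columns sums to zero."""
    for size in range(1, t + 1):
        for subset in combinations(columns, size):
            acc = 0
            for col in subset:
                acc ^= col
            if acc == 0:
                return False
    return True


H_cols = greedy_parity_check(7, 4, 3)
print(H_cols, every_few_independent(H_cols, 2))
print(greedy_parity_check(8, 5, 3))  # None: only 7 nonzero vectors exist in F_2^3
```

For $n=7$, $k=4$, $d=3$ the greedy choice collects all seven nonzero vectors of $\mathbb{F}_2^3$, which are the columns of a Hamming parity-check matrix. The subset enumeration is exponential in $d$, so the sketch is meant for small parameters only.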