# CSE5313 Coding and information theory for data science (Lecture 8)
## Review on Linear codes
|The code |Dimension $k$ (effective message length) | Minimum distance $d$ | Dual code | Minimum distance of dual code|
|---------|--------------|----------------------|-----------|-----------------------------|
|$\mathbb{F}^n$| $n$ | $1$ | $\{0\}$ | $0$|
|Parity code| $n-1$ | $2$ | Repetition code | $n$|
|Hamming code| $2^m-m-1$ | $3$ | Punctured Hadamard code | $2^{m-1}$|
## More on linear codes
### Extended Hamming code
Consider the Hamming code $[2^m-1,2^m-m-1,3]_{\mathbb{F}_2}$.
Extend it to a cod eof length $2^m$ by adding a parity bit.
Recall the hamming code $[7,4,3]_{2}$.
$$
H_{HAM}=
\begin{pmatrix}
1 & 0 & 0 & 1 & 1 & 0 & 1\\
0 & 1 & 0 & 1 & 0 & 1 & 1\\
0 & 0 & 1 & 0 & 1 & 1 & 1\\
\end{pmatrix}
$$
The extended Hamming code is:
$$
H_{EXT}=
\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\
1 & 0 & 0 & 1 & 1 & 0 & 1 & 0\\
0 & 1 & 0 & 1 & 0 & 1 & 1 & 0\\
0 & 0 & 1 & 0 & 1 & 1 & 1 & 0\\
\end{pmatrix}
$$
The minimum distance of the extended Hamming code is 4.
Proof for minimum distance
It is sufficient to show that every 3 columns are linearly independent.
Using the [lemma for minimum distance](./CSE5313_L7#lemma-for-minimum-distance), we have that the minimum distance is 4.
Notice that in $\mathbb{F}_2$, multiplication is equivalent to AND operation.
By simple observations, we know that every 2 columns are linearly independent.
Consider the following linear combination:
$$
a_1\begin{pmatrix}
1\\
\vdots\\
\end{pmatrix}+a_2\begin{pmatrix}
1\\
\vdots\\
\end{pmatrix}+a_3\begin{pmatrix}
1\\
\vdots\\
\end{pmatrix}=0
$$
Therefore, every 3 columns are linearly independent since the top row will always be 1. if $a_1,a_2,a_3$ are $1$.
Therefore, the minimum distance is 4.
For the dimension of this code, we have $k=2^m-m-1$, with total code length $m$.
### Augmented Hadamard code
Consider what is generated by the parity check matrix of the extended Hamming code.
Let $xH_{EXT}$
- If $xH_{EXT}=0$, then $(0,x_2,\ldots, x_m)H_{HAM}=(y,0)$, where $y$ is the punctured hadamard codeword.
- If $xH_{EXT}=1$, then $(1,x_2,\ldots, x_m)H_{HAM}=(1,1,\ldots,1)+(y,0)$, where $y$ is the punctured hadamard codeword.
Therefore, the code generated by $H_{EXT}$ is given by:
Let $\mathcal{C}$ be the hadamard code.
Let $\mathcal{C}+\mathbb{I}$ be its shift by the all 1's vector (flip all bits of all words).
– Augmented Hadamard code = $\mathcal{C} \cup \mathcal{C} + \mathbb{I}$.
The length of the code is $2^m$.
The dimension of the code is $m+1$.
Since the code is still linear, the minimum distance is the minimum weight of the codewords.
So minimum distance of $\mathcal{C}+\mathbb{I}$ is $2^{m-1}$.
### Summary for simple linear codes
|The code |Dimension $k$ (effective message length) | Minimum distance $d$ | Dual code | Minimum distance of dual code|
|---------|--------------|----------------------|-----------|-----------------------------|
|$\mathbb{F}^n$| $n$ | $1$ | $\{0\}$ | $0$|
|Parity code| $n-1$ | $2$ | Repetition code | $n$|
|Hamming code| $2^m-m-1$ (length $2^m-1$) | $3$ | Punctured Hadamard code | $2^{m-1}$|
|Extended Hamming code| $2^m-m-1$ (length $2^m$) | $4$ | Augmented Hadamard code | $2^{m-1}$|
## Boundary of linear codes
Natural questions:
- Can we extend the table of linear codes infinitely?
- What set of configuration $(n,k,d)_q$ are impossible?
- What set of configuration $(n,k,d)_q$ are possible, even if we don't know how to construct them?
### Boundary I: Singleton bound
> Singleton is a name for the person who discovered this bound.
Theorem: For any linear code $\mathcal{C}\subseteq \mathbb{F}^n_q$, we have $d\leq n-k+1$.
Proof
Idea: Using the Pigeonhole principle.
Assume an code $[n,k,d]_q$ exists.
Pigeons: All $q^k$ possible code word of $\mathcal{C}$.
Holes: All $q^\ell$ values of the first $\ell$ entries of a codeword (for some $\ell
#### Definition of Maximum Distance Separable (MDS) code
A code $\mathcal{C} = [n,k,d]_q$ with $d = n - k + 1$ is called a Maximum Distance Separable (MDS) code.
Examples for singleton bound
$\mathbb{F}^n$: any $n, k = n, d = 1$.
- Attains with equality!
Parity: any $n, k = n - 1, d = 2$.
- Attains with equality!
Hamming: $n = 2^m - 1, k = 2^m - m - 1, d = 3$.
$n - k + 1 = m + 1 > 3$.
This creates some trade-off between $k$ and $d$.
### Boundary II: The Sphere Packing Bound
Let $r=\lfloor \frac{d-1}{2}\rfloor$, then $\sum_{i=0}^{r}\binom{n}{i}(q-1)^i\leq q^{n-k}$.
Proof
Let $c=(c_1,c_2,\ldots,c_n)\in \mathbb{F}^n_q$, and let $B(c,r)=\{y\in \mathbb{F}^n_q: d_H(c,y)\leq r\}$ for some $r\leq n$.
Computer $|B(c,r)|$.
$|B(c,0)|=1$
$|\{y\in \mathbb{F}^n_q: d_H(c,y)=1\}|=n(q-1)$.
$|\{y\in \mathbb{F}^n_q: d_H(c,y)=2\}|=\binom{n}{2}(q-1)^2=\frac{n(n-1)}{2}(q-1)^2$.
So, $|B(c,r)|=\sum_{i=0}^{r}\binom{n}{i}(q-1)^i$.
Recall that $\mathcal{C}$ of minimum distance $d$ if and only if $\forall c_1,c_2\in \mathcal{C}, B(c_1,\lfloor \frac{d-1}{2}\rfloor)\cap B(c_2,\lfloor \frac{d-1}{2}\rfloor)=\emptyset$.
Therefore, let $r=\lfloor \frac{d-1}{2}\rfloor$, we have $\sum_{i=0}^{r}\binom{n}{i}(q-1)^i\leq q^{n-k}$.
#### Definition for perfect code
A code $\mathcal{C}$ is called a perfect code if $|C|=q^{n-k}$.
Examples for sphere packing bound
Let $q=2$.
$\mathbb{F}_2^n$: any $n, k = n, d = 1$.
- $r = \frac{d-1}{2} = 0$.
- $\Rightarrow \sum_{i=0}^{0}\binom{n}{i}(q-1)^i = 1 \leq q^{n-k} = 2^{n-n} = 1$.
- Attained with equality!
Parity: any $n, k = n - 1, d = 2$.
- $r = \frac{d-1}{2} = 0$.
- $\Rightarrow \sum_{i=0}^{0}\binom{n}{i}(q-1)^i = 1 \leq q^{n-k} = 2^{n-k} = 2^{n-(n-1)} = 2$.
- $q \geq 2 \Rightarrow$ NOT attained with equality.
Exercise: Equality is attained for the repetition code (dual of parity) for odd $n$.
Hamming: $n = 2^m - 1, k = 2^m - m - 1, d = 3$.
- $r = \frac{d-1}{2} = 1$.
- $\Rightarrow \sum_{i=0}^{1}\binom{n}{i}(q-1)^i = 1 + (2^{m}-1) = 2^{m}$.
- $\Rightarrow q^{n-k} = 2^{m}$.
- Attained with equality!
• Attained with equality!
Fortunately, there are only **4** types of **binary linear perfect codes**:
- $\mathbb{F}^n$
- Repetition code
- Hamming code
- $[23,12,7]_2$ Golay code
### Boundary III: The Gilbert-Varshamov Bound
Let $n,k,d,q$ such that $V_q(n-k, d-2)\leq q^{n-k}$, then there exists an $[n,k,d]_q$ code.
> Singleton, sphere-packing provide **necessary** conditions for existence of codes.
>
> Are there **sufficient** conditions?
>
> Recall:
>
> - Lemma: The minimum distance of $\mathcal{C}$ is the maximum integer such that every $d-1$ columns of the parity-check matrix $H$ are linearly independent.
>
> Idea:
>
> - Construct $H$ column by column, ensuring that no dependencies occur.
Idea:
Construct $H$ column by column, ensuring that no dependencies occur.
Algorithm:
- Begin with $(n-k)\times (n-k)$ identity matrix.
- Assume we choose columns $h_1,h_2,\ldots,h_\ell$ (each $h_i$ is in $\mathbb{F}^n_q$)
- Then next column $h_{\ell}$ must not be in the space of any previous $d-2$ columns.
- $h_{\ell}$ cannot be written as $[h_1,h_2,\ldots,h_{\ell-1}]x^\top$ for $x$ of Hamming weight at most $d-2$.
- So the ineligible candidates for $h_{\ell}$ is:
- $B_{\ell-1}(0,d-2)=\{x\in \mathbb{F}^{\ell-1}_q: d_H(0,x)\leq d-2\}$.
- $|B_{\ell-1}(0,d-2)|=\sum_{i=0}^{d-2}\binom{\ell-1}{i}(q-1)^i$, denoted by $V_q(\ell-1, d-2)$.
- So the candidates for $h_{\ell}$ is:
- Out of which $V_q(\ell-1, d-2)$ are ineligible.
- Need $n$ columns overall, so we need $V_q(n-k, d-2)\leq q^{n-k}$.