NoteNextra-origin/content/CSE5313/CSE5313_L8.md

# CSE5313 Coding and information theory for data science (Lecture 8)

## Review on Linear codes

|The code |Dimension $k$ (effective message length) | Minimum distance $d$ | Dual code | Minimum distance of dual code|
|---------|--------------|----------------------|-----------|-----------------------------|
|$\mathbb{F}^n$| $n$ | $1$ | $\{0\}$ | $0$|
|Parity code| $n-1$ | $2$ | Repetition code | $n$|
|Hamming code| $2^m-m-1$ | $3$ | Punctured Hadamard code | $2^{m-1}$|

## More on linear codes

### Extended Hamming code

Consider the Hamming code $[2^m-1,2^m-m-1,3]_{\mathbb{F}_2}$.

Extend it to a cod eof length $2^m$ by adding a parity bit.

Recall the hamming code $[7,4,3]_{2}$.

$$
H_{HAM}=
\begin{pmatrix}
1 & 0 & 0 & 1 & 1 & 0 & 1\\
0 & 1 & 0 & 1 & 0 & 1 & 1\\
0 & 0 & 1 & 0 & 1 & 1 & 1\\
\end{pmatrix}
$$

The extended Hamming code is:

$$
H_{EXT}=
\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\
1 & 0 & 0 & 1 & 1 & 0 & 1 & 0\\
0 & 1 & 0 & 1 & 0 & 1 & 1 & 0\\
0 & 0 & 1 & 0 & 1 & 1 & 1 & 0\\
\end{pmatrix}
$$

The minimum distance of the extended Hamming code is 4.

<details>
<summary>Proof for minimum distance</summary>

It is sufficient to show that every 3 columns are linearly independent.

Using the [lemma for minimum distance](./CSE5313_L7#lemma-for-minimum-distance), we have that the minimum distance is 4.

Notice that in $\mathbb{F}_2$, multiplication is equivalent to AND operation.

By simple observations, we know that every 2 columns are linearly independent.

Consider the following linear combination:

$$
a_1\begin{pmatrix}
1\\
\vdots\\
\end{pmatrix}+a_2\begin{pmatrix}
1\\
\vdots\\
\end{pmatrix}+a_3\begin{pmatrix}
1\\
\vdots\\
\end{pmatrix}=0
$$

Therefore, every 3 columns are linearly independent since the top row will always be 1. if $a_1,a_2,a_3$ are $1$.

Therefore, the minimum distance is 4.

</details>

For the dimension of this code, we have $k=2^m-m-1$, with total code length $m$.

### Augmented Hadamard code

Consider what is generated by the parity check matrix of the extended Hamming code.

Let $xH_{EXT}$

- If $xH_{EXT}=0$, then $(0,x_2,\ldots, x_m)H_{HAM}=(y,0)$, where $y$ is the punctured hadamard codeword.
- If $xH_{EXT}=1$, then $(1,x_2,\ldots, x_m)H_{HAM}=(1,1,\ldots,1)+(y,0)$, where $y$ is the punctured hadamard codeword.

Therefore, the code generated by $H_{EXT}$ is given by:

Let $\mathcal{C}$ be the hadamard code.

Let $\mathcal{C}+\mathbb{I}$ be its shift by the all 1's vector (flip all bits of all words).

– Augmented Hadamard code = $\mathcal{C} \cup \mathcal{C} + \mathbb{I}$.

The length of the code is $2^m$.

The dimension of the code is $m+1$.

Since the code is still linear, the minimum distance is the minimum weight of the codewords.

So minimum distance of $\mathcal{C}+\mathbb{I}$ is $2^{m-1}$.

### Summary for simple linear codes

|The code |Dimension $k$ (effective message length) | Minimum distance $d$ | Dual code | Minimum distance of dual code|
|---------|--------------|----------------------|-----------|-----------------------------|
|$\mathbb{F}^n$| $n$ | $1$ | $\{0\}$ | $0$|
|Parity code| $n-1$ | $2$ | Repetition code | $n$|
|Hamming code| $2^m-m-1$ (length $2^m-1$) | $3$ | Punctured Hadamard code | $2^{m-1}$|
|Extended Hamming code| $2^m-m-1$ (length $2^m$) | $4$ | Augmented Hadamard code | $2^{m-1}$|

## Boundary of linear codes

Natural questions:

- Can we extend the table of linear codes infinitely?
- What set of configuration $(n,k,d)_q$ are impossible?
- What set of configuration $(n,k,d)_q$ are possible, even if we don't know how to construct them?

### Boundary I: Singleton bound

> Singleton is a name for the person who discovered this bound.

Theorem: For any linear code $\mathcal{C}\subseteq \mathbb{F}^n_q$, we have $d\leq n-k+1$.

<details>
<summary>Proof</summary>

Idea: Using the Pigeonhole principle.

Assume an code $[n,k,d]_q$ exists.

Pigeons: All $q^k$ possible code word of $\mathcal{C}$.

Holes: All $q^\ell$ values of the first $\ell$ entries of a codeword (for some $\ell<n$).

If $q^\ell<q^k$, then by Pigeonhole principle, there exists two codewords in $\mathcal{C}$ that agrees on the first $\ell$ entries.

So $d\leq n-\ell$.

So the largest $\ell$ value for which this arguments works is $n-k+1$.

</details>

#### Definition of Maximum Distance Separable (MDS) code

A code $\mathcal{C} = [n,k,d]_q$ with $d = n - k + 1$ is called a Maximum Distance Separable (MDS) code.

<details>
<summary>Examples for singleton bound</summary>

$\mathbb{F}^n$: any $n, k = n, d = 1$.

- Attains with equality!

Parity: any $n, k = n - 1, d = 2$.

- Attains with equality!

Hamming: $n = 2^m - 1, k = 2^m - m - 1, d = 3$.

$n - k + 1 = m + 1 > 3$.

This creates some trade-off between $k$ and $d$.

</details>

### Boundary II: The Sphere Packing Bound

Let $r=\lfloor \frac{d-1}{2}\rfloor$, then $\sum_{i=0}^{r}\binom{n}{i}(q-1)^i\leq q^{n-k}$.

<details>
<summary>Proof</summary>

Let $c=(c_1,c_2,\ldots,c_n)\in \mathbb{F}^n_q$, and let $B(c,r)=\{y\in \mathbb{F}^n_q: d_H(c,y)\leq r\}$ for some $r\leq n$.

Computer $|B(c,r)|$.

$|B(c,0)|=1$

$|\{y\in \mathbb{F}^n_q: d_H(c,y)=1\}|=n(q-1)$.

$|\{y\in \mathbb{F}^n_q: d_H(c,y)=2\}|=\binom{n}{2}(q-1)^2=\frac{n(n-1)}{2}(q-1)^2$.

So, $|B(c,r)|=\sum_{i=0}^{r}\binom{n}{i}(q-1)^i$.

Recall that $\mathcal{C}$ of minimum distance $d$ if and only if $\forall c_1,c_2\in \mathcal{C}, B(c_1,\lfloor \frac{d-1}{2}\rfloor)\cap B(c_2,\lfloor \frac{d-1}{2}\rfloor)=\emptyset$.

Therefore, let $r=\lfloor \frac{d-1}{2}\rfloor$, we have $\sum_{i=0}^{r}\binom{n}{i}(q-1)^i\leq q^{n-k}$.

</details>

#### Definition for perfect code

A code $\mathcal{C}$ is called a perfect code if $|C|=q^{n-k}$.

<details>
<summary>Examples for sphere packing bound</summary>

Let $q=2$.

$\mathbb{F}_2^n$: any $n, k = n, d = 1$.

- $r = \frac{d-1}{2} = 0$.
- $\Rightarrow \sum_{i=0}^{0}\binom{n}{i}(q-1)^i = 1 \leq q^{n-k} = 2^{n-n} = 1$.
- Attained with equality!

Parity: any $n, k = n - 1, d = 2$.

- $r = \frac{d-1}{2} = 0$.

- $\Rightarrow \sum_{i=0}^{0}\binom{n}{i}(q-1)^i = 1 \leq q^{n-k} = 2^{n-k} = 2^{n-(n-1)} = 2$.

- $q \geq 2 \Rightarrow$ NOT attained with equality.

Exercise: Equality is attained for the repetition code (dual of parity) for odd $n$.

Hamming: $n = 2^m - 1, k = 2^m - m - 1, d = 3$.

- $r = \frac{d-1}{2} = 1$.
- $\Rightarrow \sum_{i=0}^{1}\binom{n}{i}(q-1)^i = 1 + (2^{m}-1) = 2^{m}$.
- $\Rightarrow q^{n-k} = 2^{m}$.
- Attained with equality!

• Attained with equality!

</details>

Fortunately, there are only **4** types of **binary linear perfect codes**:

- $\mathbb{F}^n$
- Repetition code
- Hamming code
- $[23,12,7]_2$ Golay code

### Boundary III: The Gilbert-Varshamov Bound

Let $n,k,d,q$ such that $V_q(n-k, d-2)\leq q^{n-k}$, then there exists an $[n,k,d]_q$ code.

> Singleton, sphere-packing provide **necessary** conditions for existence of codes.
>
> Are there **sufficient** conditions?
>
> Recall:
>
> - Lemma: The minimum distance of $\mathcal{C}$ is the maximum integer such that every $d-1$ columns of the parity-check matrix $H$ are linearly independent.
>
> Idea:
>
> - Construct $H$ column by column, ensuring that no dependencies occur.

Idea:

Construct $H$ column by column, ensuring that no dependencies occur.

Algorithm:

- Begin with $(n-k)\times (n-k)$ identity matrix.
- Assume we choose columns $h_1,h_2,\ldots,h_\ell$ (each $h_i$ is in $\mathbb{F}^n_q$)
- Then next column $h_{\ell}$ must not be in the space of any previous $d-2$ columns.
  - $h_{\ell}$ cannot be written as $[h_1,h_2,\ldots,h_{\ell-1}]x^\top$ for $x$ of Hamming weight at most $d-2$.
  - So the ineligible candidates for $h_{\ell}$ is:
    - $B_{\ell-1}(0,d-2)=\{x\in \mathbb{F}^{\ell-1}_q: d_H(0,x)\leq d-2\}$.
    - $|B_{\ell-1}(0,d-2)|=\sum_{i=0}^{d-2}\binom{\ell-1}{i}(q-1)^i$, denoted by $V_q(\ell-1, d-2)$.
  - So the candidates for $h_{\ell}$ is:
    - Out of which $V_q(\ell-1, d-2)$ are ineligible.
    - Need $n$ columns overall, so we need $V_q(n-k, d-2)\leq q^{n-k}$.