CSE5313 Coding and information theory for data science (Lecture 6)
Recap
Vector spaces and subspaces over finite fields
$\mathbb{F}^n$ is a vector space over $\mathbb{F}$, with point-wise vector addition and scalar multiplication.
Example
\mathbb{F}_2^4 is a vector space over \mathbb{F}_2.
Let $v=\begin{pmatrix} 1 & 1 & 1 & 1 \end{pmatrix}$
Then v is a vector in \mathbb{F}_2^4 that's "orthogonal" to itself.
v\cdot v=1+1+1+1=4=0 in \mathbb{F}_2.
Over a general field, a subspace and its dual space may intersect non-trivially.
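The self-orthogonality above is easy to check numerically. A minimal sketch, assuming vectors are stored as tuples of integers; `dot_mod` is a hypothetical helper name, not something from the lecture:

```python
def dot_mod(u, v, p=2):
    """Dot product of two equal-length vectors over the prime field F_p."""
    return sum(a * b for a, b in zip(u, v)) % p

v = (1, 1, 1, 1)
print(dot_mod(v, v))  # 0: v is "orthogonal" to itself over F_2
```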
Let V be a subspace of \mathbb{F}^n.
V is a subgroup of \mathbb{F}^n under vector addition (\mathbb{F}^n,+).
- Apply the theorem: if $H$ is a finite, non-empty subset of a group $G$ that is closed under the operation of $G$, then $H$ is a subgroup of $G$.
Proof
Let $H\subseteq G$ be finite, non-empty, and closed under the operation of $G$. It is left to show:
Associativity: inherited from $G$.
Unit element: $e\in H$.
Consider $a\in H$. By closure, $a,a^2,a^3,\cdots$ are all in $H$. Since $H$ is finite, there exist $i>j$ in $\mathbb{N}$ such that $a^i=a^j$.
Then $a^i=a^j\iff a^{i-j}=e$, so $e\in H$.
Inverses: $a^{-1}\in H$.
This follows from the same argument: $a\cdot a^{i-j-1}=a^{i-j}=e$, so $a^{-1}=a^{i-j-1}\in H$ (if $i-j=1$, then $a=e$ is its own inverse).
Is every subgroup of $\mathbb{F}^n$ a subspace?
Answer
No. Consider $F_4=\{0,1,x,x+1\}$ (the field extension of $\mathbb{F}_2$ defined by the irreducible polynomial $p(x)=x^2+x+1$).
$F_4^2=\{(a,b):a,b\in F_4\}$, and $\{(0,0),(1,1)\}$ is a subgroup of $(F_4^2,+)$.
But the $F_4$-span of $\{(1,1)\}$ is $\{(0,0),(1,1),(x,x),(x+1,x+1)\}\neq \{(0,0),(1,1)\}$, so $\{(0,0),(1,1)\}$ is not closed under scalar multiplication and is not a subspace of $F_4^2$.
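The counterexample can be verified by brute force. A minimal sketch, assuming $F_4$ is realized as $\mathbb{F}_2[x]/(x^2+x+1)$ with elements encoded as 2-bit integers ($0,1,x,x+1$ as 0, 1, 2, 3); the encoding and the names `gf4_add`, `gf4_mul` are illustrative choices:

```python
def gf4_add(a, b):
    """Add in F_4: coefficient-wise addition over F_2 is XOR."""
    return a ^ b

def gf4_mul(a, b):
    """Multiply in F_4 = F_2[x]/(x^2+x+1); bit 1 is the coefficient of x."""
    prod = 0
    for i in range(2):
        if (b >> i) & 1:
            prod ^= a << i      # carry-less multiplication in F_2[x]
    if prod & 0b100:            # reduce using x^2 = x + 1
        prod ^= 0b111
    return prod

S = {(0, 0), (1, 1)}

# S is closed under addition, so it is a subgroup of (F_4^2, +).
closed_under_add = all(
    (gf4_add(u[0], v[0]), gf4_add(u[1], v[1])) in S for u in S for v in S
)
# S is not closed under multiplication by scalars from F_4.
closed_under_scalars = all(
    (gf4_mul(c, u[0]), gf4_mul(c, u[1])) in S for c in range(4) for u in S
)
span = {(gf4_mul(c, 1), gf4_mul(c, 1)) for c in range(4)}

print(closed_under_add, closed_under_scalars)  # True False
print(sorted(span))  # [(0, 0), (1, 1), (2, 2), (3, 3)]
```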
Cosets of a subspace $V$ are called affine subspaces:
$V+a=\{v+a:v\in V\}$ for some $a\in \mathbb{F}^n$.
New content
Linear codes
A linear code \mathcal{C} is a subspace of \mathbb{F}^n over \mathbb{F}.
- The dimension of $\mathcal{C}$ is denoted by $k$.
- The minimum Hamming distance of $\mathcal{C}$ is denoted by $d$.
- Notation: $\mathcal{C}= [n,k,d]_{\mathbb{F}}$.
Two equivalent ways to construct a linear code:
- A generator matrix $G\in \mathbb{F}^{k\times n}$ with $k$ rows and $n$ columns: $\mathcal{C}=\{xG:x\in \mathbb{F}^k\}$.
  - The left image of $G$ is $\mathcal{C}$.
  - The rows of $G$ are a basis for $\mathcal{C}$.
- A parity check matrix $H\in \mathbb{F}^{(n-k)\times n}$ with $(n-k)$ rows and $n$ columns: $\mathcal{C}=\{c\in \mathbb{F}^n:Hc^\top=0\}$.
  - The right kernel of $H$ is $\mathcal{C}$.
  - Multiplying $c^\top$ by $H$ "checks" if $c\in \mathcal{C}$.
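The two descriptions can be compared by enumeration on a small binary code. A minimal sketch, assuming a hypothetical $G=(I|A)$ over $\mathbb{F}_2$ and the matching $H=(A^\top|I)$, both chosen for illustration; `left_image` and `right_kernel` are illustrative names:

```python
from itertools import product

# Hypothetical example: G = (I | A) and H = (A^T | I), so H G^T = 0 over F_2.
G = [[1, 0, 1, 1, 0],
     [0, 1, 0, 1, 1]]
H = [[1, 0, 1, 0, 0],
     [1, 1, 0, 1, 0],
     [0, 1, 0, 0, 1]]

def left_image(G):
    """All xG for x in F_2^k."""
    k = len(G)
    return {
        tuple(sum(xi * g for xi, g in zip(x, col)) % 2 for col in zip(*G))
        for x in product(range(2), repeat=k)
    }

def right_kernel(H):
    """All c in F_2^n with H c^T = 0."""
    n = len(H[0])
    return {
        c for c in product(range(2), repeat=n)
        if all(sum(h * ci for h, ci in zip(row, c)) % 2 == 0 for row in H)
    }

print(left_image(G) == right_kernel(H))  # True
```

Enumeration costs $2^k$ and $2^n$ steps respectively, so this is only a sanity check for toy parameters.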
Encoding of linear codes
Reminder:
- Encoding is the process of mapping a message $u\in \mathbb{F}^k$ to a codeword $c\in \mathcal{C}\subseteq \mathbb{F}^n$.
E: \mathbb{F}^k\to \mathcal{C} is a linear map.
Let \mathcal{C}= [n,k,d]_{\mathbb{F}} be a linear code with generator matrix G\in \mathbb{F}^{k\times n}.
- Encoding is given by $E(x)=xG$.
- It is injective (1-1): suppose $x_1,x_2\in \mathbb{F}^k$ satisfy $x_1G=x_2G$. Then $x_1G-x_2G=0\implies (x_1-x_2)G=0\implies x_1-x_2=0$, because the rows of $G$ are linearly independent. Hence $x_1=x_2$ and $E$ is injective.
So a linear code admits a linear encoding: $E(x)+E(y)=E(x+y)$.
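Both properties of $E$ can be checked exhaustively for a small code. A minimal sketch over $\mathbb{F}_2$, assuming a hypothetical $2\times 5$ generator matrix chosen for illustration:

```python
from itertools import product

# Hypothetical generator matrix over F_2 (k = 2, n = 5).
G = [[1, 0, 1, 1, 0],
     [0, 1, 0, 1, 1]]

def E(x):
    """Encode the message x as the codeword xG over F_2."""
    return tuple(sum(xi * g for xi, g in zip(x, col)) % 2 for col in zip(*G))

def add(u, v):
    """Vector addition over F_2."""
    return tuple((a + b) % 2 for a, b in zip(u, v))

messages = list(product(range(2), repeat=len(G)))
linear = all(add(E(x), E(y)) == E(add(x, y)) for x in messages for y in messages)
injective = len({E(x) for x in messages}) == len(messages)
print(linear, injective)  # True True
```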
Systematic codes
Fact: Every generator matrix $G\in \mathbb{F}^{k\times n}$ (which has rank $k$) can be brought to the form $G_{sys}=(I|A)$ by
- Row operations.
- Permutation of columns.
Fact: $\{xG|x\in \mathbb{F}^k\}$ and $\{xG_{sys}|x\in \mathbb{F}^k\}$ are equivalent codes:
- Same length $n$.
- Same dimension $k$.
- Same minimum Hamming distance $d$.
Encoding a systematic code:
- The input is a part of the output.
- Efficient encoding
- Immediate decoding (if no errors).
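The reduction to $G_{sys}$ and the "input is part of the output" property can be sketched as follows. This is an illustrative Gauss-Jordan elimination over $\mathbb{F}_2$ only; the function name `systematic_form`, the returned column permutation, and the example matrix are assumptions made for the example:

```python
def systematic_form(G):
    """Return (G_sys, perm) with G_sys = (I | A) over F_2.

    perm[j] is the index of the original column now sitting in position j.
    """
    G = [row[:] for row in G]
    k, n = len(G), len(G[0])
    perm = list(range(n))
    for i in range(k):
        # Look for a pivot 1 in rows i.., scanning columns i.. left to right.
        pivot = next(((r, c) for c in range(i, n) for r in range(i, k) if G[r][c]), None)
        if pivot is None:
            raise ValueError("rows of G are linearly dependent")
        r, c = pivot
        G[i], G[r] = G[r], G[i]                      # row swap
        if c != i:                                   # column permutation
            for row in G:
                row[i], row[c] = row[c], row[i]
            perm[i], perm[c] = perm[c], perm[i]
        for r2 in range(k):                          # clear the rest of column i
            if r2 != i and G[r2][i]:
                G[r2] = [a ^ b for a, b in zip(G[r2], G[i])]
    return G, perm

G = [[1, 1, 0, 1],
     [1, 1, 1, 0]]
G_sys, perm = systematic_form(G)
print(G_sys)  # [[1, 0, 1, 1], [0, 1, 0, 1]]
print(perm)   # [0, 2, 1, 3]

x = (1, 1)
c = tuple(sum(xi * g for xi, g in zip(x, col)) % 2 for col in zip(*G_sys))
print(c)      # (1, 1, 1, 0): the first k symbols are the message itself
```

With no errors, decoding a systematic codeword amounts to reading off its first $k$ symbols.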
Codes, cosets, encoding, decoding
Linear code [n,k,d]_{\mathbb{F}} is a k dimensional subspace of \mathbb{F}^n.
Size of the code is |\mathbb{F}|^k.
Encoding: x\to xG.
Decoding: $(y+e)\to x$, where $y=xG$.
Use the syndrome to identify which coset $\mathcal{C}+e$ the noisy codeword $y+e$ belongs to:
$H(y+e)^\top=Hy^\top+He^\top=He^\top$, since $Hy^\top=0$ for every $y\in\mathcal{C}$.
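That the syndrome depends only on the error can be confirmed on a toy code. A minimal sketch, assuming the binary code $\{00000,10110,01011,11101\}$ and an assumed parity check matrix $H=(A^\top|I)$ for it; `syndrome` is an illustrative helper name:

```python
# Assumed parity check matrix over F_2 for the code listed below.
H = [[1, 0, 1, 0, 0],
     [1, 1, 0, 1, 0],
     [0, 1, 0, 0, 1]]

codewords = [(0, 0, 0, 0, 0), (1, 0, 1, 1, 0), (0, 1, 0, 1, 1), (1, 1, 1, 0, 1)]

def syndrome(w):
    """Compute H w^T over F_2."""
    return tuple(sum(h * wi for h, wi in zip(row, w)) % 2 for row in H)

e = (0, 0, 1, 0, 0)
# Syndromes of y + e for every codeword y: a single value, equal to H e^T.
syndromes = {syndrome(tuple(a ^ b for a, b in zip(y, e))) for y in codewords}
print(syndromes)    # {(1, 0, 0)}
print(syndrome(e))  # (1, 0, 0)
```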
Syndrome decoding
- Heavily depends on the linear structure of the code.
Linear code $\mathcal{C}= [n,k,d]_{\mathbb{F}}$ is a $k$-dimensional subspace of $(\mathbb{F}^n,+)$.
A shift of a linear code $[n,k,d]_{\mathbb{F}}$ is a $k$-dimensional affine subspace of $\mathbb{F}^n$.
All cosets have the same size.
If $w_H(e)\leq \lfloor \frac{d-1}{2}\rfloor$, then it is possible to extract $y$ from $y+e$.
By syndrome decoding, we can do better than exhaustive search.
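For comparison, the exhaustive baseline simply scans codewords for the one nearest to the received word. A minimal sketch, assuming the binary $[5,2,3]$ code used as the example later in these notes, so that $\lfloor (d-1)/2\rfloor = 1$ error is correctable; the function names are illustrative:

```python
def hamming_distance(u, v):
    """Number of positions in which u and v differ."""
    return sum(a != b for a, b in zip(u, v))

def nearest_codeword(received, code):
    """Exhaustive minimum-distance decoding: scan every codeword."""
    return min(code, key=lambda c: hamming_distance(received, c))

code = [(0, 0, 0, 0, 0), (1, 0, 1, 1, 0), (0, 1, 0, 1, 1), (1, 1, 1, 0, 1)]

y = (0, 1, 0, 1, 1)           # transmitted codeword
e = (0, 1, 0, 0, 0)           # error of weight 1 <= floor((3 - 1) / 2)
received = tuple(a ^ b for a, b in zip(y, e))
print(nearest_codeword(received, code))  # (0, 1, 0, 1, 1)
```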
Idea:
Let $y+e$ belong to the coset $\mathcal{C}+e$.
Moreover, for any codewords $y_1,y_2\in\mathcal{C}$, $y_1+e$ and $y_2+e$ are in the same coset.
Standard Array
Let $\mathcal{C}= [n,k,d]_{\mathbb{F}}$ and denote $|\mathbb{F}|=q$.
- Then $|\mathcal{C}|=q^k$.
- The number of cosets is $q^{n-k}$.
Then we arrange all $q^n$ elements of $\mathbb{F}^n$ into a $q^{n-k}\times q^k$ array:
- So that every row is a coset (including $\mathcal{C}$ itself).
- The lightest word in each coset is placed in the leftmost column.
Example
Let $\mathbb{F}=\mathbb{Z}_2$ and $\mathcal{C}=\{xG|x\in \mathbb{F}_2^2\}$, where
$$
G=\begin{pmatrix}
1 & 0 & 1 & 1 & 0\\
0 & 1 & 0 & 1 & 1
\end{pmatrix}
$$
So $\mathcal{C}=\{00000,10110,01011,11101\}$.
Then $\mathcal{C}=[5,2,3]_2$.
The standard array has $2^{5-2}=8$ rows; the first four are:
First row is \mathcal{C}.
Second row is \mathcal{C}+(00001),
Third row is \mathcal{C}+(00010).
Fourth row is \mathcal{C}+(00100).
| 00000 | 10110 | 01011 | 11101 |
|---|---|---|---|
| 00001 | 10111 | 01010 | 11100 |
| 00010 | 10100 | 01001 | 11111 |
| 00100 | 10010 | 01111 | 11001 |
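The rows can also be generated mechanically, which is a useful check on hand computation. A minimal sketch, assuming the generator rows 10110 and 01011 (consistent with the codeword list above); `encode` and `standard_array` are illustrative names, and ties between equally light coset leaders are broken lexicographically, which is an arbitrary choice:

```python
from itertools import product

G = [[1, 0, 1, 1, 0],
     [0, 1, 0, 1, 1]]

def encode(x, G):
    """Codeword xG over F_2."""
    return tuple(sum(xi * g for xi, g in zip(x, col)) % 2 for col in zip(*G))

def standard_array(G):
    """Rows are the cosets of the code; each row starts with a lightest word."""
    k, n = len(G), len(G[0])
    # x is reversed only so that columns appear in the order 00000, 10110, 01011, 11101.
    code = [encode(x[::-1], G) for x in product(range(2), repeat=k)]
    # Visit words by increasing Hamming weight so each new row starts with a leader.
    words = sorted(product(range(2), repeat=n), key=lambda w: (sum(w), w))
    seen, rows = set(), []
    for w in words:
        if w in seen:
            continue
        row = [tuple(a ^ b for a, b in zip(w, c)) for c in code]
        rows.append(row)
        seen.update(row)
    return rows

for row in standard_array(G):
    print(" ".join("".join(map(str, w)) for w in row))
```

The output has 8 rows of 4 words each. The last two rows have leaders of weight 2, and for those cosets the lightest word is not unique.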
Any two elements in a row are of the form $y_1'=y_1+e$ and $y_2'=y_2+e$ for some $e\in \mathbb{F}^n$ and $y_1,y_2\in\mathcal{C}$.
They have the same syndrome: $H(y_1')^\top=H(y_1+e)^\top=He^\top=H(y_2+e)^\top=H(y_2')^\top$.
Entries in different rows have different syndrome.
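Proof
If two words $y_1',y_2'\in\mathbb{F}^n$ satisfy $H(y_1')^\top=H(y_2')^\top$, then $H(y_1'-y_2')^\top=0$, so $y_1'-y_2'\in\mathcal{C}$ and $y_1'$, $y_2'$ lie in the same coset, i.e. in the same row.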
The lightest word in each coset is chosen as the coset leader and placed in the leftmost column.
Time complexity: $O(n(n-k))$ to compute the syndrome. Space complexity: $n|\mathbb{F}|^n$ to store the array.
Compare with exhaustive search: Time: O(|F|^n).
Syndrome decoding - Intuition
Given $y'$, we identify the set $\mathcal{C} + e$ to which $y'$ belongs by computing the syndrome.
- We identify $e$ as the coset leader (leftmost entry) of the row $\mathcal{C} + e$.
- We output the codeword in $\mathcal{C}$ which is closest ($c'$) by subtracting $e$ from $y'$.
Syndrome decoding - Formal
Given y'\in \mathbb{F}^n, we identify the set \mathcal{C}+e to which y' belongs by computing the syndrome.
We identify e as the coset leader (leftmost entry) of the row \mathcal{C}+e.
We output the codeword in \mathcal{C} which is closest (example c_3) by subtracting e from y'.
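The procedure can be sketched with a table from syndromes to coset leaders, so the full array never needs to be stored. This is a minimal illustration over $\mathbb{F}_2$, assuming the parity check matrix $H=(A^\top|I)$ for the code $\{00000,10110,01011,11101\}$; `leader_table` and `decode` are hypothetical names, and the table is built here by brute force over all $2^n$ words:

```python
from itertools import product

# Assumed parity check matrix over F_2.
H = [[1, 0, 1, 0, 0],
     [1, 1, 0, 1, 0],
     [0, 1, 0, 0, 1]]

def syndrome(H, w):
    """Compute H w^T over F_2."""
    return tuple(sum(h * wi for h, wi in zip(row, w)) % 2 for row in H)

def leader_table(H):
    """Map each syndrome to a lightest word having that syndrome."""
    n = len(H[0])
    table = {}
    # Words are visited by increasing weight, so the first hit is a coset leader.
    for w in sorted(product(range(2), repeat=n), key=lambda w: (sum(w), w)):
        table.setdefault(syndrome(H, w), w)
    return table

def decode(H, table, received):
    """Subtract the coset leader identified by the syndrome (XOR over F_2)."""
    e = table[syndrome(H, received)]
    return tuple(r ^ ei for r, ei in zip(received, e))

table = leader_table(H)
received = (1, 0, 1, 1, 1)           # codeword 10110 with its last bit flipped
print(decode(H, table, received))    # (1, 0, 1, 1, 0)
```

The table has one entry per coset, $2^{n-k}=8$ here, and decoding a received word costs one syndrome computation plus one lookup.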