9.4 KiB
CSE5313 Coding and information theory for data science (Lecture 13)
Recap from last lecture
Local recoverable codes:
An [n, k]_q code is called $r$-locally recoverable if
- every codeword symbol
y_jhas a recovering setR_j \subseteq [n] \setminus j([n]=\{1,2,\ldots,n\}), - such that
y_jis computable fromy_ifor alli \in R_j. |R_j| \leq rfor everyj \in n.
Bounds for locally recoverable codes
Bound 1: \frac{k}{n}\leq \frac{r}{r+1}.
Bound 2: d\leq n-k-\lceil\frac{k}{r}\rceil +2. (r=k gives singleton bound)
Turan's Lemma
Let G be a graph with n vertices. Then there exists an induced directed acyclic subgraph (DAG) of G on at least \frac{n}{1+\operatorname{avg}_i(d^{out}_i)} nodes, where d^{out}_i is the out-degree of vertex i.
Proof using probabilistic method.
Proof via the probabilistic method
Useful for showing the existence of a large acyclic subgraph, but not for finding it.
Tip
Show that
\mathbb{E}[X]\geq something, and therefore there existsU_\piwith|U_\pi|\geq something, using pigeonhole principle.
For a permutation \pi of [n], define U_\pi = \{\pi(i): i \in [n]\}.
Let i\in U_\pi if each of the d_i^{out} outgoing edges from i connect to a node j with \pi(j)>\pi(i).
In other words, we select a subset of nodes U_\pi such that each node in U_\pi has an outgoing edge to a node in U_\pi with a larger index. All edges going to right.
This graph is clearly acyclic.
Choose \pi at random and Let X=|U_\pi| be a random variable.
Let X_i be the indicator random variable for i\in U_\pi.
So X=\sum_{i=1}^{n} X_i.
Using linearity of expectation, we have
E[X]=\sum_{i=1}^{n} E[X_i]
E[X_i] is the probability that \pi places i before any of its out-neighbors.
For each node, there are (d_i^{out}+1)! ways to place the node and its out-neighbors.
For each node, there are d_i^{out}! ways to place the out-neighbors.
So, E[X_i]=\frac{d_i^{out}!}{(d_i^{out}+1)!}=\frac{1}{d_i^{out}+1}.
Since X=\sum_{i=1}^{n} X_i, we have
Tip
Recall Arithmetic mean (
\frac{1}{n}\sum_{i=1}^{n} x_i) is greater than or equal to harmonic mean (\frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}).
\begin{aligned}
E[X]&=\sum_{i=1}^{n} E[X_i]\\
&=\sum_{i=1}^{n} \frac{1}{d_i^{out}+1}\\
&=\frac{1}{n}\sum_{i=1}^{n} \frac{1}{d_i^{out}+1}\\
&\geq \frac{n}{\frac{1}{n}(1+\sum_{i=1}^{n} d_i^{out})}\\
&=\frac{n}{1+\operatorname{avg}_i(d_i^{out})}\\
\end{aligned}
Locally recoverable codes
Bound 1
\frac{k}{n}\leq \frac{r}{r+1}
Proof via Turan's Lemma
Let G be a directed graph such that i\to j is an edge if j\in R_i. (j required to repair i)
d_i^{out}\leq r for every i, so \operatorname{avg}_i(d_i^{out})\leq r.
By Turan's Lemma, there exists an induced directed acyclic subgraph (DAG) G_U on G with U nodes such that |U|\geq \frac{n}{1+\operatorname{avg}_i(d_i^{out})}\geq \frac{n}{1+r}.
Since G_U is a DAG, it is acyclic. So there exists a vertex u_1\in U with no outgoing edges inside G_U. (leaf node in G_U)
Note that if we remove u_1 from G_U, the remaining graph is still a DAG.
Let G_{U_1} be the induced graph on U_1=U\setminus \{u_1\}.
We can find a vertex u_2\in U_1 with no outgoing edges inside G_{U_1}.
We can continue this process until we have all the vertices in U removed.
So all symbols in U are redundant.
Note the number of redundant symbols =n-k\geq \frac{n}{1+r}.
So k\leq \frac{rn}{r+1}.
So \frac{k}{n}\leq \frac{r}{r+1}.
Bound 2
d\leq n-k-\lceil\frac{k}{r}\rceil +2
Definition for code reduction
For C=[n,k]_q code, let I\subseteq [n] be a set of indices, e.g., I=\{1,3,5,6,7,8\}.
Let C_I be the code obtained by deleting the symbols that is not in I. (only keep the symbols with indices in I)
Then C_I is a code of length |I|, C_I is generated by G_I\in \mathbb{F}^{k\times |I|}_q (only keep the rows of G with indices in I).
|C_I|=q^{\operatorname{rank}(G_I)}.
Note G_I not necessarily of rank k.
Reduction lemma
Let \mathcal{C} be an [n, k]_q code. Then d=n-\max\{|I|:C_I<q^k,I\subseteq [n]\}.
Proof for reduction lemma
We proceed from two inequalities.
First d\leq n-\max\{|I|:C_I<q^k,I\subseteq [n]\}.
Let I\subseteq [n] with |C_I|\leq q^k.
Then there exists m_1,m_2\in \mathbb{F}_q^k such that m_1G_I=m_2G_I.
Let y_1=m_1G_I and y_2=m_2G_I.
Then y_1,y_2 have at least |I| entries in common.
So d_H(y_1,y_2)\leq n-|I|.
So d\leq n-|I| for every I such that |C_I|\leq q^k.
So d\leq n-\max\{|I|:C_I<q^k,I\subseteq [n]\}.
Now we show the other inequality.
d\geq n-\max\{|I|:C_I<q^k,I\subseteq [n]\}.
Let y_1,y_2\in \mathcal{C} such that d_H(y_1,y_2)=d.
Let J be the set on which y_1 and y_2 identical.
So |J|=n-d.
|C_J|<q^k since y_1 and y_2 have at least d entries in common.
So \max\{|I|:C_I<q^k,I\subseteq [n]\}\geq n-d. (by existence of J)
So d\geq n-\max\{|I|:C_I<q^k,I\subseteq [n]\}.
We prove bound 2 using the same subgraph from proof of bound 1.
Proof for bound 2
Let G be a directed graph such that i\to j is an edge if j\in R_i. (j required to repair i)
d_i^{out}\leq r for every i, so \operatorname{avg}_i(d_i^{out})\leq r.
By Turan's Lemma, there exists an induced directed acyclic subgraph (DAG) G_U on G with U nodes such that |U|\geq \frac{n}{1+\operatorname{avg}_i(d_i^{out})}\geq \frac{n}{1+r}.
Let U'\subset U with size \lfloor\frac{k-1}{r}\rfloor.
U' exists since \frac{n}{r+1}>\frac{k}{r} by bound 1, and \frac{k}{r}>\lfloor\frac{k-1}{r}\rfloor.
So G_{U'} is also acyclic.
Let N\subseteq [n]\setminus U' be the set of neighbors of U' in G_U.
|N|\leq r|U'|\leq k-1.
Complete n to be of the size k-1 by adding elements not in U'.
|C_N|\leq q^{k-1}
Also |N\cup U'|=k-1+\lfloor\frac{k-1}{r}\rfloor
Note that all nodes in G_U can be recovered from nodes in N.
So |C_{N\cup U'}|=|C_N|\leq q^{k-1}.
Therefore, \max\{|I|:C_I<q^k,I\subseteq [n]\}\geq |N\cup U'|=k-1+\lfloor\frac{k-1}{r}\rfloor.
Using reduction lemma, we have d= n-\max\{|I|:C_I<q^k,I\subseteq [n]\}\leq n-k-1-\lfloor\frac{k-1}{r}\rfloor+2=n-k-\lceil\frac{k}{r}\rceil +2.
Construction of locally recoverable codes
Recall Reed-Solomon code:
\{f(a_1),f(a_2),\ldots,f(a_n)|f(x)\in \mathbb{F}_q^{k-1}[x]\}.
\dim=kd=n-k+1- Need
n\geq q. - To encode
a=a_1,a_2,\ldots,a_n, we need to evaluatef_a(x)=\sum_{i=0}^{k-1}f_a(a_i)x^i.
Assume r+1|n and r|k.
Choose A=\{\alpha_1,\alpha_2,\ldots,\alpha_n\}
Partition A into \frac{n}{r+1} subsets of size r+1.
Definition for good polynomial
A polynomial g over \mathbb{F}_q is called good if
\deg(g)=r+1gis constant for everyA_i., i.e.,\forall x,y\in A_i, g(x)=g(y).
To encode a=(a_0,a_1,\ldots,a_n), we consider a\in \mathbb{F}_q^{r\times (k/r)}
Let f_a(x)=\sum_{i=0}^{r-1} f_{a,i}(x)\cdot x^i
f_{a,i}(x)=\sum_{j=0}^{k/r-1} a_{i,j}\cdot g(x)^j, where g is a good polynomial.
Proof for valid codeword
Ifa\neq b, then (f_a(\alpha_i))_{i=1}^{n}\neq (f_b(\alpha_i))_{i=1}^{n}.
\deg f_a = k-2- Maximum degree monomial in
f_a:a_{r-1,k/r-1}\cdot g(x)^{k/r-1}\cdot x^{r-1}. \deg f_a \leq \frac{k}{r}-1+\frac{k}{r}-1=k-2.
By Bound 1: k-2\leq n-2.
If f_a(\alpha_i)_{i=1}^{n}=f_b(\alpha_i)_{i=1}^{n}. then
f_aandf_bagree on more points than their degree.a=b, a contradiction.
Corollary: |C|=q^k.
In other words,
\mathcal{C}=\{(f_a(\alpha_i))_{i=1}^{n}|a\in \mathbb{F}_q^k\}
For data a\in \mathbb{F}_q^k, the $i$th server stores f_a(\alpha_i).
Note \mathcal{C} is a linear code. and \dim \mathcal{C}=k.
Let \alpha\in \{\alpha_1,\alpha_2,\ldots,\alpha_n\}.
Suppose \alpha\in A_1.
Locality is ensured by recovering failed server \alpha by f_a(\alpha_i) from \{f_a(\beta):\beta\in A_1\setminus \{\alpha\}\}.
Note g is constant on A_1.
Denote the constant by f_i(A_1).
Let \delta_1(x)=\sum_{i=0}^{r-1} f_{a,i}(A_1)x^i.
\delta_1(x)=f_a(x) for all x\in A_1.
Constructing good polynomial
Let (\mathbb{F}_q^*,\cdot) be the multiplicative group of \mathbb{F}_q.
For a subgroup H of this group, a multiplicative coset is Ha=\{ha|h\in H,a\in \mathbb{F}_q^*\}.
The family of all cosets of H:\{Ha|a\in \mathbb{F}_q^*\}.
This is a partition of \mathbb{F}_q^*.
Definition of annihilator
The annihilator of H is g(x)=\prod_{h\in H}(x-h).
Note that g(h)=0 for all h\in H.
Annihilator is a good polynomial
Fix H, consider the partition \{Ha|a\in \mathbb{F}_q^*\} and let g(x)=\prod_{a\in \mathbb{F}_q^*}(x-a). Then g is a good polynomial.
Proof
Need to show if x,y\in Ha, then g(x)=g(y).
Since x\in Ha, there exists h',h''\in H such that x=h'a and y=h''a.
So g(x)=g(h'a)=\prod_{h\in H}(h'a-h)=(h')^{|H|}\prod_{h\in H}(a-h/h').
By Fermat's Little Theorem, (h')^{|H|}\equiv 1 for every h'\in H.
So \{h/h'|h\in H\}=H for every h'\in H.
So g(x)=(h')^{|H|}\prod_{h\in H}(a-h/h')=\prod_{h\in H}(a-h/h')=g(a).
Similarly, g(y)=g(h''a)=\prod_{h\in H}(h''a-h)=(h'')^{|H|}\prod_{h\in H}(a-h/h'')=g(a).