CSE5313 Coding and information theory for data science (Lecture 14)
The repair problem
Main challenge:
- Locality (number of contacted servers)
- Bandwidth (number of bits transferred)
In the last lecture we built optimal Locally Recoverable Codes (LRCs) for storage systems.
Let $\mathcal{C}$ be an $[n, k]_q$ code which is an $r$-LRC with minimum distance $d$.
- Bound 1: $\frac{k}{n}\leq \frac{r}{r+1}$.
- Bound 2: $d\leq n-k-\frac{k}{r}+2$.
Optimal LRC:
- Let $\mathcal{A} = \{\alpha_1, \ldots, \alpha_n\}\subseteq \mathbb{F}_q$, and partition $\mathcal{A}$ into sets $\mathcal{A}_i$ of size $r+1$, for $i=1,\ldots,\frac{n}{r+1}$.
- $g\in \mathbb{F}_q[x]$ is "good" if $\deg(g) = r+1$ and $g$ is constant on every $\mathcal{A}_i$.
- $\mathcal{C} = \{(f_a(\alpha_i))_{i=1}^{n} \mid a\in \mathbb{F}_q^k\}$, where $f_a(x) = \sum_{i=0}^{r-1} f_{a,i}(x)\cdot x^i$ and $f_{a,i}(x) = \sum_{j=0}^{k/r-1} a_{i,j}\cdot g(x)^j$ for a good polynomial $g$.
- $\dim \mathcal{C} = k$ and $d = n-k-\frac{k}{r}+2$.
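To make the local repair property concrete, here is a minimal Python sketch of the construction above over a small field. The parameters $q=13$, $n=12$, $r=3$, $k=6$ are illustrative assumptions, and the names `encode`, `lagrange_at` and `local_repair` are hypothetical. The partition uses the cosets of the 4th roots of unity in $\mathbb{F}_{13}^*$, on which $g(x)=x^4$ is constant. Because $g$ is constant on each $\mathcal{A}_i$, the restriction of $f_a$ to $\mathcal{A}_i$ is a polynomial of degree at most $r-1$, so any $r$ symbols of a group determine the missing one.

```python
# Toy r-LRC over F_13 following the construction above (illustrative parameters).
P = 13                    # q: prime, so integers mod P form F_q
R, K = 3, 6               # locality r and dimension k (r divides k)
ROOTS4 = [1, 5, 8, 12]    # 4th roots of unity in F_13: u^4 = 1
# Partition A = F_13^* into the cosets a * ROOTS4, each of size r + 1 = 4.
GROUPS = [sorted(a * u % P for u in ROOTS4) for a in (1, 2, 4)]
ALPHAS = [x for grp in GROUPS for x in grp]   # evaluation points, n = 12


def g(x):
    # Good polynomial: deg g = r + 1 = 4 and x^4 is constant on every coset.
    return pow(x, R + 1, P)


def encode(a):
    """a is an R x (K // R) table of message symbols a[i][j]."""
    def f(x):
        # f_a(x) = sum_i f_{a,i}(x) x^i,  f_{a,i}(x) = sum_j a[i][j] g(x)^j
        return sum(a[i][j] * pow(g(x), j, P) * pow(x, i, P)
                   for i in range(R) for j in range(K // R)) % P
    return [f(x) for x in ALPHAS]


def lagrange_at(xs, ys, x0):
    """Evaluate at x0 the unique polynomial of degree < len(xs) through (xs, ys)."""
    total = 0
    for xi, yi in zip(xs, ys):
        num = den = 1
        for xj in xs:
            if xj != xi:
                num = num * (x0 - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse of den
    return total


def local_repair(codeword, pos):
    """Rebuild codeword[pos] from the other r symbols in its group A_i."""
    x0 = ALPHAS[pos]
    grp = next(grp for grp in GROUPS if x0 in grp)
    helpers = [x for x in grp if x != x0]
    return lagrange_at(helpers, [codeword[ALPHAS.index(x)] for x in helpers], x0)


codeword = encode([[3, 7], [0, 11], [5, 2]])
print(all(local_repair(codeword, p) == codeword[p] for p in range(len(ALPHAS))))
```

Each repair contacts only $r=3$ servers, which is the locality the bounds above are about.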
Minimizing the repair bandwidth
Goal: understand repair bandwidth.
- What is the minimum repair bandwidth?
- Is repair bandwidth in trade-off with other parameters?
- Tool: The information flow graph.
Spoiler alert:
- Tradeoff: storage vs. repair bandwidth.
- Codes which achieve an optimal tradeoff.
Information flow graph
We can model the repair problem as a directed graph.
- Source: System admin.
- Sink: Data collector.
- Nodes: Storage servers.
- Nodes leave/crash $\to$ newcomers replace them $\to$ new nodes.
- Edges: represent transmission of information. (The weight is the number of $\mathbb{F}_q$ elements transmitted.)
Main observation:
- $k$ elements from $\mathbb{F}_q$ must "flow" from the source (system admin) to the sink (data collector).
- Any cut $(U,\overline{U})$ which separates the source from the sink must have capacity at least $k$.
Roadmap:
Information flow graph $\to$ Minimum cut analysis $\to$ Bound on file size $\to$ Storage/bandwidth tradeoff.
Basic definitions for information flow graph
Warning
This is not the same as definitions in linear codes.
$k$ is not the dimension of the code and $d$ is not the minimum distance of the code in general.
Parameters:
- $n$ is the number of nodes in the initial system (before any node leaves/crashes).
- $k$ is the number of nodes required to reconstruct the file.
- $d$ is the number of nodes required to repair a failed node.
- $\alpha$ is the storage at each node.
- $\beta$ is the edge capacity for repair.
- $B$ is the file size.
Goal: Find the tradeoff between $n,k,d,\alpha,\beta,B$ using min-cut analysis of the information flow graph.
Initial system:
We denote the system admin by $S$.
Servers $1,2,\ldots,n$, each with edge capacity $\alpha$.
For each new server:
- There are two nodes, $in$ and $out$, joined by an edge of weight $\alpha$.
- The $out$ nodes of $d$ previous servers connect to $in$, each edge with capacity $\beta$.
Data collector: connects to $k$ arbitrary nodes, each edge with capacity $\alpha$.
Observe that:
- The file size is $B$.
- Any cut separating $S$ from $DC$ must have capacity at least $B$.
- Otherwise, two different files are indistinguishable to $DC$.
Bound on bandwidth
Claim: $\text{mincut}\geq \sum_{i=0}^{k-1}\min\{(d-i)\beta, \alpha\}$.
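Intuition: Let $(U, \overline{U})$ be a cut separating $S$ from $DC$.
- The DC contacts the $k$ newest nodes, say $n_1,n_2,\ldots,n_k$.
- For each of them, the cut must decide whether to cross its $\alpha$ edge or its incoming $\beta$ edges.
- Node $n_{i+1}$ receives at most $i$ of its $d$ incoming edges from $n_1,\ldots,n_i$, so if the cut crosses its $\beta$ edges, at least $d-i$ of them must be crossed.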
Proof
Let $(U, \overline{U})$ be a cut separating $S$ from $DC$, with $S\in U$ and $DC\in \overline{U}$.
The information flow graph is a directed acyclic graph, so it has a topological sort; order its nodes accordingly.
Let $x_{out}^1$ be the first $out$ node in $\overline{U}$. There are two cases:
- $x_{in}^1\in U$. Then the $\alpha$ edge $x_{in}^1\to x_{out}^1$ must be crossed.
- $x_{in}^1\in \overline{U}$. Then all $d$ incoming edges to $x_{in}^1$ must be crossed. (Otherwise some incoming edge starts at an $out$ node in $\overline{U}$ that precedes $x_{out}^1$ in the topological sort, contradicting the choice of $x_{out}^1$.)
So $x_{out}^1$ contributes at least $\min\{d\beta, \alpha\}$ to the cut capacity.
Let $x_{out}^2$ be the second $out$ node in $\overline{U}$. There are two cases:
- $x_{in}^2\in U$. Then the $\alpha$ edge $x_{in}^2\to x_{out}^2$ must be crossed.
- $x_{in}^2\in \overline{U}$. Then at least $d-1$ incoming edges to $x_{in}^2$ must be crossed, since at most one of them may come from $x_{out}^1$. (Any other incoming edge from $\overline{U}$ would start at an $out$ node preceding $x_{out}^2$ in the topological sort, contradicting the choice of $x_{out}^2$.)
So $x_{out}^2$ contributes at least $\min\{(d-1)\beta, \alpha\}$ to the cut capacity.
By repeating this process, we can show that the minimum cut capacity is at least $\sum_{i=0}^{k-1}\min\{(d-i)\beta, \alpha\}$.
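As a small numerical check of the bound, the sketch below builds a tiny information flow graph and computes its max-flow (equal to the min-cut) with a hand-written Edmonds-Karp routine. Assumptions: each initial server is modeled as a single $out$ node fed by $S$ with capacity $\alpha$, DC edges have capacity $\alpha$ as in the lecture, and the repair sequence and the names `info_flow_graph`, `max_flow` and `cut_bound` are hypothetical.

```python
from collections import defaultdict, deque


def max_flow(cap, s, t):
    """Edmonds-Karp max-flow; cap maps u -> {v: capacity}. Max-flow = min-cut."""
    res = defaultdict(lambda: defaultdict(int))      # residual capacities
    for u, nbrs in cap.items():
        for v, c in nbrs.items():
            res[u][v] += c
    flow = 0
    while True:
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:              # BFS: shortest augmenting path
            u = queue.popleft()
            for v, c in list(res[u].items()):
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[a][b] for a, b in path)        # bottleneck capacity
        for a, b in path:
            res[a][b] -= push
            res[b][a] += push
        flow += push


def info_flow_graph(n, d, alpha, beta, repairs, dc_nodes):
    """Hypothetical builder: S -> initial servers, newcomers (in -> out), DC."""
    cap = defaultdict(dict)
    for i in range(1, n + 1):
        cap["S"][(i, "out")] = alpha                  # initial server i stores alpha
    for new, helpers in repairs:                      # helpers: the d contacted servers
        assert len(helpers) == d
        cap[(new, "in")][(new, "out")] = alpha        # newcomer stores alpha
        for h in helpers:
            cap[(h, "out")][(new, "in")] = beta       # beta symbols from each helper
    for v in dc_nodes:
        cap[(v, "out")]["DC"] = alpha
    return cap


def cut_bound(k, d, alpha, beta):
    return sum(min((d - i) * beta, alpha) for i in range(k))


# Server 1 fails and newcomer 5 repairs from {2, 3, 4}; then server 2 fails and
# newcomer 6 repairs from {3, 4, 5}; the DC reads servers 5 and 6 (k = 2).
REPAIRS = [(5, [2, 3, 4]), (6, [3, 4, 5])]
graph = info_flow_graph(4, 3, 2, 1, REPAIRS, [5, 6])
print(max_flow(graph, "S", "DC"), cut_bound(2, 3, 2, 1))
```

With $n=4$, $k=2$, $d=3$, $\beta=1$, both the max-flow and the bound equal 4 for $\alpha=2$ and 5 for $\alpha=3$, so on this graph the bound (with the sum starting at $i=0$) is met with equality.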
Storage/bandwidth tradeoff
Claim: There exists an information flow graph with $\text{mincut} = \sum_{i=0}^{k-1}\min\{(d-i)\beta, \alpha\}$.
Homework: Build this graph as follows:
- Construct the initial graph with $n$ nodes and the system admin.
- Add $n+k$ nodes, each connecting to the $d$ most recent nodes.
- Find the minimum cut capacity.
Corollary: $B\leq \sum_{i=0}^{k-1}\min\{(d-i)\beta, \alpha\}$.
Definition of regenerating code
A code which attains $B=\sum_{i=0}^{k-1}\min\{(d-i)\beta, \alpha\}$ is called a regenerating code.
Goal: Find the tradeoff between the storage $\alpha$ and the repair bandwidth $d\beta$.
Let $\gamma = d\beta$; then $B \leq \sum_{i=0}^{k-1}\min\{(1-i/d)\gamma, \alpha\}$.
Tool: Fix $\gamma$ and $d$, and minimize $\alpha$ (not shown).
Result: The storage/bandwidth tradeoff.
- Each point on/above the line is feasible.
- Points on the line = regenerating codes.
- One endpoint: Minimum Bandwidth Regenerating (MBR) codes.
- Another endpoint: Minimum Storage Regenerating (MSR) codes.
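The minimization of $\alpha$ for fixed $\gamma$ and $d$ is not shown in the lecture. One way to carry it out, sketched below under the assumption $k\leq d$, uses the fact that $\sum_{i=0}^{k-1}\min\{(1-i/d)\gamma,\alpha\}$ is continuous and piecewise linear in $\alpha$: guess how many terms are capped at $\alpha$, solve the resulting linear equation, and keep the guess that is self-consistent. `min_alpha` is a hypothetical name; exact `Fraction` arithmetic avoids rounding at the breakpoints.

```python
from fractions import Fraction


def min_alpha(B, k, d, gamma):
    """Smallest alpha with sum_{i<k} min((1 - i/d) gamma, alpha) >= B, or None."""
    assert 1 <= k <= d
    t = [(1 - Fraction(i, d)) * gamma for i in range(k)]   # decreasing breakpoints
    if sum(t) < B:
        return None    # gamma below the MBR value: no storage alpha is enough
    for m in range(1, k + 1):
        # Guess: terms i < m are capped at alpha, terms i >= m equal t[i].
        alpha = (Fraction(B) - sum(t[m:])) / m
        if alpha <= t[m - 1] and (m == k or alpha >= t[m]):
            return alpha   # the guess is self-consistent
    return None        # not reached when sum(t) >= B


# B = 9, k = 3, d = 4: sweep the repair bandwidth gamma = d * beta.
for gamma in [Fraction(39, 10), 4, 5, 6, 7]:
    print(gamma, min_alpha(9, 3, 4, gamma))
```

For $B=9$, $k=3$, $d=4$ the sweep starts at the MBR point ($\gamma=\alpha=4$), passes through $\alpha=13/4$ at $\gamma=5$, and reaches the MSR point $\alpha=B/k=3$ at $\gamma=6$; larger $\gamma$ does not reduce $\alpha$ further, and below $\gamma=4$ the function returns `None`.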
For Minimum Storage Regenerating (MSR) codes, we have $\alpha = \frac{B}{k}$, $\beta = \frac{B}{k(d-k+1)}$.
For Minimum Bandwidth Regenerating (MBR) codes, we have $\alpha = \frac{dB}{kd-\frac{k(k-1)}{2}}$, $\beta = \frac{B}{kd-\frac{k(k-1)}{2}}$.
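A quick exact check that both endpoint formulas meet the cut bound with equality: at the MSR point every term of $\sum_{i=0}^{k-1}\min\{(d-i)\beta,\alpha\}$ is capped at $\alpha$, and at the MBR point no term is. The function names and parameter triples below are illustrative assumptions.

```python
from fractions import Fraction


def cut_bound(k, d, alpha, beta):
    """sum_{i=0}^{k-1} min{(d - i) beta, alpha}."""
    return sum(min((d - i) * beta, alpha) for i in range(k))


def msr_point(B, k, d):
    """(alpha, beta) at the MSR endpoint."""
    return Fraction(B, k), Fraction(B, k * (d - k + 1))


def mbr_point(B, k, d):
    """(alpha, beta) at the MBR endpoint; note alpha = d * beta."""
    denom = k * d - Fraction(k * (k - 1), 2)
    return d * B / denom, B / denom


for B, k, d in [(9, 3, 4), (12, 4, 6), (10, 2, 5)]:
    for name, (alpha, beta) in [("MSR", msr_point(B, k, d)), ("MBR", mbr_point(B, k, d))]:
        print(name, B, k, d, alpha, beta, cut_bound(k, d, alpha, beta) == B)
```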
Notes:
- In MSR, $\alpha=B/k$: the data collector contacts $k$ nodes and downloads $B/k$ from each to reconstruct the file of size $B$, which is optimal.
- In MBR, $\beta=B/(kd-\frac{k(k-1)}{2})$ and $\alpha = d\beta$: the newcomer downloads exactly what it stores, as in replication, but with much smaller storage overhead than replication.
Regenerating codes, Magic #1:
- MBR: Same repair bandwidth as replication ($\alpha$), at lower storage cost.
- MSR: Same reconstruction bandwidth ($B/k$) as replication, at lower storage cost.
Regenerating codes, Magic #2:
- In MSR: $\gamma = d\beta = \frac{dB}{k(d-k+1)}$, $\alpha = \frac{B}{k}$.
- In MBR: $\gamma = d\beta = \frac{dB}{kd-\frac{k(k-1)}{2}}$, $\alpha = \frac{dB}{kd-\frac{k(k-1)}{2}}$.
- Both values of $\gamma$ are decreasing functions of $d$ $\Rightarrow$ less repair bandwidth by contacting more nodes, minimized at $d = n - 1$.
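Magic #2 can be checked numerically. The sketch below tabulates $\gamma$ at both endpoints for $d=k,\ldots,n-1$, using the illustrative values $B=12$, $k=3$, $n=8$ (assumptions, not from the lecture).

```python
from fractions import Fraction


def gamma_msr(B, k, d):
    """Repair bandwidth d * beta at the MSR point."""
    return Fraction(d * B, k * (d - k + 1))


def gamma_mbr(B, k, d):
    """Repair bandwidth d * beta at the MBR point (equal to alpha there)."""
    return Fraction(d * B) / (k * d - Fraction(k * (k - 1), 2))


B, k, n = 12, 3, 8          # illustrative values only
for d in range(k, n):       # d = k, ..., n - 1
    print(d, gamma_msr(B, k, d), gamma_mbr(B, k, d))
```

At $d=k$ the MSR repair bandwidth equals the whole file ($\gamma=B$); both columns then shrink as $d$ grows toward $n-1$.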
Constructing Minimum bandwidth regenerating (MBR) codes from Maximum distance separable (MDS) codes
Observation: From an MBR code with parameters $n, k, d$ and $\beta = 1$, one can construct an MBR code with parameters $n, k, d$ and any $\beta$.
Next: Construct an MBR code for $[n, k, d = n - 1]$ and $\beta = 1$.
In any MBR code: $\alpha = \frac{dB}{kd-\frac{k(k-1)}{2}}$, $\beta = \frac{B}{kd-\frac{k(k-1)}{2}}$.
Specifically (with $\beta = 1$):
- Storage $\alpha = d\beta = d = n - 1$.
- File size $B = \left(kd - \binom{k}{2}\right)\beta = kd - \binom{k}{2}$.
Take an $[\binom{n}{2}, B]$ MDS code (e.g., Reed-Solomon).
For Reed-Solomon, we need $q\geq \binom{n}{2}$.
Consider the complete graph $K_n$ on $n$ nodes.
- $K_n$ has $\binom{n}{2}$ edges.
- Place each codeword symbol on a distinct edge.
- Storage server $i$ stores all codeword symbols on edges incident to node $i$, so $\alpha = n - 1$.
Repair in MBR codes
The newcomer replacing node $i$ contacts each node $j\neq i$ and downloads the symbol on edge $(i,j)$.
We get $\alpha=n-1=d\beta$, which is optimal.
Reconstruction in MBR codes
We use an $[\binom{n}{2}, B]_q$ MDS code, so any $B$ symbols suffice to reconstruct the file.
Any $t$ nodes have $\binom{t}{2}$ edges between them, and $(n-1)t-2\binom{t}{2}$ edges to other nodes.
Overall they hold $(n-1)t-\binom{t}{2}$ distinct symbols. For $t=k$, we get $kd-\binom{k}{2}=B$.
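The repair-by-transfer and reconstruction arguments above can be simulated end to end. This sketch assumes $n=5$, $k=3$ (so $d=4$ and $B=9$) and a Reed-Solomon code over the prime field $\mathbb{F}_{11}$ evaluated at $1,\ldots,10$; all names are hypothetical. Reconstruction recovers the full codeword by Lagrange interpolation through $B$ known symbols, which determines the file because the encoding is injective.

```python
from itertools import combinations
from math import comb

P = 11                     # prime field size; Reed-Solomon needs P >= C(n, 2)
N, K = 5, 3                # n servers, any k of them rebuild the file
D = N - 1                  # d = n - 1 helpers, beta = 1
B = K * D - comb(K, 2)     # file size: kd - C(k, 2) = 9 symbols
EDGES = list(combinations(range(N), 2))        # the C(n, 2) = 10 edges of K_n
POINTS = list(range(1, len(EDGES) + 1))        # distinct evaluation points in F_11


def rs_encode(msg):
    """[C(n,2), B] Reed-Solomon: evaluate sum_e msg[e] x^e at every point."""
    return [sum(m * pow(x, e, P) for e, m in enumerate(msg)) % P for x in POINTS]


def lagrange_at(xs, ys, x0):
    """Value at x0 of the polynomial of degree < len(xs) through (xs, ys)."""
    total = 0
    for xi, yi in zip(xs, ys):
        num = den = 1
        for xj in xs:
            if xj != xi:
                num, den = num * (x0 - xj) % P, den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def node_contents(codeword, i):
    """Server i stores the symbols on the n - 1 edges incident to i."""
    return {e: codeword[e] for e, edge in enumerate(EDGES) if i in edge}


def repair(storage, i):
    """Newcomer i downloads the symbol on edge {i, j} from every j != i."""
    got = {}
    for j in range(N):
        if j != i:
            e = EDGES.index((min(i, j), max(i, j)))   # index of edge {i, j}
            got[e] = storage[j][e]                     # one symbol from node j
    return got


def reconstruct(storage, nodes):
    """DC reads k servers, then interpolates through B distinct symbols."""
    known = {}
    for j in nodes:
        known.update(storage[j])
    idx = sorted(known)[:B]            # (n-1)k - C(k,2) = B distinct edges
    xs, ys = [POINTS[e] for e in idx], [known[e] for e in idx]
    return [lagrange_at(xs, ys, x) for x in POINTS]


codeword = rs_encode(list(range(1, B + 1)))
storage = [node_contents(codeword, i) for i in range(N)]
```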
Constructing Minimum bandwidth regenerating (MBR) codes from Product-Matrix MBR codes
Recall: File size in MBR is $B=kd-\binom{k}{2}=\binom{k+1}{2}+k(d-k)$.
Step 1: Arrange the $B=\binom{k+1}{2}+k(d-k)$ symbols in a matrix $M$ as follows:
$$M=\begin{pmatrix}
S & T\\
T^T & 0
\end{pmatrix}\in \mathbb{F}_q^{d\times d}$$
- $S$ is a $k\times k$ symmetric matrix; it contains $\binom{k+1}{2}$ symbols.
- $T$ is a $k\times (d-k)$ matrix; it contains $k(d-k)$ symbols.
So there are $B$ symbols overall.
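Step 1 in code: a minimal sketch (the function name `message_matrix` is hypothetical) that fills $S$ symmetrically, places $T$ and $T^T$, and leaves the bottom-right block zero, so $M$ is symmetric by construction.

```python
from math import comb


def message_matrix(symbols, k, d):
    """Arrange B = C(k+1, 2) + k(d - k) symbols into M = [[S, T], [T^T, 0]]."""
    assert len(symbols) == comb(k + 1, 2) + k * (d - k)
    it = iter(symbols)
    M = [[0] * d for _ in range(d)]
    for r in range(k):              # S: k x k symmetric, fill upper triangle and mirror
        for c in range(r, k):
            M[r][c] = M[c][r] = next(it)
    for r in range(k):              # T: k x (d - k) top-right block, T^T bottom-left
        for c in range(k, d):
            M[r][c] = M[c][r] = next(it)
    return M                        # bottom-right (d - k) x (d - k) block stays 0


M = message_matrix([1, 2, 3, 4, 5], k=2, d=3)   # B = 3 + 2 = 5 symbols
print(M)
```

For $k=2$, $d=3$ this gives $S=\begin{pmatrix}1&2\\2&3\end{pmatrix}$ and $T=\begin{pmatrix}4\\5\end{pmatrix}$.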
Step 2: Construct the encoding matrix $C=(\Psi,\Delta)\in \mathbb{F}_q^{n\times d}$.
$\Psi\in \mathbb{F}_q^{n\times k}$ such that:
- Any $k$ rows are linearly independent.
- Example: Vandermonde matrix.
$\Delta\in \mathbb{F}_q^{n\times (d-k)}$ such that:
- Any $d$ rows of $C$ are linearly independent.
- Example: Complete $\Psi$ to a full $n\times d$ Vandermonde matrix.
Step 3: Encode the data $M\in \mathbb{F}_q^{d\times d}$ using the encoding matrix $C\in \mathbb{F}_q^{n\times d}$.
- Compute the product $CM$.
- Store the $i$th row of $CM$ in node $i$.
- Note $CM=(\Psi,\Delta)M=(\Psi S+\Delta T^T, \Psi T)$.
Repair in Product-Matrix MBR codes
Assume node $i$, storing $c_iM$ (where $c_i$ is the $i$th row of $C$), is lost.
Repair from (any) $d$ nodes $H = \{h_1, \ldots, h_d\}$.
- Node $h_j$ stores $c_{h_j}M$.
Newcomer contacts each $h_j$: "My name is $i$, and I'm lost."
Node $h_j$ sends $c_{h_j}M c_i^T$ (inner product).
Newcomer assembles $C_H Mc_i^T$.
$C_H$ is invertible by construction (any $d$ rows of $C$ are linearly independent)!
- Recover $Mc_i^T = C_H^{-1}(C_H Mc_i^T)$.
- Recover $c_iM = (Mc_i^T)^T$ ($M$ is symmetric).
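A minimal sketch of encoding and repair over $\mathbb{F}_{13}$, assuming $n=5$, $k=2$, $d=3$, a Vandermonde $C$ with evaluation points $1,\ldots,5$, and the example $S=\begin{pmatrix}1&2\\2&3\end{pmatrix}$, $T=\begin{pmatrix}4\\5\end{pmatrix}$ (all illustrative). Nodes are indexed from 0 in the code, and `solve` is a hand-written Gauss-Jordan elimination mod $p$, not a library call.

```python
from itertools import combinations

P = 13                     # prime field size (illustrative)
N, K, D = 5, 2, 3          # n, k, d
# Message matrix [[S, T], [T^T, 0]] with S = [[1, 2], [2, 3]] and T = [[4], [5]].
M = [[1, 2, 4], [2, 3, 5], [4, 5, 0]]
# Encoding matrix C = (Psi, Delta): Vandermonde rows (1, x, x^2), x = 1..n.
C = [[pow(x, e, P) for e in range(D)] for x in range(1, N + 1)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) % P for col in zip(*B)] for row in A]


def solve(A, B):
    """Return X with A X = B over F_P (A square, invertible), by Gauss-Jordan."""
    m = len(A)
    aug = [list(A[r]) + list(B[r]) for r in range(m)]
    for col in range(m):
        piv = next(r for r in range(col, m) if aug[r][col] % P)
        aug[col], aug[piv] = aug[piv], aug[col]
        inv = pow(aug[col][col], P - 2, P)
        aug[col] = [x * inv % P for x in aug[col]]
        for r in range(m):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(x - f * y) % P for x, y in zip(aug[r], aug[col])]
    return [row[m:] for row in aug]


STORED = matmul(C, M)      # node i stores c_i M, i.e. alpha = d symbols


def repair(i, helpers):
    """Newcomer for node i; each of the d helpers sends one scalar c_h M c_i^T."""
    sent = [[sum(a * b for a, b in zip(STORED[h], C[i])) % P] for h in helpers]
    M_ci = solve([C[h] for h in helpers], sent)   # C_H invertible: any d rows of C
    return [v[0] for v in M_ci]                   # M symmetric: (M c_i^T)^T = c_i M


ok = all(repair(i, list(h)) == STORED[i]
         for i in range(N) for h in combinations([j for j in range(N) if j != i], D))
print(ok)
```

Each helper sends a single field element, so the repair bandwidth is $d\beta = d = \alpha$, matching the MBR point.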
Reconstruction in Product-Matrix MBR codes
Data Collector (DC) contacts (any) $k$ nodes $D = \{d_1, \ldots, d_k\}$.
- Node $d_j$ stores $c_{d_j}M$.
DC downloads $c_{d_j}M$ from each node $d_j$.
DC assembles $C_D M$.
- Recall $CM=(\Psi,\Delta)M=(\Psi S+\Delta T^T, \Psi T)$, so $C_D M=(\Psi_D,\Delta_D)M=(\Psi_D S+\Delta_D T^T, \Psi_D T)$.
$\Psi_D$ is invertible by construction.
- DC computes $\Psi_D^{-1}C_DM = (S+\Psi_D^{-1}\Delta_D T^T, T)$.
- DC obtains $T$.
- DC subtracts $\Psi_D^{-1}\Delta_D T^T$ from $S+\Psi_D^{-1}\Delta_D T^T$ to obtain $S$.
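Reconstruction as a self-contained sketch, again assuming $\mathbb{F}_{13}$, $n=5$, $k=2$, $d=3$, a Vandermonde $C$ with points $1,\ldots,5$, and the example $S=\begin{pmatrix}1&2\\2&3\end{pmatrix}$, $T=\begin{pmatrix}4\\5\end{pmatrix}$ (all illustrative; nodes indexed from 0). The DC solves with $\Psi_D$ once to expose $T$, then once more to compute the correction term $\Psi_D^{-1}\Delta_D T^T$ and recover $S$. `solve` is a hand-written Gauss-Jordan elimination mod $p$.

```python
from itertools import combinations

P = 13
N, K, D = 5, 2, 3
S_TRUE, T_TRUE = [[1, 2], [2, 3]], [[4], [5]]
M = [[1, 2, 4], [2, 3, 5], [4, 5, 0]]                              # [[S, T], [T^T, 0]]
C = [[pow(x, e, P) for e in range(D)] for x in range(1, N + 1)]     # (Psi, Delta)


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) % P for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def solve(A, B):
    """Return X with A X = B over F_P (A square, invertible), by Gauss-Jordan."""
    m = len(A)
    aug = [list(A[r]) + list(B[r]) for r in range(m)]
    for col in range(m):
        piv = next(r for r in range(col, m) if aug[r][col] % P)
        aug[col], aug[piv] = aug[piv], aug[col]
        inv = pow(aug[col][col], P - 2, P)
        aug[col] = [x * inv % P for x in aug[col]]
        for r in range(m):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(x - f * y) % P for x, y in zip(aug[r], aug[col])]
    return [row[m:] for row in aug]


STORED = matmul(C, M)      # node j stores c_j M


def reconstruct(nodes):
    """DC downloads c_j M from k nodes and recovers (S, T)."""
    CDM = [STORED[j] for j in nodes]       # C_D M = (Psi_D S + Delta_D T^T, Psi_D T)
    psi_D = [C[j][:K] for j in nodes]
    delta_D = [C[j][K:] for j in nodes]
    X = solve(psi_D, CDM)                  # (S + Psi_D^{-1} Delta_D T^T, T)
    T = [row[K:] for row in X]
    corr = solve(psi_D, matmul(delta_D, transpose(T)))   # Psi_D^{-1} Delta_D T^T
    S = [[(x - c) % P for x, c in zip(row[:K], crow)] for row, crow in zip(X, corr)]
    return S, T


print(all(reconstruct(list(nodes)) == (S_TRUE, T_TRUE) for nodes in combinations(range(N), K)))
```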
