CSE5313 Coding and information theory for data science (Lecture 24)
Continuing with coded computing.
Matrix-vector multiplication: y=Ax, where A\in \mathbb{F}^{M\times N},x\in \mathbb{F}^N
- MDS codes.
  - Recovery threshold K=M.
- Short-dot codes.
  - Recovery threshold K\geq M.
  - Every node receives at most s=\frac{P-K+M}{P}\cdot N elements of x.
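A minimal numerical sketch of the MDS approach for matrix-vector multiplication over the reals (assuming numpy; the Vandermonde encoder and the sizes M=3, N=6, P=5 are illustrative choices, not the exact construction from lecture):

```python
import numpy as np

M, N, P = 3, 6, 5
rng = np.random.default_rng(0)
A = rng.integers(0, 10, size=(M, N)).astype(float)        # A in F^{M x N}
x = rng.integers(0, 10, size=N).astype(float)

# (P, M) "MDS-like" code over the reals: a Vandermonde encoder, any M rows invertible.
V = np.vander(np.arange(1.0, P + 1), M, increasing=True)  # P x M
A_coded = V @ A                                           # row p is a coded row of A

# Worker p returns A_coded[p] @ x; any K = M responses recover y = A x.
responses = {p: A_coded[p] @ x for p in (0, 2, 4)}        # workers 1 and 3 straggle
idx = sorted(responses)
y = np.linalg.solve(V[idx, :], np.array([responses[p] for p in idx]))
assert np.allclose(y, A @ x)
```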
Matrix-matrix multiplication
Problem Formulation:
- A=[A_0,A_1,\ldots,A_{M-1}]\in \mathbb{F}^{L\times L}, B=[B_0,B_1,\ldots,B_{M-1}]\in \mathbb{F}^{L\times L}, where A_m,B_m are submatrices of A,B.
- We want to compute C=A^\top B.
Trivial solution:
- Index each worker node by m,n\in [0,M-1].
- Worker node (m,n) performs the matrix multiplication A_m^\top\cdot B_n.
- Need P=M^2 nodes.
- No erasure tolerance.
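A quick sketch of the trivial scheme, just to fix notation (numpy; the sizes M=2, L=4 are illustrative):

```python
import numpy as np

M, L = 2, 4
rng = np.random.default_rng(1)
A = rng.integers(0, 5, size=(L, L)).astype(float)
B = rng.integers(0, 5, size=(L, L)).astype(float)
A_blk = np.split(A, M, axis=1)   # column blocks A_0, ..., A_{M-1}, each L x L/M
B_blk = np.split(B, M, axis=1)

# Worker (m, n) computes one L/M x L/M block of C = A^T B; needs P = M^2 workers.
C_blocks = [[A_blk[m].T @ B_blk[n] for n in range(M)] for m in range(M)]
C = np.block(C_blocks)
assert np.allclose(C, A.T @ B)
```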
Can we do better?
1-D MDS Method
Create [\tilde{A}_0,\tilde{A}_1,\ldots,\tilde{A}_{S-1}] by encoding [A_0,A_1,\ldots,A_{M-1}] with some (S,M) MDS code.
Need P=SM worker nodes, and index each one by s\in [0,S-1], n\in [0,M-1].
Worker node (s,n) performs matrix multiplication \tilde{A}_s^\top\cdot B_n.
Example with M=2, S=3 (encoded A-blocks as rows of the worker grid, B-blocks as columns):
\begin{bmatrix}
A_0^\top\\
A_1^\top\\
A_0^\top+A_1^\top
\end{bmatrix}
\begin{bmatrix}
B_0 & B_1
\end{bmatrix}
Need M responses from each column, i.e., each column tolerates up to S-M missing responses.
The recovery threshold is K=P-S+M nodes.
The example above uses a simple single parity-check code (S=M+1), which tolerates one straggler per column.
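A sketch of the 1-D MDS scheme for the parity example above (numpy; M=2, S=3, one straggler; the specific straggler pattern is an illustrative choice):

```python
import numpy as np

M, S, L = 2, 3, 4
rng = np.random.default_rng(2)
A = rng.integers(0, 5, size=(L, L)).astype(float)
B = rng.integers(0, 5, size=(L, L)).astype(float)
A_blk = np.split(A, M, axis=1)
B_blk = np.split(B, M, axis=1)

# (S, M) parity code: A~_0 = A_0, A~_1 = A_1, A~_2 = A_0 + A_1.
A_tld = [A_blk[0], A_blk[1], A_blk[0] + A_blk[1]]

# Worker (s, n) returns A~_s^T B_n.  Suppose worker (0, 1) straggles:
# column n = 1 still has responses from s = 1 and s = 2, so
# A_0^T B_1 = (A_0 + A_1)^T B_1 - A_1^T B_1 can be recovered.
recovered_01 = A_tld[2].T @ B_blk[1] - A_tld[1].T @ B_blk[1]
assert np.allclose(recovered_01, A_blk[0].T @ B_blk[1])
```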
2-D MDS Method
Encode [A_0,A_1,\ldots,A_{M-1}] with some (S,M) MDS code.
Encode [B_0,B_1,\ldots,B_{M-1}] with some (S,M) MDS code.
Need P=S^2 nodes.
Example with M=2, S=3 (encoded A-blocks as rows of the worker grid, encoded B-blocks as columns):
\begin{bmatrix}
A_0^\top\\
A_1^\top\\
A_0^\top+A_1^\top
\end{bmatrix}
\begin{bmatrix}
B_0 & B_1 & B_0+B_1
\end{bmatrix}
Decodability depends on the erasure pattern:
- Consider an S\times S bipartite graph (rows on the left, columns on the right).
- Draw an edge (i,j) if \tilde{A}_i^\top\cdot \tilde{B}_j is missing.
- Row i is decodable if and only if the degree of the $i$'th left node is \leq S-M.
- Column j is decodable if and only if the degree of the $j$'th right node is \leq S-M.
Peeling algorithm:
- Traverse the graph.
- If \exists v with \deg(v)\leq S-M, remove all of its edges.
- Repeat until no such node remains.
Corollary:
- A pattern is decodable if and only if the above graph does not contain a nonempty subgraph in which every node has degree larger than S-M.
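A small sketch of the peeling test for decodability (plain Python; the function name and the `missing` set interface, the set of (i, j) pairs whose product was not received, are hypothetical):

```python
def decodable(missing, S, M):
    """Peel: repeatedly remove all edges of any node with degree <= S - M.

    The pattern is decodable iff no edges remain, i.e. there is no nonempty
    subgraph in which every node has degree > S - M.
    """
    edges = set(missing)                       # edge (i, j): A~_i^T B~_j missing
    changed = True
    while changed and edges:
        changed = False
        for side in (0, 1):                    # 0: left (rows), 1: right (columns)
            for v in range(S):
                incident = [e for e in edges if e[side] == v]
                if 0 < len(incident) <= S - M:
                    edges -= set(incident)     # node v is decodable: peel its edges
                    changed = True
    return not edges

# Example with S = 3, M = 2: a single missing product is always recoverable,
# but a 2 x 2 "square" of missing products is not (every degree is 2 > S - M = 1).
print(decodable({(0, 0)}, S=3, M=2))                           # True
print(decodable({(0, 0), (0, 1), (1, 0), (1, 1)}, S=3, M=2))   # False
```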
Note:
- K_{1D\text{-}MDS}=P-S+M=\Theta(P) (grows linearly in P).
- K_{2D\text{-}MDS}=P-(S-M+1)^2+1\leq \Theta(\sqrt{P}).
- Our goal is to get rid of the dependence on P.
Polynomial codes
Polynomial representation
Coefficient representation of a polynomial:
- f(x)=f_dx^d+f_{d-1}x^{d-1}+\cdots+f_1x+f_0.
- Uniquely defined by the coefficients [f_d,f_{d-1},\ldots,f_0].
Value representation of a polynomial:
- Theorem: A polynomial of degree d is uniquely determined by d+1 points.
- Proof outline: first construct a polynomial of degree d from the d+1 points using Lagrange interpolation, then show that such a polynomial is unique.
- Uniquely defined by the evaluations [(\alpha_1,f(\alpha_1)),\ldots,(\alpha_{d+1},f(\alpha_{d+1}))].
Why should we want value representation?
- With the coefficient representation, multiplying two degree-d polynomials takes O(d^2) multiplications.
- With the value representation, the product takes only 2d+1 pointwise multiplications.
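A minimal sketch of a polynomial product in the value representation (numpy over the reals; the polynomials and the evaluation points 0,\ldots,2d are arbitrary illustrative choices):

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0])      # f(x) = 1 + 2x + 3x^2   (degree d = 2)
g = np.array([4.0, 0.0, 5.0])      # g(x) = 4 + 5x^2

d = 2
pts = np.arange(2 * d + 1, dtype=float)            # 2d + 1 = 5 evaluation points
f_vals = np.polyval(f[::-1], pts)                  # value representation of f
g_vals = np.polyval(g[::-1], pts)                  # value representation of g

h_vals = f_vals * g_vals                           # 2d + 1 pointwise multiplications
# Interpolate back to the coefficients of h = f * g (degree 2d).
h = np.polyfit(pts, h_vals, 2 * d)[::-1]
assert np.allclose(h, np.convolve(f, g))           # matches coefficient-domain product
```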
Definition of a polynomial code
Problem formulation:
A=[A_0,A_1,\ldots,A_{M-1}]\in \mathbb{F}^{L\times L}, B=[B_0,B_1,\ldots,B_{M-1}]\in \mathbb{F}^{L\times L}
We want to compute C=A^\top B.
Define matrix polynomials:
p_A(x)=\sum_{i=0}^{M-1} A_i x^i, degree M-1
p_B(x)=\sum_{i=0}^{M-1} B_i x^{iM}, degree M(M-1)
where each A_i,B_i is an L\times \frac{L}{M} submatrix of A,B, respectively.
We have
h(x)=p_A(x)^\top p_B(x)=\sum_{i=0}^{M-1}\sum_{j=0}^{M-1} A_i^\top B_j x^{i+jM}
\deg h(x)\leq M(M-1)+M-1=M^2-1
Observe that
x^{i_1+j_1M}=x^{i_2+j_2M} (with i_1,i_2,j_1,j_2\in [0,M-1])
if and only if i_1=i_2 and j_1=j_2.
The coefficient of x^{i+jM} is A_i^\top B_j.
Computing C=A^\top B is equivalent to finding the coefficient representation of h(x).
Encoding of polynomial codes
The master chooses distinct \omega_0,\omega_1,\ldots,\omega_{P-1}\in \mathbb{F}.
- Note that this requires |\mathbb{F}|\geq P.
For every node i\in [0,P-1], the master computes \tilde{A}_i=p_A(\omega_i)
- Equivalent to multiplying [A_0^\top,A_1^\top,\ldots,A_{M-1}^\top] by a Vandermonde matrix over \omega_0,\omega_1,\ldots,\omega_{P-1}.
- Can be sped up using the FFT.
Similarly, the master computes \tilde{B}_i=p_B(\omega_i) for every node i\in [0,P-1].
Every node i\in [0,P-1] computes and returns c_i=p_A(\omega_i)^\top p_B(\omega_i) to the master.
c_i is the evaluation of the polynomial h(x)=p_A(x)^\top p_B(x) at \omega_i.
Recall that h(x)=\sum_{i=0}^{M-1}\sum_{j=0}^{M-1} A_i^\top B_j x^{i+jM}.
- Computing C=A^\top B is equivalent to finding the coefficient representation of h(x).
Recall that a polynomial of degree d can be uniquely defined by d+1 points.
- With M^2 evaluations of h(x) (one more than \deg h(x)\leq M^2-1), we can recover the coefficient representation of the polynomial h(x).
The recovery threshold K=M^2, independent of P, the number of worker nodes.
Done.
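An end-to-end numerical sketch of polynomial codes over the reals (numpy; the sizes M=2, L=4, P=5, the evaluation points, and the straggler pattern are illustrative choices):

```python
import numpy as np

M, L, P = 2, 4, 5
rng = np.random.default_rng(3)
A = rng.integers(0, 5, size=(L, L)).astype(float)
B = rng.integers(0, 5, size=(L, L)).astype(float)
A_blk = np.split(A, M, axis=1)                       # A_0, A_1  (L x L/M each)
B_blk = np.split(B, M, axis=1)

omega = np.arange(1.0, P + 1)                        # distinct evaluation points
pA = lambda w: sum(A_blk[i] * w**i for i in range(M))          # p_A(w)
pB = lambda w: sum(B_blk[j] * w**(j * M) for j in range(M))    # p_B(w)

# Worker i returns c_i = p_A(w_i)^T p_B(w_i); keep only K = M^2 = 4 responses.
survivors = [0, 1, 3, 4]
C_vals = {i: pA(omega[i]).T @ pB(omega[i]) for i in survivors}

# Interpolate each entry of h(x) (degree M^2 - 1 = 3) from the K evaluations.
V = np.vander(omega[survivors], M * M, increasing=True)        # K x M^2
coeffs = np.linalg.solve(V, np.stack([C_vals[i].ravel() for i in survivors]))

# The coefficient of x^(i + j*M) is the block A_i^T B_j of C = A^T B.
blk = L // M
C = np.block([[coeffs[i + j * M].reshape(blk, blk) for j in range(M)]
              for i in range(M)])
assert np.allclose(C, A.T @ B)
```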
MatDot Codes
Problem formulation:
- We want to compute C=A^\top B.
- Unlike polynomial codes, we partition by rows: $A=\begin{bmatrix} A_0\\ A_1\\ \vdots\\ A_{M-1} \end{bmatrix}$ and $B=\begin{bmatrix} B_0\\ B_1\\ \vdots\\ B_{M-1} \end{bmatrix}$, with A,B\in \mathbb{F}^{L\times L}.
- In polynomial codes, the partition was by columns: $A=\begin{bmatrix} A_0 & A_1 & \ldots & A_{M-1} \end{bmatrix}$ and $B=\begin{bmatrix} B_0 & B_1 & \ldots & B_{M-1} \end{bmatrix}$.
Key observation:
A_m^\top is an L\times \frac{L}{M} matrix, and B_m is an \frac{L}{M}\times L matrix. Hence, A_m^\top B_m is an L\times L matrix.
Let C=A^\top B=\sum_{m=0}^{M-1} A_m^\top B_m.
Let p_A(x)=\sum_{m=0}^{M-1} A_m x^m, degree M-1.
Let p_B(x)=\sum_{m=0}^{M-1} B_m x^{M-1-m}, degree M-1.
Both have degree M-1.
And h(x)=p_A(x)^\top p_B(x).
\deg h(x)\leq M-1+M-1=2M-2
Key observation:
- The coefficient of the term x^{M-1} in h(x) is \sum_{m=0}^{M-1} A_m^\top B_m.
Recall that C=A^\top B=\sum_{m=0}^{M-1} A_m^\top B_m.
Finding this coefficient is equivalent to finding the result of A^\top B.
Here we sacrifice network bandwidth (each worker returns a full L\times L matrix) in exchange for a much lower recovery threshold.
General Scheme for MatDot Codes
The master chooses distinct \omega_0,\omega_1,\ldots,\omega_{P-1}\in \mathbb{F}.
- Note that this requires |\mathbb{F}|\geq P.
For every node i\in [0,P-1], the master computes \tilde{A}_i=p_A(\omega_i) and \tilde{B}_i=p_B(\omega_i).
- p_A(x)=\sum_{m=0}^{M-1} A_m x^m, degree M-1.
- p_B(x)=\sum_{m=0}^{M-1} B_m x^{M-1-m}, degree M-1.
The master sends \tilde{A}_i,\tilde{B}_i to node i.
Every node i\in [0,P-1] computes and returns c_i=p_A(\omega_i)^\top p_B(\omega_i) to the master.
The master needs \deg h(x)+1=2M-1 evaluations to obtain h(x).
- The recovery threshold is K=2M-1.
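A numerical sketch of MatDot codes over the reals (numpy; M=2, L=4, P=4 are illustrative; p_B uses the reversed exponents x^{M-1-m} as above):

```python
import numpy as np

M, L, P = 2, 4, 4
rng = np.random.default_rng(4)
A = rng.integers(0, 5, size=(L, L)).astype(float)
B = rng.integers(0, 5, size=(L, L)).astype(float)
A_blk = np.split(A, M, axis=0)                    # row blocks, each L/M x L
B_blk = np.split(B, M, axis=0)

omega = np.arange(1.0, P + 1)
pA = lambda w: sum(A_blk[m] * w**m for m in range(M))
pB = lambda w: sum(B_blk[m] * w**(M - 1 - m) for m in range(M))

# Worker i returns c_i = p_A(w_i)^T p_B(w_i), a full L x L matrix.
# Any K = 2M - 1 = 3 responses determine h(x) of degree 2M - 2 = 2.
survivors = [0, 2, 3]
c = {i: pA(omega[i]).T @ pB(omega[i]) for i in survivors}

V = np.vander(omega[survivors], 2 * M - 1, increasing=True)
coeffs = np.linalg.solve(V, np.stack([c[i].ravel() for i in survivors]))
C = coeffs[M - 1].reshape(L, L)                   # coefficient of x^(M-1) is A^T B
assert np.allclose(C, A.T @ B)
```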
Recap on Matrix-Matrix multiplication
A,B\in \mathbb{F}^{L\times L}, we want to compute C=A^\top B with P nodes.
Every node receives \frac{1}{M} of A and \frac{1}{M} of B.
| Code | Recovery threshold K |
|---|---|
| 1D-MDS | \Theta(P) |
| 2D-MDS | \leq \Theta(\sqrt{P}) |
| Polynomial codes | \Theta(M^2) |
| MatDot codes | \Theta(M) |
Polynomial Evaluation
Problem formulation:
- We have K datasets X_1,X_2,\ldots,X_K.
- Want to compute some polynomial function f of degree d on each dataset.
  - Want f(X_1),f(X_2),\ldots,f(X_K).
- Examples:
  - X_1,X_2,\ldots,X_K are points in \mathbb{F}^{M\times M}, and f(X)=X^8+3X^2+1.
  - X_k=(X_k^{(1)},X_k^{(2)}), both in \mathbb{F}^{M\times M}, and f(X_k)=X_k^{(1)}X_k^{(2)}.
  - Gradient computation.
P worker nodes:
- Some are stragglers, i.e., not responsive.
- Some are adversaries, i.e., return erroneous results.
- Privacy: We do not want to expose datasets to worker nodes.
Replication code
Suppose P=(r+1)\cdot K.
- Partition the P nodes into K groups of size r+1 each.
- Each node in group i computes and returns f(X_i) to the master.
- Replication tolerates r stragglers, or \lfloor \frac{r}{2} \rfloor adversaries.
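A tiny sketch of the replication decoder for one group (plain Python; answers are assumed to be hashable scalars for illustration, and the function name is hypothetical):

```python
from collections import Counter

def decode_group(answers):
    """Majority vote over the answers returned for one group of r + 1 nodes.

    Tolerates r stragglers (at least one honest answer remains) or
    floor(r / 2) adversaries (honest answers remain the majority).
    """
    return Counter(answers).most_common(1)[0][0]

# r = 2: three replicas per group, so one adversary (or two stragglers) is tolerated.
print(decode_group([7, 7, 99]))   # -> 7
print(decode_group([7]))          # -> 7 (two stragglers)
```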
Linear codes
However, f is a polynomial of degree d, not a linear transformation unless d=1.
- f(cX)\neq cf(X), where c is a constant.
- f(X_1+X_2)\neq f(X_1)+f(X_2).
Our goal is to create an encoder/decoder such that:
- Linear encoding: (\tilde{X}_1,\tilde{X}_2,\ldots,\tilde{X}_P) is a codeword of [X_1,X_2,\ldots,X_K] for some linear code.
- The f(X_i) are decodable from some subset of the $f(\tilde{X}_i)$'s.
- The $X_i$'s are kept private.
Lagrange Coded Computing
Let \ell(z) be a polynomial whose evaluations at \omega_1,\ldots,\omega_{K} are X_1,\ldots,X_K.
Then every f(X_i)=f(\ell(\omega_i)) is an evaluation of the polynomial f\circ \ell(z) at \omega_i.
If the master obtains the composition h=f\circ \ell, it can obtain every f(X_i)=h(\omega_i).
Goal: The master wishes to obtain the polynomial h(z)=f(\ell(z)).
Intuition:
- Encoding is performed by evaluating \ell(z) at \alpha_1,\ldots,\alpha_P\in \mathbb{F}, with P>K for redundancy.
- Nodes apply f to an evaluation of \ell and obtain an evaluation of h.
- The master receives some potentially noisy evaluations, and finds h.
- The master evaluates h at \omega_1,\ldots,\omega_K to obtain f(X_1),\ldots,f(X_K).
Encoding for Lagrange coded computing
Need polynomial \ell(z) such that:
- X_k=\ell(\omega_k) for every k\in [K].
Having obtained such \ell we let \tilde{X}_i=\ell(\alpha_i) for every i\in [P].
\mathrm{span}\{\tilde{X}_1,\tilde{X}_2,\ldots,\tilde{X}_P\}=\mathrm{span}\{\ell_1(x),\ell_2(x),\ldots,\ell_P(x)\}.
Want X_k=\ell(\omega_k) for every k\in [K].
Tool: Lagrange interpolation.
- \ell_k(z)=\prod_{j\neq k} \frac{z-\omega_j}{\omega_k-\omega_j}.
- \ell_k(\omega_k)=1 and \ell_k(\omega_j)=0 for every j\neq k.
- \deg \ell_k(z)=K-1.
Let \ell(z)=\sum_{k=1}^K X_k\ell_k(z).
- \deg \ell=K-1.
- \ell(\omega_k)=X_k for every k\in [K].
Let \tilde{X}_i=\ell(\alpha_i)=\sum_{k=1}^K X_k\ell_k(\alpha_i).
Every \tilde{X}_i is a linear combination of X_1,\ldots,X_K.
(\tilde{X}_1,\tilde{X}_2,\ldots,\tilde{X}_P)=(X_1,\ldots,X_K)\cdot G=(X_1,\ldots,X_K)\begin{bmatrix}
\ell_1(\alpha_1) & \ell_1(\alpha_2) & \cdots & \ell_1(\alpha_P) \\
\ell_2(\alpha_1) & \ell_2(\alpha_2) & \cdots & \ell_2(\alpha_P) \\
\vdots & \vdots & \ddots & \vdots \\
\ell_K(\alpha_1) & \ell_K(\alpha_2) & \cdots & \ell_K(\alpha_P)
\end{bmatrix}
This G is called a Lagrange matrix with respect to
- \omega_1,\ldots,\omega_K (interpolation points), and
- \alpha_1,\ldots,\alpha_P (evaluation points).
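An end-to-end numerical sketch of Lagrange coded computing over the reals (numpy; K=2 datasets, P=4 workers, f(X)=X\cdot X of degree d=2, and the interpolation/evaluation points are illustrative choices):

```python
import numpy as np

K, P, n, d = 2, 4, 3, 2
f = lambda X: X @ X                              # example polynomial function of degree d = 2
rng = np.random.default_rng(5)
X = [rng.integers(0, 4, size=(n, n)).astype(float) for _ in range(K)]

omega = np.arange(1.0, K + 1)                    # interpolation points (1, 2)
alpha = np.arange(3.0, 3.0 + P)                  # evaluation points (3, 4, 5, 6), disjoint from omega

def lagrange_basis(k, z, pts):
    """ell_k(z) for the interpolation points pts."""
    out = 1.0
    for j, w in enumerate(pts):
        if j != k:
            out *= (z - w) / (pts[k] - w)
    return out

# Encoding: X~_i = ell(alpha_i) = sum_k X_k ell_k(alpha_i), a linear combination of the X_k.
X_tld = [sum(X[k] * lagrange_basis(k, a, omega) for k in range(K)) for a in alpha]

# Worker i returns f(X~_i) = h(alpha_i), where h = f o ell has degree d(K-1) = 2,
# so any d(K-1) + 1 = 3 responses determine h.  Interpolate h entrywise.
survivors = [0, 2, 3]
deg_h = d * (K - 1)
V = np.vander(alpha[survivors], deg_h + 1, increasing=True)
h_coef = np.linalg.solve(V, np.stack([f(X_tld[i]).ravel() for i in survivors]))

# Decoding: evaluate h at omega_k to recover f(X_k).
for k in range(K):
    h_at_omega = sum(h_coef[t] * omega[k] ** t for t in range(deg_h + 1))
    assert np.allclose(h_at_omega.reshape(n, n), f(X[k]))
```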
Continued in the next lecture.