# CSE5313 Coding and information theory for data science (Lecture 23)
## Coded Computing
### Motivation
Some facts:
- Moore's law is saturating.
- Improving CPU performance is hard.
- Modern datasets are growing remarkably large.
- E.g., TikTok, YouTube.
- Learning tasks are computationally heavy.
- E.g., training neural networks.
Solution: Distributed Computing for Scalability
- Offloading computation tasks to multiple computation nodes.
- Gather and accumulate computation results.
- E.g., Apache Hadoop, Apache Spark, MapReduce.
### General Framework
- The system involves 1 master node and $P$ worker nodes.
- The master has a dataset $D$ and wants $f(D)$, where $f$ is some function.
- The master partitions $D=(D_1,\cdots,D_P)$, and sends $D_i$ to node $i$.
- Every node $i$ computes $g(D_i)$, where $g$ is some function.
- Finally, the master collects $g(D_1),\cdots,g(D_P)$ and computes $f(D)=h(g(D_1),\cdots,g(D_P))$, where $h$ is some function.
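The framework above can be sketched with a toy choice of functions; here $f$ is the sum of squares of $D$, $g$ is a partial sum of squares, and $h$ is plain summation (all illustrative choices, not from the lecture):

```python
import numpy as np

def g(D_i):
    # Each worker computes a partial result on its shard.
    return np.sum(D_i ** 2)

def h(partials):
    # The master aggregates the workers' results.
    return sum(partials)

P = 4                                  # number of worker nodes
D = np.arange(12, dtype=float)         # the master's dataset
shards = np.array_split(D, P)          # partition D = (D_1, ..., D_P)
partials = [g(D_i) for D_i in shards]  # computed at the workers
assert h(partials) == np.sum(D ** 2)   # f(D) recovered at the master
```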
#### Challenges
Stragglers
- Nodes that are significantly slower than the others.
Adversaries
- Nodes that return erroneous results.
- Computation/communication errors.
- Adversarial attacks.
Privacy
- Nodes may be curious about the dataset.
### Resemblance to communication channel
Suppose $f,g=\operatorname{id}$, and let $D=(D_1,\cdots,D_P)\in \mathbb{F}^P$ be a message.
- $D_i$ is a field element
- $\mathbb{F}$ could be $\mathbb{R}$, $\mathbb{C}$, or a finite field $\mathbb{F}_q$.
Observation: This is a distributed storage system.
- An erasure - node that does not respond.
- An error - node that returns erroneous results.
Solution:
- Add redundancy to the message
- Error-correcting codes.
### Coded Distributed Computing
- The master partitions $D$ and encodes it before sending to $P$ workers.
- Workers perform computations on coded data $\tilde{D}$ and generate coded results $g(\tilde{D})$.
- The master decodes the coded results and obtains $f(D)=h(g(\tilde{D}))$.
### Outline
Matrix-Vector Multiplication
- MDS codes.
- Short-Dot codes.
Matrix-Matrix Multiplication
- Polynomial codes.
- MatDot codes.
Polynomial Evaluation
- Lagrange codes.
- Application to Blockchain.
### Trivial solution - replication
The goal is matrix-vector multiplication: compute $y=Ax$ for $A\in \mathbb{F}^{M\times N}$ and $x\in \mathbb{F}^N$.
Why no straggler tolerance?
- We employ an individual worker node $i$ to compute $y_i=(a_{i1},\ldots,a_{iN})\cdot (x_1,\ldots,x_N)^T$.
Replicate the computation?
- Let $r+1$ nodes compute every $y_i$.
We need $P=rM+M$ worker nodes to tolerate $r$ erasures or $\lfloor \frac{r}{2}\rfloor$ adversaries.
### Use of MDS codes
Example: let $2\mid M$ and $P=3$.
Let $A_1,A_2$ be submatrices of $A$ such that $A=[A_1^\top|A_2^\top]^\top$.
- Worker node 1 computes $A_1\cdot x$.
- Worker node 2 computes $A_2\cdot x$.
- Worker node 3 computes $(A_1+A_2)\cdot x$.
Observation: the results can be obtained from any two worker nodes.
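The three-worker example can be checked numerically over the reals; the dimensions and random data below are illustrative choices:

```python
import numpy as np

M, N = 4, 5
rng = np.random.default_rng(0)
A = rng.standard_normal((M, N))
x = rng.standard_normal(N)
A1, A2 = A[: M // 2], A[M // 2 :]      # A = [A1; A2]

# Each worker's result.
results = {1: A1 @ x, 2: A2 @ x, 3: (A1 + A2) @ x}

# Recover y = A x from workers {1, 3}: A2 x = (A1 + A2) x - A1 x.
y = np.concatenate([results[1], results[3] - results[1]])
assert np.allclose(y, A @ x)

# Recover from workers {2, 3}: A1 x = (A1 + A2) x - A2 x.
y = np.concatenate([results[3] - results[2], results[2]])
assert np.allclose(y, A @ x)
```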
Let $G\in \mathbb{F}^{M\times P}$ be the generator matrix of a $(P,M)$ MDS code.
The master node computes $F=G^\top A\in \mathbb{F}^{P\times N}$.
Every worker node $i$ computes $F_i\cdot x$.
- $F_i=(G^\top A)_i$ is the $i$-th row of $G^\top A$.
Notice that $Fx=G^\top A\cdot x=G^\top y$ is a codeword encoding $y$.
Node $i$ computes an entry of this codeword.
$1$ response = $1$ entry of the codeword.
The master does **not** need all workers to respond to obtain $y$.
- The MDS property allows decoding from any $M$ of the $y_i$'s.
- This scheme tolerates $P-M$ erasures, and the recovery threshold is $K=M$.
- We need $P=r+M$ worker nodes to tolerate $r$ stragglers or $\lfloor \frac{r}{2}\rfloor$ adversaries.
- With replication, we need $P=rM+M$ worker nodes.
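A minimal sketch of the general MDS scheme over the reals, using a Vandermonde generator matrix (any $M$ of its $P$ columns form an invertible submatrix, giving the MDS property); the dimensions are illustrative:

```python
import numpy as np

M, N, P = 3, 5, 6
rng = np.random.default_rng(0)
A = rng.standard_normal((M, N))
x = rng.standard_normal(N)

# M x P Vandermonde generator: column i is (1, a_i, a_i^2).
alphas = np.arange(1, P + 1, dtype=float)
G = np.vander(alphas, M, increasing=True).T

F = G.T @ A                  # master precomputes F = G^T A, row i goes to worker i
responses = F @ x            # worker i returns F_i . x = (G^T y)_i

# Suppose only M workers respond; decode y from their entries of G^T y.
X = [0, 2, 5]
y = np.linalg.solve(G[:, X].T, responses[X])
assert np.allclose(y, A @ x)
```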
#### Potential improvements for MDS codes
- The matrix $A$ is usually a (trained) model, and $x$ is the data (feature vector).
- $x$ is transmitted frequently, while the rows of $A$ (or $G^\top A$) are communicated in advance.
- Every worker needs to receive the entire $x$ and compute the dot product.
- Communication-heavy.
- Can we design a scheme in which every node receives only a part of $x$?
### Short-Dot codes
[link to paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8758338)
We want to create a matrix $F\in \mathbb{F}^{P\times N}$ from $A$ such that:
- Every node computes $F_i\cdot x$.
- Every $K$ rows linearly span the row space of $A$.
- Each row of $F$ contains at most $s$ non-zero entries.
In the MDS method, $F=G^\top A$.
- The recovery threshold $K=M$.
- Every worker node needs to receive $s=N$ symbols (the entire $x$).
No free lunch
Can we trade the recovery threshold $K$ for a smaller $s$?
- Every worker node receives less than $N$ symbols.
- The master will need more than $M$ responses to recover the computation result.
#### Construction of Short-Dot codes
Choose a super-regular matrix $B\in \mathbb{F}^{P\times K}$, where $P$ is the number of worker nodes.
- A matrix is super-regular if every square submatrix is invertible.
- Lagrange/Cauchy matrix is super-regular (next lecture).
Create matrix $\tilde{A}$ by stacking some $Z\in \mathbb{F}^{(K-M)\times N}$ below matrix $A$.
Let $F=B\cdot \tilde{A}\in \mathbb{F}^{P\times N}$.
**Short-Dot**: create matrix $F\in \mathbb{F}^{P\times N}$ such that:
- Every $K$ rows linearly span the row space of $A$.
- Each row of $F$ contains at most $s=\frac{P-K+M}{P}N$ non-zero entries (sparse).
#### Recovery of Short-Dot codes
Claim: Every $K$ rows of $F$ linearly span the row space of $A$.
<details>
<summary>Proof</summary>
Since $B$ is super-regular, it is also MDS, i.e., every $K\times K$ submatrix of $B$ is invertible.
Hence, every row of $A$ can be represented as a linear combination of any $K$ rows of $F$.
That is, for every $\mathcal{X}\subseteq[P],|\mathcal{X}|=K$, we can have $\tilde{A}=(B^{\mathcal{X}})^{-1}F^{\mathcal{X}}$.
</details>
What about the sparsity of $F$?
- Want each row of $F$ to be sparse.
#### Sparsity of Short-Dot codes
Build a $P\times P$ square matrix in which each row and each column contains $P-K+M$ non-zero entries.
Concatenate $\frac{N}{P}$ such matrices horizontally and obtain the sparsity pattern of $F$.
[Missing slides 18]
We now investigate what $Z$ should look like to construct such a matrix $F$.
- Recall that each column of $F$ must contain $K-M$ zeros.
- They are indexed by a set $\mathcal{U}\subseteq[P]$, where $|\mathcal{U}|=K-M$.
- Let $B^{\mathcal{U}}\in \mathbb{F}^{(K-M)\times K}$ be the submatrix of $B$ containing the rows indexed by $\mathcal{U}$.
- Since $F=B\tilde{A}$, it follows that $F_j=B\tilde{A}_j$, where $F_j$ and $\tilde{A}_j$ are the $j$-th columns of $F$ and $\tilde{A}$.
- Next, we require $B^{\mathcal{U}}\tilde{A}_j=0_{(K-M)\times 1}$.
- Split $B^{\mathcal{U}}=[B^{\mathcal{U}}_{[1,M]},B^{\mathcal{U}}_{[M+1,K]}]$ and $\tilde{A}_j=[A_j^T,Z_j^T]^T$. Then

$$B^{\mathcal{U}}\tilde{A}_j=B^{\mathcal{U}}_{[1,M]}A_j+B^{\mathcal{U}}_{[M+1,K]}Z_j=0_{(K-M)\times 1},$$

$$Z_j=-(B^{\mathcal{U}}_{[M+1,K]})^{-1}B^{\mathcal{U}}_{[1,M]}A_j.$$

- Note that $B^{\mathcal{U}}_{[M+1,K]}\in \mathbb{F}^{(K-M)\times(K-M)}$ is invertible, since $B$ is super-regular.
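A minimal numerical sketch of this construction over the reals, using a Cauchy matrix as the super-regular $B$. For simplicity the same zero-index set $\mathcal{U}$ is used for every column of $F$ (in Short-Dot the set varies per column to spread the sparsity); the dimensions are illustrative:

```python
import numpy as np

M, N, P, K = 2, 4, 6, 4
rng = np.random.default_rng(1)
A = rng.standard_normal((M, N))

# Cauchy matrix B[i, k] = 1 / (s_i - t_k): every square submatrix invertible.
s = np.arange(P, dtype=float)
t = np.arange(P, P + K, dtype=float) + 0.5
B = 1.0 / (s[:, None] - t[None, :])          # P x K

U = [0, 1]                                   # |U| = K - M rows forced to zero
B_U = B[U, :]                                # (K-M) x K
B1, B2 = B_U[:, :M], B_U[:, M:]              # columns [1,M] and [M+1,K]

# Z_j = -(B_U^{[M+1,K]})^{-1} B_U^{[1,M]} A_j, solved for all columns j at once.
Z = -np.linalg.solve(B2, B1 @ A)
F = B @ np.vstack([A, Z])                    # F = B * A~
assert np.allclose(F[U, :], 0.0)             # the prescribed entries vanish
```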