CSE5313 Coding and information theory for data science (Lecture 23)
Coded Computing
Motivation
Some facts:
- Moore's law is saturating.
- Improving CPU performance is hard.
- Modern datasets are growing remarkably large.
- E.g., TikTok, YouTube.
- Learning tasks are computationally heavy.
- E.g., training neural networks.
Solution: Distributed Computing for Scalability
- Offloading computation tasks to multiple computation nodes.
- Gather and accumulate computation results.
- E.g., Apache Hadoop, Apache Spark, MapReduce.
General Framework
- The system involves 1 master node and P worker nodes.
- The master has a dataset D and wants f(D), where f is some function.
- The master partitions D=(D_1,\cdots,D_P) and sends D_i to node i.
- Every node i computes g(D_i), where g is some function.
- Finally, the master collects g(D_1),\cdots,g(D_P) and computes f(D)=h(g(D_1),\cdots,g(D_P)), where h is some function.
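A minimal sketch of this framework in Python (the sum-of-squares task, the function names, and the local simulation of workers are illustrative assumptions, not from the lecture):

```python
import numpy as np

# Hypothetical example: f(D) = sum of squares of all entries of D.
# g(D_i) = sum of squares of partition D_i; h adds the partial results.

def g(D_i):
    return np.sum(D_i ** 2)          # local computation at worker i

def h(partial_results):
    return sum(partial_results)      # aggregation at the master

P = 4                                 # number of worker nodes
D = np.arange(12, dtype=float)        # the master's dataset
partitions = np.array_split(D, P)     # D = (D_1, ..., D_P)

# "Send" D_i to worker i and collect g(D_i); workers are simulated locally here.
results = [g(D_i) for D_i in partitions]
f_D = h(results)
assert f_D == np.sum(D ** 2)
```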
Challenges
Stragglers
- Nodes that are significantly slower than the others.
Adversaries
- Nodes that return erroneous results.
- Computation/communication errors.
- Adversarial attacks.
Privacy
- Nodes may be curious about the dataset.
Resemblance to communication channel
Suppose f,g=\operatorname{id}, and let D=(D_1,\cdots,D_P)\in \mathbb{F}^P be a message.
Each D_i is a field element; \mathbb{F} could be \mathbb{R}, \mathbb{C}, or \mathbb{F}_q.
Observation: This is a distributed storage system.
- An erasure - node that does not respond.
- An error - node that returns erroneous results.
Solution:
- Add redundancy to the message
- Error-correcting codes.
Coded Distributed Computing
- The master partitions D and encodes it before sending to the P workers.
- Workers perform computations on the coded data \tilde{D} and generate coded results g(\tilde{D}).
- The master decodes the coded results and obtains f(D)=h(g(\tilde{D})).
Outline
Matrix-Vector Multiplication
- MDS codes.
- Short-Dot codes.
Matrix-Matrix Multiplication
- Polynomial codes.
- MatDot codes.
Polynomial Evaluation
- Lagrange codes.
- Application to Blockchain.
Trivial solution - replication
Setup: the master wants to compute y=Ax, where A\in\mathbb{F}^{M\times N} and x\in\mathbb{F}^{N}.
Why no straggler tolerance?
- We employ an individual worker node i to compute y_i=(a_{i1},\ldots,a_{iN})\cdot (x_1,\ldots,x_N)^T.
- If any single node fails to respond, the corresponding entry of y is lost.
Replicate the computation?
- Let r+1 nodes compute every y_i.
We need P=rM+M worker nodes to tolerate r erasures or \lfloor \frac{r}{2}\rfloor adversaries.
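As an illustrative count (numbers are mine): to tolerate r=2 stragglers with M=10 output entries, replication needs P=(r+1)M=30 workers, whereas the MDS scheme described next needs only P=M+r=12.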
Use of MDS codes
Let 2|M and P=3.
Let A_1,A_2 be submatrices of A such that A=[A_1^\top|A_2^\top]^\top.
- Worker node 1 computes A_1\cdot x.
- Worker node 2 computes A_2\cdot x.
- Worker node 3 computes (A_1+A_2)\cdot x.
Observation: the results can be obtained from any two worker nodes.
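A minimal numpy sketch of this three-worker example (the sizes and the local simulation of workers are illustrative):

```python
import numpy as np

M, N = 4, 6                                   # A is M x N with M even, P = 3 workers
A = np.random.randn(M, N)
x = np.random.randn(N)

A1, A2 = A[: M // 2], A[M // 2 :]             # A = [A1; A2]
results = {1: A1 @ x, 2: A2 @ x, 3: (A1 + A2) @ x}

# Any two responses suffice: e.g., if worker 2 straggles,
# recover A2 x as (A1 + A2) x - A1 x.
y = np.concatenate([results[1], results[3] - results[1]])
assert np.allclose(y, A @ x)
```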
Let G\in \mathbb{F}^{M\times P} be the generator matrix of a (P,M) MDS code.
The master node computes F=G^\top A\in \mathbb{F}^{P\times N}.
Every worker node i computes F_i\cdot x.
F_i=(G^\top A)_i is the i-th row of G^\top A.
Notice that Fx=G^\top A\cdot x=G^\top y is the codeword of y.
Node i computes an entry in this codeword.
1 response = 1 entry of the codeword.
The master does not need all workers to respond to obtain y.
- The MDS property allows decoding from any M of the y_i's.
- This scheme tolerates P-M erasures, and the recovery threshold is K=M.
- We need P=r+M worker nodes to tolerate r stragglers or \frac{r}{2} adversaries.
- With replication, we need P=rM+M worker nodes.
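A hedged numpy sketch of the general scheme over the reals, assuming a Vandermonde matrix with distinct evaluation points as the generator G (so every M\times M submatrix is invertible, giving the MDS property); the sizes and the straggler pattern are illustrative:

```python
import numpy as np

M, N, P = 4, 8, 7                     # decode from any M of P = M + r workers (r = 3)
A = np.random.randn(M, N)
x = np.random.randn(N)

# G in F^{M x P}: Vandermonde on distinct points, so every M x M submatrix is invertible.
alphas = np.arange(1, P + 1, dtype=float)
G = np.vander(alphas, M, increasing=True).T          # shape (M, P)

F = G.T @ A                                          # F in F^{P x N}; row i is sent to worker i
worker_results = F @ x                               # worker i computes F_i @ x, an entry of G^T y

# Suppose only the workers in `responders` reply; any M of them suffice.
responders = [0, 2, 3, 6]
y = np.linalg.solve(G[:, responders].T, worker_results[responders])
assert np.allclose(y, A @ x)
```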
Potential improvements for MDS codes
- The matrix A is usually a (trained) model, and x is the data (feature vector).
- x is transmitted frequently, while the rows of A (or G^\top A) are communicated in advance.
- Every worker needs to receive the entire x and compute the dot product.
- Communication-heavy.
- Can we design a scheme that allows every node to receive only a part of x?
Short-Dot codes
We want to create a matrix F\in \mathbb{F}^{P\times N} from A such that:
- Every node computes F_i\cdot x.
- Every K rows linearly span the row space of A.
- Each row of F contains at most s non-zero entries.
In the MDS method, F=G^\top A.
- The recovery threshold K=M.
- Every worker node needs to receive s=N symbols (the entire x).
No free lunch
Can we trade the recovery threshold K for a smaller s?
- Every worker node receives fewer than N symbols.
- The master will need more than M responses to recover the computation result.
Construction of Short-Dot codes
Choose a super-regular matrix B\in \mathbb{F}^{P\times K}, where P is the number of worker nodes.
- A matrix is super-regular if every square submatrix is invertible.
- Lagrange/Cauchy matrix is super-regular (next lecture).
Create matrix \tilde{A} by stacking some Z\in \mathbb{F}^{(K-M)\times N} below matrix A.
Let F=B\cdot \tilde{A}\in \mathbb{F}^{P\times N}.
Short-Dot: create matrix F\in \mathbb{F}^{P\times N} such that:
- Every K rows linearly span the row space of A.
- Each row of F contains at most s=\frac{P-K+M}{P}\cdot N non-zero entries (sparse).
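As an illustrative instance (numbers are mine, not from the lecture): with P=10 workers, M=4, and recovery threshold K=7, each row of F has at most s=\frac{P-K+M}{P}\cdot N=0.7N non-zero entries, so every worker receives only 70% of x, at the cost of needing K=7 rather than M=4 responses.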
Recovery of Short-Dot codes
Claim: Every K rows of F linearly span the row space of A.
Proof
Since B is super-regular, it is also MDS, i.e., every K\times K submatrix of B is invertible.
Hence, every row of A can be represented as a linear combination of any K rows of F.
That is, for every \mathcal{X}\subseteq[P] with |\mathcal{X}|=K, we have \tilde{A}=(B^{\mathcal{X}})^{-1}F^{\mathcal{X}}, where B^{\mathcal{X}} and F^{\mathcal{X}} contain the rows indexed by \mathcal{X}.
What about the sparsity of F?
- We want each row of F to be sparse.
Sparsity of Short-Dot codes
Build a P\times P square matrix in which every row and column contains P-K+M non-zero entries.
Concatenate \frac{N}{P} such matrices horizontally to obtain a P\times N matrix whose rows each contain \frac{P-K+M}{P}\cdot N non-zero entries; this gives the sparsity pattern of F.
[Missing slides 18]
We now investigate what Z should look like to construct such a matrix F.
- Recall that each column of F must contain K-M zeros.
- They are indexed by a set \mathcal{U}\subseteq[P], where |\mathcal{U}|=K-M.
- Let B^{\mathcal{U}}\in \mathbb{F}^{(K-M)\times K} be the submatrix of B containing the rows indexed by \mathcal{U}.
- Since F=B\tilde{A}, it follows that F_j=B\tilde{A}_j, where F_j and \tilde{A}_j are the j-th columns of F and \tilde{A}.
- Next, we require B^{\mathcal{U}}\tilde{A}_j=0_{(K-M)\times 1}.
- Split B^{\mathcal{U}}=[B^{\mathcal{U}}_{[1,M]}\,|\,B^{\mathcal{U}}_{[M+1,K]}] and \tilde{A}_j=[A_j^\top, Z_j^\top]^\top.
- Then B^{\mathcal{U}}\tilde{A}_j=B^{\mathcal{U}}_{[1,M]}A_j+B^{\mathcal{U}}_{[M+1,K]}Z_j=0_{(K-M)\times 1}.
- Hence Z_j=-(B^{\mathcal{U}}_{[M+1,K]})^{-1}B^{\mathcal{U}}_{[1,M]}A_j.
- Note that B^{\mathcal{U}}_{[M+1,K]}\in \mathbb{F}^{(K-M)\times(K-M)} is invertible, since B is super-regular.
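Putting the construction together, here is a hedged numpy sketch over the reals; the real Cauchy matrix standing in for the super-regular B and the cyclic choice of the zero-pattern sets \mathcal{U}_j are my illustrative assumptions, not prescribed by the lecture:

```python
import numpy as np

M, N = 3, 16                                   # A is M x N
K, P = 5, 8                                    # recovery threshold K, P worker nodes
A = np.random.randn(M, N)
x = np.random.randn(N)

# Super-regular B in F^{P x K}: real Cauchy matrix B[i, j] = 1 / (u_i - v_j).
u = np.arange(P, dtype=float)
v = np.arange(K, dtype=float) + 0.5
B = 1.0 / (u[:, None] - v[None, :])

# For each column j, force K - M zeros in F_j at rows U_j by setting
# Z_j = -(B^{U_j}_{[M+1,K]})^{-1} B^{U_j}_{[1,M]} A_j.
Z = np.zeros((K - M, N))
zero_sets = []
for j in range(N):
    U = [(j * (K - M) + t) % P for t in range(K - M)]   # simple cyclic pattern (illustrative)
    zero_sets.append(U)
    B_U = B[U, :]                                        # (K-M) x K
    Z[:, j] = -np.linalg.solve(B_U[:, M:], B_U[:, :M] @ A[:, j])

A_tilde = np.vstack([A, Z])                    # K x N
F = B @ A_tilde                                # P x N; row i is sent to worker i

# Prescribed zeros => each worker's dot product touches only part of x.
# With this pattern each row of F has (P-K+M)/P * N = 12 of the N = 16 entries non-zero.
for j, U in enumerate(zero_sets):
    assert np.allclose(F[U, j], 0)

# Recovery: any K responses F_i . x determine A_tilde x, whose first M entries are A x.
responders = sorted(np.random.choice(P, K, replace=False))
tilde_y = np.linalg.solve(B[responders, :], (F @ x)[responders])
assert np.allclose(tilde_y[:M], A @ x)
```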