CSE5313 Coding and information theory for data science (Lecture 23)

Coded Computing

Motivation

Some facts:

  • Moore's law is saturating.
    • Improving CPU performance is hard.
  • Modern datasets are growing remarkably large.
    • E.g., TikTok, YouTube.
  • Learning tasks are computationally heavy.
    • E.g., training neural networks.

Solution: Distributed Computing for Scalability

  • Offloading computation tasks to multiple computation nodes.
  • Gather and accumulate computation results.
  • E.g., Apache Hadoop, Apache Spark, MapReduce.

General Framework

  • The system involves 1 master node and P worker nodes.
  • The master has a dataset D and wants f(D), where f is some function.
  • The master partitions D=(D_1,\cdots,D_P), and sends D_i to node i.
  • Every node i computes g(D_i), where g is some function.
  • Finally, the master collects g(D_1),\cdots,g(D_P) and computes f(D)=h(g(D_1),\cdots,g(D_P)), where h is some function.
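
As a toy instance of this framework (an illustration, not from the lecture), take f(D) to be the sum of squares of a list; then g computes a partial sum of squares over one part and h adds the partial results:

```python
from concurrent.futures import ProcessPoolExecutor

def g(D_i):
    # each worker computes a partial sum of squares
    return sum(d * d for d in D_i)

def h(results):
    # the master accumulates the partial results
    return sum(results)

def master(D, P=4):
    parts = [D[i::P] for i in range(P)]        # partition D = (D_1, ..., D_P)
    with ProcessPoolExecutor(max_workers=P) as pool:
        results = list(pool.map(g, parts))     # node i computes g(D_i)
    return h(results)                          # f(D) = h(g(D_1), ..., g(D_P))

if __name__ == "__main__":
    D = list(range(1000))
    assert master(D) == sum(d * d for d in D)
```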

Challenges

Stragglers

  • Nodes that are significantly slower than the others.

Adversaries

  • Nodes that return erroneous results.
    • Computation/communication errors.
    • Adversarial attacks.

Privacy

  • Nodes may be curious about the dataset.

Resemblance to communication channel

Suppose f,g=\operatorname{id}, and let D=(D_1,\cdots,D_P)\in \mathbb{F}^P be a message.

  • D_i is a field element
  • \mathbb{F} could be \mathbb{R}, \mathbb{C}, or a finite field \mathbb{F}_q.

Observation: This is a distributed storage system.

  • An erasure - a node that does not respond.
  • An error - a node that returns erroneous results.

Solution:

  • Add redundancy to the message
  • Error-correcting codes.

Coded Distributed Computing

  • The master partitions D and encodes it before sending to P workers.
  • Workers perform computations on coded data \tilde{D} and generate coded results g(\tilde{D}).
  • The master decodes the coded results and obtains f(D)=h(g(\tilde{D})).

Outline

Matrix-Vector Multiplication

  • MDS codes.
  • Short-Dot codes.

Matrix-Matrix Multiplication

  • Polynomial codes.
  • MatDot codes.

Polynomial Evaluation

  • Lagrange codes.
  • Application to blockchain.

Trivial solution - replication

Why no straggler tolerance?

  • The task is to compute y=A\cdot x for a matrix A\in \mathbb{F}^{M\times N} and a vector x\in \mathbb{F}^{N}.
  • We employ an individual worker node i to compute y_i=(a_{i1},\ldots,a_{iN})\cdot (x_1,\ldots,x_N)^\top.
  • If any single node does not respond, its entry y_i is lost.

Replicate the computation?

  • Let r+1 nodes compute every y_i.

We need P=rM+M=(r+1)M worker nodes to tolerate r erasures or \lfloor \frac{r}{2}\rfloor adversaries.
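
For example, tolerating r=2 stragglers with M=10 outputs requires P=2\cdot 10+10=30 workers under replication, whereas the MDS scheme below achieves the same with only P=r+M=12.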

Use of MDS codes

Let 2|M and P=3.

Let A_1,A_2 be submatrices of A such that A=[A_1^\top|A_2^\top]^\top.

  • Worker node 1 computes A_1\cdot x.
  • Worker node 2 computes A_2\cdot x.
  • Worker node 3 computes (A_1+A_2)\cdot x.

Observation: the results can be obtained from any two worker nodes.
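
A minimal numpy sketch of this three-worker example, taking \mathbb{F}=\mathbb{R} and arbitrary sizes for illustration; any two of the three responses determine A\cdot x:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 4, 5                          # 2 | M, P = 3 workers
A = rng.standard_normal((M, N))
x = rng.standard_normal(N)
A1, A2 = A[: M // 2], A[M // 2 :]

r1 = A1 @ x                          # worker 1
r2 = A2 @ x                          # worker 2
r3 = (A1 + A2) @ x                   # worker 3

# e.g. worker 2 straggles: recover A2 @ x as r3 - r1
y = np.concatenate([r1, r3 - r1])
assert np.allclose(y, A @ x)
```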

Let G\in \mathbb{F}^{M\times P} be the generator matrix of a (P,M) MDS code.

The master node computes F=G^\top A\in \mathbb{F}^{P\times N}.

Every worker node i computes F_i\cdot x.

  • F_i=(G^\top A)_i is the i-th row of G^\top A.

Notice that Fx=G^\top A\cdot x=G^\top y is the codeword corresponding to y.

Node i computes an entry in this codeword.

1 response = 1 entry of the codeword.

The master does not need all workers to respond to obtain y.

  • The MDS property allows decoding from any M of the y_i's.
  • This scheme tolerates P-M erasures, and the recovery threshold is K=M.
  • We need P=r+M worker nodes to tolerate r stragglers or \lfloor \frac{r}{2}\rfloor adversaries.
    • With replication, we need P=rM+M worker nodes.
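
A hedged sketch of the general scheme over the reals: here G^\top is taken to be a P\times M Vandermonde matrix (any M of its rows form an invertible Vandermonde matrix, since the evaluation points are distinct), and the sizes are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, P = 4, 6, 7                       # tolerates P - M = 3 stragglers
A = rng.standard_normal((M, N))
x = rng.standard_normal(N)

# G^T: P x M Vandermonde matrix; any M of its rows are invertible
nodes = np.arange(1.0, P + 1)
GT = np.vander(nodes, M, increasing=True)

F = GT @ A                              # coded rows, sent to workers in advance
responses = F @ x                       # worker i returns F[i] @ x = (G^T y)[i]

alive = [0, 2, 3, 6]                    # any M responding workers suffice
y = np.linalg.solve(GT[alive], responses[alive])
assert np.allclose(y, A @ x)
```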

Potential improvements for MDS codes

  • The matrix A is usually a (trained) model, and x is the data (feature vector).
  • x is transmitted frequently, while the rows of A (or G^\top A) are communicated in advance.
  • Every worker needs to receive the entire x and compute the dot product.
  • Communication-heavy.
  • Can we design a scheme in which every node receives only a part of x?

Short-Dot codes

link to paper

We want to create a matrix F\in \mathbb{F}^{P\times N} from A such that:

  • Every node computes F_i\cdot x.
  • Every K rows linearly span the row space of A.
  • Each row of F contains at most s non-zero entries.

In the MDS method, F=G^\top A.

  • The recovery threshold is K=M.
  • Every worker node needs to receive s=N symbols (the entire x).

No free lunch

Can we trade the recovery threshold K for a smaller s?

  • Every worker node receives less than N symbols.
  • The master will need more than M responses to recover the computation result.

Construction of Short-Dot codes

Choose a super-regular matrix B\in \mathbb{F}^{P\times K}, where P is the number of worker nodes.

  • A matrix is super-regular if every square submatrix of it is invertible.
  • Lagrange/Cauchy matrices are super-regular (next lecture).

Create a matrix \tilde{A}\in \mathbb{F}^{K\times N} by stacking some Z\in \mathbb{F}^{(K-M)\times N} below the matrix A.

Let F=B\cdot \tilde{A}\in \mathbb{F}^{P\times N}.

Short-Dot: create matrix F\in \mathbb{F}^{P\times N} such that:

  • Every K rows linearly span the row space of A.
  • Each row of F contains at most s=\frac{P-K+M}{P}\cdot N non-zero entries (sparse).
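
For instance, with P=8, K=6, M=4, and N=8 (the sizes reused in the sketch at the end of this section), each row of F has at most s=\frac{8-6+4}{8}\cdot 8=6 non-zero entries, so every worker receives only 6 of the 8 entries of x.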

Recovery of Short-Dot codes

Claim: Every K rows of F linearly span the row space of A.

Proof

Since B is super-regular, it is in particular MDS, i.e., every K\times K submatrix of B is invertible.

Hence, every row of A can be represented as a linear combination of any K rows of F.

That is, for every \mathcal{X}\subseteq[P] with |\mathcal{X}|=K, we have \tilde{A}=(B^{\mathcal{X}})^{-1}F^{\mathcal{X}}, where B^{\mathcal{X}} and F^{\mathcal{X}} denote the rows of B and F indexed by \mathcal{X}.

What about the sparsity of F?

  • Want each row of F to be sparse.

Sparsity of Short-Dot codes

Build a P\times P square matrix in which each row and each column contains P-K+M non-zero entries.

Concatenate \frac{N}{P} such matrices horizontally to obtain the P\times N zero pattern of F.

[Missing slides 18]

We now investigate what Z should look like to construct such a matrix F.

  • Recall that each column of F must contain K-M zeros. They are indexed by a set \mathcal{U}\subseteq[P], where |\mathcal{U}|=K-M. Let B^{\mathcal{U}}\in \mathbb{F}^{(K-M)\times K} be the submatrix of B containing the rows indexed by \mathcal{U}.
  • Since F=B\tilde{A}, it follows that F_j=B\tilde{A}_j, where F_j and \tilde{A}_j are the j-th columns of F and \tilde{A}.
  • Next, we require B^{\mathcal{U}}\tilde{A}_j=0_{(K-M)\times 1}.
  • Split B^{\mathcal{U}}=[B^{\mathcal{U}}_{[1,M]},B^{\mathcal{U}}_{[M+1,K]}] and \tilde{A}_j=[A_j^\top,Z_j^\top]^\top.
  • B^{\mathcal{U}}\tilde{A}_j=B^{\mathcal{U}}_{[1,M]}A_j+B^{\mathcal{U}}_{[M+1,K]}Z_j=0_{(K-M)\times 1}.
  • Z_j=-(B^{\mathcal{U}}_{[M+1,K]})^{-1}B^{\mathcal{U}}_{[1,M]}A_j.
  • Note that B^{\mathcal{U}}_{[M+1,K]}\in \mathbb{F}^{(K-M)\times(K-M)} is invertible, since B is super-regular.
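
Putting the construction together, a hedged numpy sketch over the reals: B is a Cauchy matrix (super-regular), the sizes P=8, K=6, M=4, N=8 and the cyclic choice of the zero sets \mathcal{U}_j are arbitrary illustrative choices, and the recovery step decodes y=A\cdot x from any K responses:

```python
import numpy as np

rng = np.random.default_rng(0)
P, K, M, N = 8, 6, 4, 8                  # illustrative sizes; s = (P-K+M)/P * N = 6

# Cauchy matrix B (super-regular over the reals): B_ij = 1 / (a_i - b_j)
a = np.arange(P, dtype=float)
b = np.arange(P, P + K, dtype=float)
B = 1.0 / (a[:, None] - b[None, :])

A = rng.standard_normal((M, N))
x = rng.standard_normal(N)

# choose Z column by column so that column j of F vanishes on the K-M
# cyclically chosen rows U_j (one simple zero pattern)
A_tilde = np.zeros((K, N))
for j in range(N):
    U = [(j + t) % P for t in range(K - M)]
    BU1, BU2 = B[U, :M], B[U, M:]        # split B^U into columns [1,M] and [M+1,K]
    Z_j = -np.linalg.solve(BU2, BU1 @ A[:, j])
    A_tilde[:, j] = np.concatenate([A[:, j], Z_j])

F = B @ A_tilde                          # column j is (numerically) zero on U_j,
                                         # so each row of F has at most s nonzeros
responses = F @ x                        # worker i needs only the s entries of x
                                         # matching the nonzeros of F[i]

# recovery: any K responses give B^X (A_tilde x); its first M entries are A x
X = [0, 2, 3, 5, 6, 7]
y = np.linalg.solve(B[X], responses[X])[:M]
assert np.allclose(y, A @ x)
```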