CSE5313 Coding and information theory for data science (Lecture 25)

Polynomial Evaluation

Problem formulation:

  • We have K datasets X_1,X_2,\ldots,X_K.
  • Want to compute some polynomial function f of degree d on each dataset.
    • Want f(X_1),f(X_2),\ldots,f(X_K).
  • Examples:
  • X_1,X_2,\ldots,X_K are matrices in \mathbb{F}^{M\times M}, and f(X)=X^8+3X^2+1.
  • X_k=(X_k^{(1)},X_k^{(2)}), both in \mathbb{F}^{M\times M}, and f(X_k)=X_k^{(1)}X_k^{(2)} (matrix multiplication).
    • Gradient computation.
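
As a quick illustration of the first example, here is a minimal Python sketch of evaluating f(X)=X^8+3X^2+1 on one dataset; the field size q=97 and the 2\times 2 matrix are hypothetical toy choices:

```python
import numpy as np

q = 97  # toy prime field size (hypothetical choice)

def mat_pow(X, e, q):
    """Matrix power X^e with all arithmetic reduced mod q (square-and-multiply)."""
    R = np.eye(X.shape[0], dtype=np.int64)
    while e:
        if e & 1:
            R = (R @ X) % q
        X = (X @ X) % q
        e >>= 1
    return R

def f(X):
    """f(X) = X^8 + 3X^2 + 1, reading the constant 1 as the identity matrix."""
    I = np.eye(X.shape[0], dtype=np.int64)
    return (mat_pow(X, 8, q) + 3 * mat_pow(X, 2, q) + I) % q

X1 = np.array([[1, 2], [3, 4]], dtype=np.int64)  # toy dataset X_1
print(f(X1))  # f(X_1), a 2x2 matrix over F_97
```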

P worker nodes:

  • Some are stragglers, i.e., not responsive.
  • Some are adversaries, i.e., return erroneous results.
  • Privacy: We do not want to expose datasets to worker nodes.

Lagrange Coded Computing

Let \ell(z) be a polynomial whose evaluations at \omega_1,\ldots,\omega_{K} are X_1,\ldots,X_K.

  • That is, \ell(\omega_i)=X_i for every \omega_i\in \mathbb{F}, i\in [K].

Some example constructions:

Given X_1,\ldots,X_K with corresponding \omega_1,\ldots,\omega_K

  • \ell(z)=\sum_{i=1}^K X_i\ell_i(z), where \ell_i(z)=\prod_{j\in[K],j\neq i} \frac{z-\omega_j}{\omega_i-\omega_j}, so that \ell_i(\omega_j)=\begin{cases} 1 & \text{if } j=i \\ 0 & \text{if } j\neq i \end{cases}.

Then every f(X_i)=f(\ell(\omega_i)) is an evaluation of the polynomial (f\circ \ell)(z) at \omega_i.

If the master obtains the composition h=f\circ \ell, it can obtain every f(X_i)=h(\omega_i).

Goal: The master wishes to obtain the polynomial h(z)=f(\ell(z)).
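
For example, with K=2, \omega_1=0, \omega_2=1: \ell(z)=(1-z)X_1+zX_2, and for f(x)=x^2 we get h(z)=((1-z)X_1+zX_2)^2, so h(0)=X_1^2=f(X_1) and h(1)=X_2^2=f(X_2).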

Intuition:

  • Encoding is performed by evaluating \ell(z) at \alpha_1,\ldots,\alpha_P\in \mathbb{F}, with P>K for redundancy.
  • Nodes apply f on an evaluation of \ell and obtain an evaluation of h.
  • The master receives some potentially noisy evaluations, and finds h.
  • The master evaluates h at \omega_1,\ldots,\omega_K to obtain f(X_1),\ldots,f(X_K).

Encoding for Lagrange coded computing

Need polynomial \ell(z) such that:

  • X_k=\ell(\omega_k) for every k\in [K].

Having obtained such \ell, we let \tilde{X}_i=\ell(\alpha_i) for every i\in [P].

Each \tilde{X}_i lies in span\{X_1,X_2,\ldots,X_K\}; that is, the coded shares are linear combinations of the data.

Want X_k=\ell(\omega_k) for every k\in [K].

Tool: Lagrange interpolation.

  • \ell_k(z)=\prod_{j\neq k} \frac{z-\omega_j}{\omega_k-\omega_j}.
  • \ell_k(\omega_k)=1 and \ell_k(\omega_j)=0 for every j\neq k.
  • \deg \ell_k(z)=K-1.

Let \ell(z)=\sum_{k=1}^K X_k\ell_k(z).

  • \deg \ell\leq K-1.
  • \ell(\omega_k)=X_k for every k\in [K].

Let \tilde{X}_i=\ell(\alpha_i)=\sum_{k=1}^K X_k\ell_k(\alpha_i).

Every \tilde{X}_i is a linear combination of X_1,\ldots,X_K.


(\tilde{X}_1,\tilde{X}_2,\ldots,\tilde{X}_P)=(X_1,\ldots,X_K)\cdot G=(X_1,\ldots,X_K)\begin{bmatrix}
\ell_1(\alpha_1) & \ell_1(\alpha_2) & \cdots & \ell_1(\alpha_P) \\
\ell_2(\alpha_1) & \ell_2(\alpha_2) & \cdots & \ell_2(\alpha_P) \\
\vdots & \vdots & \ddots & \vdots \\
\ell_K(\alpha_1) & \ell_K(\alpha_2) & \cdots & \ell_K(\alpha_P)
\end{bmatrix}

This G is called a Lagrange matrix with respect to

  • \omega_1,\ldots,\omega_K. (interpolation points, rows)
  • \alpha_1,\ldots,\alpha_P. (evaluation points, columns)

Essentially, this is a modification of a Reed-Solomon code.
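
Below is a minimal sketch of this encoding in Python; the prime q=97, the scalar datasets, and the particular \omega's and \alpha's are all hypothetical toy choices (real deployments use matrix data and larger fields):

```python
q = 97                                 # toy prime field size
K, P = 3, 8                            # K datasets, P workers (toy values)
X = [10, 20, 30]                       # datasets X_1, ..., X_K (scalars for brevity)
omega = list(range(1, K + 1))          # interpolation points w_1, ..., w_K
alpha = list(range(K + 1, K + P + 1))  # evaluation points a_1, ..., a_P, disjoint from the w's

def lagrange_basis(k, z, points):
    """Evaluate l_k(z) = prod_{j != k} (z - w_j) / (w_k - w_j) over F_q."""
    num, den = 1, 1
    for j, wj in enumerate(points):
        if j != k:
            num = num * (z - wj) % q
            den = den * (points[k] - wj) % q
    return num * pow(den, -1, q) % q   # division = multiplication by the inverse mod q

def ell(z):
    """Evaluate l(z) = sum_k X_k * l_k(z); by construction l(w_k) = X_k."""
    return sum(X[k] * lagrange_basis(k, z, omega) for k in range(K)) % q

assert all(ell(w) == x for w, x in zip(omega, X))  # l interpolates the data
X_tilde = [ell(a) for a in alpha]                  # coded shares, one per worker
```

The (k,i) entry of the Lagrange matrix G above is exactly lagrange_basis(k, alpha[i], omega).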

Decoding for Lagrange coded computing

Say the system has S stragglers (erasures) and A adversaries (errors).

The master receives P-S computation results f(\tilde{X}_{i_1}),\ldots,f(\tilde{X}_{i_{P-S}}).

  • By design, these are evaluations of h: h(\alpha_{i_1})=f(\ell(\alpha_{i_1})),\ldots,h(\alpha_{i_{P-S}})=f(\ell(\alpha_{i_{P-S}})).
  • Up to A of these evaluations are noisy (corrupted by adversaries).
  • \deg h=\deg f\cdot \deg \ell=(K-1)\deg f.

Which process enables interpolating a polynomial from noisy evaluations?

Reed-Solomon (RS) decoding.

Fact: Reed-Solomon decoding succeeds if and only if (number of erasures) + 2\times (number of errors) \leq d-1, where d is the minimum distance of the code.

Imagine h as the "message" of a [P,(K-1)\deg f +1,P-(K-1)\deg f]_q Reed-Solomon code.

  • Since the minimum distance is d=P-(K-1)\deg f, interpolating h is possible if and only if S+2A\leq P-(K-1)\deg f-1.

Once the master interpolates h:

  • The evaluations h(\omega_i)=f(\ell(\omega_i))=f(X_i) provide the desired results.
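
Continuing the Python sketch from the encoding section, here is decoding for the straggler-only case (A=0) with the hypothetical choice f(x)=x^2+1, so \deg f=2; with adversaries present, the plain interpolation below is replaced by a Reed-Solomon decoder:

```python
def f(x):
    return (x * x + 1) % q             # toy polynomial f with deg f = 2

deg_h = (K - 1) * 2                    # deg h = (K - 1) * deg f = 4
# Pretend all but deg_h + 1 workers straggled; any deg_h + 1 replies suffice.
responses = {a: f(x) for a, x in list(zip(alpha, X_tilde))[:deg_h + 1]}

def interpolate_at(points, z):
    """Lagrange-interpolate the polynomial of degree <= deg_h through the
    dict points = {a_i: h(a_i)} and evaluate it at z, all over F_q."""
    xs = list(points)
    total = 0
    for i, xi in enumerate(xs):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (z - xj) % q
                den = den * (xi - xj) % q
        total = (total + points[xi] * num * pow(den, -1, q)) % q
    return total

results = [interpolate_at(responses, w) for w in omega]
assert results == [f(x) for x in X]    # recovers f(X_1), ..., f(X_K)
```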

Theorem of Lagrange coded computing

Lagrange coded computing enables computing \{f(X_i)\}_{i=1}^K for any polynomial f in the presence of at most S stragglers and at most A adversaries if


(K-1)\deg f+S+2A+1\leq P

The number of evaluations needed to interpolate the result depends only on K, \deg f, S, and A, not on P (the number of worker nodes).
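
For a concrete (illustrative) instance: with K=10 datasets, \deg f=2, S=2 stragglers, and A=1 adversary, the theorem requires P\geq (10-1)\cdot 2+2+2\cdot 1+1=23 worker nodes.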

Privacy for Lagrange coded computing

Currently, any group of K colluding nodes reveals the entire dataset: K evaluations of \ell (which has degree \leq K-1) determine \ell, and hence every X_i.

Q: Can an individual node i learn anything about the data X_1,\ldots,X_K?

A: Yes, since \tilde{X}_i is a linear combination of X_1,\ldots,X_K (partial knowledge of the private data).

Can we provide perfect privacy given that at most T nodes collude?

  • That is, I(X;\tilde{X}_\mathcal{T})=0 for every \mathcal{T}\subseteq [P] of size at most T, where
    • X=(X_1,\ldots,X_K), and
    • \tilde{X}_\mathcal{T}=(\tilde{X}_{i_1},\ldots,\tilde{X}_{i_{|\mathcal{T}|}}).

Solution: A slight change to the encoding of LCC.

This only applies to \mathbb{F}=\mathbb{F}_q (there is no perfect privacy over \mathbb{R} or \mathbb{C}, since no uniform distribution can be defined there).

The master chooses

  • T keys Z_1,\ldots,Z_T uniformly at random (each Z_j of the same size as the X_i's).
  • Interpolation points \omega_1,\ldots,\omega_{K+T}.

Find the Lagrange polynomial \ell(z) such that

  • \ell(\omega_i)=X_i for i\in [K].
  • \ell(\omega_{K+j})=Z_j for j\in [T].

Lagrange interpolation:


\ell(z)=\sum_{i=1}^{K} X_i\ell_i(z)+\sum_{j=1}^{T} Z_j\ell_{K+j}(z)

where the \ell_i are now the Lagrange basis polynomials with respect to \omega_1,\ldots,\omega_{K+T}, so \deg \ell\leq K+T-1.

(\tilde{X}_1,\ldots,\tilde{X}_P)=(X_1,\ldots,X_K,Z_1,\ldots,Z_T)\cdot G

where


G=\begin{bmatrix}
\ell_1(\alpha_1) & \ell_1(\alpha_2) & \cdots & \ell_1(\alpha_P) \\
\ell_2(\alpha_1) & \ell_2(\alpha_2) & \cdots & \ell_2(\alpha_P) \\
\vdots & \vdots & \ddots & \vdots \\
\ell_K(\alpha_1) & \ell_K(\alpha_2) & \cdots & \ell_K(\alpha_P) \\
\vdots & \vdots & \ddots & \vdots \\
\ell_{K+T}(\alpha_1) & \ell_{K+T}(\alpha_2) & \cdots & \ell_{K+T}(\alpha_P)
\end{bmatrix}

For analysis, we denote G=\begin{bmatrix}G^{top}\\G^{bot}\end{bmatrix}, where G^{top}\in \mathbb{F}^{K\times P} and G^{bot}\in \mathbb{F}^{T\times P}.
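
Continuing the toy Python sketch from the encoding section, the private variant only changes how \ell is built; T=2 and the uniform keys below are hypothetical choices:

```python
import random

T = 2                                        # tolerate up to T colluding nodes
omega_priv = list(range(1, K + T + 1))       # K + T interpolation points
Z = [random.randrange(q) for _ in range(T)]  # uniform keys Z_1, ..., Z_T
data = X + Z                                 # (X_1, ..., X_K, Z_1, ..., Z_T)

def ell_priv(z):
    """l(w_i) = X_i for i in [K] and l(w_{K+j}) = Z_j for j in [T]; deg l <= K+T-1."""
    return sum(data[k] * lagrange_basis(k, z, omega_priv)
               for k in range(K + T)) % q

alpha_priv = list(range(K + T + 1, K + T + P + 1))  # evaluation points, disjoint from the w's
X_tilde_priv = [ell_priv(a) for a in alpha_priv]    # shares; any T reveal nothing about X
```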

The privacy proof is almost the same as for the ramp scheme.

Proof

We have (\tilde{X}_1,\ldots,\tilde{X}_P)=(X_1,\ldots,X_K)\cdot G^{top}+(Z_1,\ldots,Z_T)\cdot G^{bot}.

Without loss of generality, \mathcal{T}=[T] is the colluding set.

\mathcal{T} holds (\tilde{X}_1,\ldots,\tilde{X}_T)=(X_1,\ldots,X_K)\cdot G^{top}_\mathcal{T}+(Z_1,\ldots,Z_T)\cdot G^{bot}_\mathcal{T}.

  • G^{top}_\mathcal{T}, G^{bot}_\mathcal{T} contain the first T columns of G^{top}, G^{bot}, respectively.

Note that G^{bot}\in \mathbb{F}^{T\times P}_q is MDS, and hence G^{bot}_\mathcal{T} is a T\times T invertible matrix.

Since Z=(Z_1,\ldots,Z_T) is chosen uniformly at random, Z\cdot G^{bot}_\mathcal{T} acts as a one-time pad.
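
A brute-force sanity check of the one-time-pad argument for T=1, using the toy parameters above: fixing the data X and sweeping the single key over all of \mathbb{F}_q, the share seen by one node takes every field value exactly once, i.e., it is uniform regardless of X:

```python
from collections import Counter

pts = list(range(1, K + 2))   # K + 1 interpolation points for T = 1
a = K + 2                     # evaluation point of one observing node
counts = Counter()
for z1 in range(q):           # sweep the key Z_1 over all of F_q
    data_z = X + [z1]
    share = sum(data_z[k] * lagrange_basis(k, a, pts) for k in range(K + 1)) % q
    counts[share] += 1
assert len(counts) == q and all(c == 1 for c in counts.values())  # share is uniform
```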

Decoding works by the same argument as before; since \deg \ell\leq K+T-1 now, interpolating h requires (K+T-1)\deg f+1 clean evaluations.

Conclusion

  • Theorem: Lagrange Coded Computing is resilient against S stragglers, A adversaries, and T colluding nodes if
    
    P\geq (K+T-1)\deg f+S+2A+1
    
    • Privacy is the most expensive guarantee: T is multiplied by \deg f in the bound, whereas S and A enter only linearly.
  • Caveat: Requires finite field arithmetic!
  • Some follow-up works analyzed information leakage over the reals.

Side note for Blockchain

Blockchain: A decentralized system for trust management.

Blockchain maintains a chain of blocks.

  • A block contains a set of transactions.
  • Transaction = value transfer between clients.
  • The chain is replicated on each node.

Periodically, a new block is proposed and appended to each local chain.

  • The block must not contain invalid transactions.
  • Nodes must agree on the proposed block.

Existing systems:

  • All nodes perform the same set of tasks.
  • Every node must receive every block.

Performance does not scale with the number of nodes.

Improving performance of blockchain

The performance of blockchain is inherently limited by its design.

  • All nodes perform the same set of tasks.
  • Every node must receive every block.

Idea: Combine blockchain with distributed computing.

  • Node tasks should complement each other.

Sharding (notion from databases):

  • Nodes are partitioned into groups of equal size.
  • Each group maintains a local chain.
  • More nodes, more groups, more transactions can be processed.
  • Better performance.

Security Problem

Biggest problem in blockchains: Adversarial (Byzantine) nodes.

  • Malicious actors wish to include invalid transactions.

Solution in traditional blockchains: Consensus mechanisms.

  • Algorithms for decentralized agreement.
  • Tolerates up to 1/3 Byzantine nodes.

Problem: Consensus conflicts with sharding.

  • Traditional consensus mechanisms tolerate \approx 1/3 Byzantine nodes.
  • If we partition P nodes into K groups, each group can tolerate only P/(3K) Byzantine nodes.
    • Down from P/3 in non-sharded systems.
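
For example (illustrative numbers): P=900 nodes in a single, unsharded chain tolerate up to 300 Byzantine nodes, but partitioned into K=30 groups, each group tolerates only 900/(3\cdot 30)=10.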

Goal: Solve the consensus problem in sharded systems.

Tool: Coded computing.

Problem formulation

At epoch t of a sharded blockchain system, we have

  • K local chains Y_1^{t-1},\ldots, Y_K^{t-1}.
  • K new blocks X_1(t),\ldots,X_K(t).
  • A polynomial verification function f(X_k(t),Y_k^t), which validates X_k(t) against the chain.
Examples:

Balance check function f(X_k(t),Y_k^t)=\sum_\tau Y_k(\tau)-X_k(t).

More commonly, a (polynomial) hash function. Used to:

  • Verify the sender's public key.
  • Verify the ownership of the transferred funds.

Need: Apply a polynomial function to each of K datasets.

Lagrange coded computing!

Blockchain with Lagrange coded computing

At epoch t:

  • A leader is elected (using secure election mechanism).
  • The leader receives new blocks X_1(t),\ldots,X_K(t).
  • The leader disperses the encoded blocks \tilde{X}_1(t),\ldots,\tilde{X}_P(t) to nodes.
    • Needs secure information dispersal mechanisms.

Every node i\in [P]:

  • Locally stores a coded chain \tilde{Y}_i^t (encoded using LCC).
  • Receives \tilde{X}_i(t).
  • Computes f(\tilde{X}_i(t),\tilde{Y}_i^t) and sends to the leader.

The leader decodes to get \{f(X_i(t),Y_i^t)\}_{i=1}^K and disperses the results securely to the nodes.

Node i appends the coded block \tilde{X}_i(t) to its coded chain \tilde{Y}_i^t (zeroing out invalid transactions).

Guarantees security if P\geq (K+T-1)\deg f+S+2A+1.

  • S stragglers, A adversaries, T colluding nodes, and a verification polynomial of degree \deg f.
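
Illustrative numbers: with K=10 blocks per epoch, a degree-2 verification polynomial, T=2, S=2, and A=1, security requires P\geq (10+2-1)\cdot 2+2+2\cdot 1+1=27 nodes.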

Sharding without sharding:

  • Computations are done on (coded) partial chains/blocks.
    • Good performance!
  • Since blocks/chains are coded, they are "dispersed" among many nodes.
    • Security problem in sharding solved!
  • Since the encoding is done (securely) through a leader, no need to send every block to all nodes.
    • Reduced communication! (main bottleneck).

Novelties:

  • The first decentralized verification system whose communication cost is less than (block size) \times (number of nodes).
  • Coded consensus: reaching consensus on coded data.