11 KiB
CSE5313 Coding and information theory for data science (Lecture 25)
Polynomial Evaluation
Problem formulation:
- We have
KdatasetsX_1,X_2,\ldots,X_K. - Want to compute some polynomial function
fof degreedon each dataset.- Want
f(X_1),f(X_2),\ldots,f(X_K).
- Want
- Examples:
X_1,X_2,\ldots,X_Kare points in\mathbb{F}^{M\times M}, andf(X)=X^8+3X^2+1.X_k=(X_k^{(1)},X_k^{(2)}), both in\mathbb{F}^{M\times M}, andf(X)=X_k^{(1)}X_k^{(2)}.- Gradient computation.
P worker nodes:
- Some are stragglers, i.e., not responsive.
- Some are adversaries, i.e., return erroneous results.
- Privacy: We do not want to expose datasets to worker nodes.
Lagrange Coded Computing
Let \ell(z) be a polynomial whose evaluations at \omega_1,\ldots,\omega_{K} are X_1,\ldots,X_K.
- That is,
\ell(\omega_i)=X_ifor every\omega_i\in \mathbb{F}, i\in [K].
Some example constructions:
Given X_1,\ldots,X_K with corresponding \omega_1,\ldots,\omega_K
\ell(z)=\sum_{i=1}^K X_i\ell_i(z), where\ell_i(z)=\prod_{j\in[K],j\neq i} \frac{z-\omega_j}{\omega_i-\omega_j}=\begin{cases} 0 & \text{if } j\neq i \\ 1 & \text{if } j=i \end{cases}.
Then every f(X_i)=f(\ell(\omega_i)) is an evaluation of polynomial f\circ \ell(z) at \omega_i.
If the master obtains the composition h=f\circ \ell, it can obtain every f(X_i)=h(\omega_i).
Goal: The master wished to obtain the polynomial h(z)=f(\ell(z)).
Intuition:
- Encoding is performed by evaluating
\ell(z)at\alpha_1,\ldots,\alpha_P\in \mathbb{F}, andP>Kfor redundancy. - Nodes apply
fon an evaluation of\elland obtain an evaluation ofh. - The master receives some potentially noisy evaluations, and finds
h. - The master evaluates
hat\omega_1,\ldots,\omega_Kto obtainf(X_1),\ldots,f(X_K).
Encoding for Lagrange coded computing
Need polynomial \ell(z) such that:
X_k=\ell(\omega_k)for everyk\in [K].
Having obtained such \ell we let \tilde{X}_i=\ell(\alpha_i) for every i\in [P].
span{\tilde{X}_1,\tilde{X}_2,\ldots,\tilde{X}_P}=span{\ell_1(x),\ell_2(x),\ldots,\ell_P(x)}.
Want X_k=\ell(\omega_k) for every k\in [K].
Tool: Lagrange interpolation.
\ell_k(z)=\prod_{i\neq k} \frac{z-\omega_j}{\omega_k-\omega_j}.\ell_k(\omega_k)=1and\ell_k(\omega_k)=0for everyj\neq k.\deg \ell_k(z)=K-1.
Let \ell(z)=\sum_{k=1}^K X_k\ell_k(z).
\deg \ell\leq K-1.\ell(\omega_k)=X_kfor everyk\in [K].
Let \tilde{X}_i=\ell(\alpha_i)=\sum_{k=1}^K X_k\ell_k(\alpha_i).
Every \tilde{X}_i is a linear combination of X_1,\ldots,X_K.
(\tilde{X}_1,\tilde{X}_2,\ldots,\tilde{X}_P)=(X_1,\ldots,X_K)\cdot G=(X_1,\ldots,X_K)\begin{bmatrix}
\ell_1(\alpha_1) & \ell_1(\alpha_2) & \cdots & \ell_1(\alpha_P) \\
\ell_2(\alpha_1) & \ell_2(\alpha_2) & \cdots & \ell_2(\alpha_P) \\
\vdots & \vdots & \ddots & \vdots \\
\ell_K(\alpha_1) & \ell_K(\alpha_2) & \cdots & \ell_K(\alpha_P)
\end{bmatrix}
This G is called a Lagrange matrix with respect to
\omega_1,\ldots,\omega_K. (interpolation points, rows)\alpha_1,\ldots,\alpha_P. (evaluation points, columns)
Basically, a modification of Reed-Solomon code.
Decoding for Lagrange coded computing
Say the system has S stragglers (erasures) and A adversaries (errors).
The master receives P-S computation results f(\tilde{X}_{i_1}),\ldots,f(\tilde{X}_{i_{P-S}}).
- By design, therese are evaluations of
h: h(a_{i_1})=f(\ell(a_{i_1})),\ldots,h(a_{i_{P-S}})=f(\ell(a_{i_{P-S}})) - A evaluation are noisy
\deg h=\deg f\cdot \deg \ell=(K-1)\deg f.
Which process enables to interpolate a polynomial from noisy evaluations?
Ree-Solomon (RS) decoding.
Fact: Reed-Solomon decoding succeeds if and only if the number of erasures + 2 \times the number of errors \leq d-1.
Imagine h as the "message" in Reed-Solomon code. [P,(K-1)\deg f +1,P-(K-1)\deg f]_q.
- Interpolating
his possible if and only ifS+2A\leq (K-1)\deg f-1.
Once the master interpolates h.
- The evaluations
h(\omega_i)=f(\ell(\omega_i))=f(X_i)provides the interpolation results.
Theorem of Lagrange coded computing
Lagrange coded computing enables to compute \{f(X_i)\}_{i=1}^K for any f at the presence of at most S stragglers and at most A adversaries if
(K-1)\deg f+S+2A+1\leq P
Interpolation of result does not depend on
P(number of worker nodes).
Privacy for Lagrange coded computing
Currently any size-K group of colluding nodes reveals the entire dataset.
Q: Can an individual node i learn anything about X_i?
A: Yes, since \tilde{X}_i is a linear combination of X_1,\ldots,X_K (partial knowledge, a linear combination of private data).
Can we provide perfect privacy given that at most T nodes collude?
- That is,
I(X:\tilde{X}_i)=0for every\mathcal{T}\subseteq [P]of size at mostT, whereX=(X_1,\ldots,X_K), and\tilde{X}_\mathcal{T}=(\tilde{X}_{i_1},\ldots,\tilde{X}_{i_{|\mathcal{T}|}}).
Solution: Slight change of encoding in LLC.
This only applied to \mathbb{F}=\mathbb{F}_q (no perfect privacy over \mathbb{R},\mathbb{C}. No uniform distribution can be defined).
The master chooses
TkeysZ_{K+1},\ldots,Z_{K+T}uniformly at random (|Z_i|=|X_i|for alli)- Interpolation points
\omega_1,\ldots,\omega_{K+T}.
Find the Lagrange polynomial \ell(z) such that
\ell(w_i)=X_ifori\in [K]\ell(w_{K+j})=Z_jforj\in [T].
Lagrange interpolation:
\ell(z)=\sum_{i=1}^{K} X_i\ell_i(z)+\sum_{j=1}^{T} X_{K+j}ell_{K+j}(z)
(\tilde{X}_1,\ldots,\tilde{X}_P)=(X_1,\ldots,X_K,Z_1,\ldots,Z_T)\cdot G
where
G=\begin{bmatrix}
\ell_1(\alpha_1) & \ell_1(\alpha_2) & \cdots & \ell_1(\alpha_P) \\
\ell_2(\alpha_1) & \ell_2(\alpha_2) & \cdots & \ell_2(\alpha_P) \\
\vdots & \vdots & \ddots & \vdots \\
\ell_K)(\alpha_1) & \ell_K(\alpha_2) & \cdots & \ell_K(\alpha_P) \\
\vdots & \vdots & \ddots & \vdots \\
\ell_{K+T}(\alpha_1) & \ell_{K+T}(\alpha_2) & \cdots & \ell_{K+T}(\alpha_P)
For analysis, we denote G=\begin{bmatrix}G^{top}\\G^{bot}\end{bmatrix}, where G^{top}\in \mathbb{F}^{K\times P} and G^{bot}\in \mathbb{F}^{T\times P}.
The proof for privacy is the almost the same as ramp scheme.
Proof
We have (\tilde{X}_1,\ldots \tilde{X}_P)=(X_1,\ldots,X_K)\cdot G^{top}+(Z_1,\ldots,Z_T)\cdot G^{bot}.
Without loss of generality, \mathcal{T}=[T] is the colluding set.
\mathcal{T} hold (\tilde{X}_1,\ldots \tilde{X}_P)=(X_1,\ldots,X_K)\cdot G^{top}_\mathcal{T}+(Z_1,\ldots,Z_T)\cdot G^{bot}_\mathcal{top}.
G^{top}_\mathcal{T},G^{bot}_\mathcal{T}contain the firstTcolumns ofG^{top},G^{bot}, respectively.
Note that G^{top}\in \mathbb{F}^{T\times P}_q is MDS, and hence G^{top}_\mathcal{T} is a T\times T invertible matrix.
Since Z=(Z_1,\ldots,Z_T) chosen uniformly random, so Z\cdot G^{bot}_\mathcal{T} is a one-time pad.
Same proof for decoding, we only need K+1 item to make the interpolation work.
Conclusion
- Theorem: Lagrange Coded Computing is resilient against
Sstragglers,Aadversaries, andTcolluding nodes ifP\geq (K+T-1)\deg f+S+2A+1- Privacy (increase with
\deg f) cost more than the straggler and adversary (increase linearly).
- Privacy (increase with
- Caveat: Requires finite field arithmetic!
- Some follow-up works analyzed information leakage over the reals
Side note for Blockchain
Blockchain: A decentralized system for trust management.
Blockchain maintains a chain of blocks.
- A block contains a set of transactions.
- Transaction = value transfer between clients.
- The chain is replicated on each node.
Periodically, a new block is proposed and appended to each local chain.
- The block must not contain invalid transactions.
- Nodes must agree on proposed block.
Existing systems:
- All nodes perform the same set of tasks.
- Every node must receive every block.
Performance does not scale with number of node
Improving performance of blockchain
The performance of blockchain is inherently limited by its design.
- All nodes perform the same set of tasks.
- Every node must receive every block.
Idea: Combine blockchain with distributed computing.
- Node tasks should complement each other.
Sharding (notion from databases):
- Nodes are partitioned into groups of equal size.
- Each group maintains a local chain.
- More nodes, more groups, more transactions can be processed.
- Better performance.
Security Problem
Biggest problem in blockchains: Adversarial (Byzantine) nodes.
- Malicious actors wish to include invalid transactions.
Solution in traditional blockchains: Consensus mechanisms.
- Algorithms for decentralized agreement.
- Tolerates up to
1/3Byzantine nodes.
Problem: Consensus conflicts with sharding.
- Traditional consensus mechanisms tolerate
\approx 1/3Byzantine nodes. - If we partition
Pnodes intoKgroups, we can tolerate onlyP/3Knode failures.- Down from
P/3in non-shared systems.
- Down from
Goal: Solve the consensus problem in sharded systems.
Tool: Coded computing.
Problem formulation
At epoch t of a shared blockchain system, we have
Klocal chainY_1^{t-1},\ldots, Y_K^{t-1}.Knew blocksX_1(t),\ldots,X_K(t).- A polynomial verification function
f(X_k(t),Y_k^t), which validatesX_k(t).
Proof
Balance check function f(X_k(t),Y_k^t)=\sum_\tau Y_k(\tau)-X_k(t).
More commonly, a (polynomial) hash function. Used to:
- Verify the sender's public key.
- Verify the ownership of the transferred funds.
Need: Apply a polynomial functions on K datasets.
Lagrange coded computing!
Blockchain with Lagrange coded computing
At epoch t:
- A leader is elected (using secure election mechanism).
- The leader receives new blocks
X_1(t),\ldots,X_K(t). - The leader disperses the encoded blocks
\tilde{X}_1(t),\ldots,\tilde{X}_P(t)to nodes.- Needs secure information dispersal mechanisms.
Every node i\in [P]:
- Locally stores a coded chain
\tilde{Y}_i^t(encoded using LCC). - Receives
\tilde{X}_i(t). - Computes
f(\tilde{X}_i(t),\tilde{Y}_i^t)and sends to the leader.
The leader decodes to get \{f(X_i(t),Y_i^t)\}_{i=1}^K and disperse securely to nodes.
Node i appends coded block \tilde{X}_i(t) to coded chain \tilde{Y}_i^t (zeroing invalid transactions).
Guarantees security if P\geq (K+T-1)\deg f+S+2A+1.
Aadversaries, degreedverification polynomial.
Sharding without sharding:
- Computations are done on (coded) partial chains/blocks.
- Good performance!
- Since blocks/chains are coded, they are "dispersed" among many nodes.
- Security problem in sharding solved!
- Since the encoding is done (securely) through a leader, no need to send every block to all nodes.
- Reduced communication! (main bottleneck).
Novelties:
- First decentralized verification system with less than size of blocks times the number of nodes communication.
- Coded consensus – Reach consensus on coded data.