# CSE5313 Coding and Information Theory for Data Science (Lecture 25)

## Polynomial Evaluation

Problem formulation:
- We have $K$ datasets $X_1,X_2,\ldots,X_K$.
- We want to compute some polynomial function $f$ of degree $d$ on each dataset.
- That is, we want $f(X_1),f(X_2),\ldots,f(X_K)$.
- Examples:
  - $X_1,X_2,\ldots,X_K$ are points in $\mathbb{F}^{M\times M}$, and $f(X)=X^8+3X^2+1$.
  - $X_k=(X_k^{(1)},X_k^{(2)})$, both in $\mathbb{F}^{M\times M}$, and $f(X_k)=X_k^{(1)}X_k^{(2)}$.
  - Gradient computation.

$P$ worker nodes:
- Some are stragglers, i.e., not responsive.
- Some are adversaries, i.e., return erroneous results.
- Privacy: we do not want to expose the datasets to the worker nodes.

### Lagrange Coded Computing

Let $\ell(z)$ be a polynomial whose evaluations at $\omega_1,\ldots,\omega_K$ are $X_1,\ldots,X_K$.
- That is, $\ell(\omega_i)=X_i$ for every $\omega_i\in\mathbb{F}$, $i\in[K]$.

Example construction: given $X_1,\ldots,X_K$ with corresponding $\omega_1,\ldots,\omega_K$,
- $\ell(z)=\sum_{i=1}^K X_i\ell_i(z)$, where $\ell_i(z)=\prod_{j\in[K],j\neq i} \frac{z-\omega_j}{\omega_i-\omega_j}$, so that $\ell_i(\omega_j)=\begin{cases} 0 & \text{if } j\neq i \\ 1 & \text{if } j=i \end{cases}$.

Then every $f(X_i)=f(\ell(\omega_i))$ is an evaluation of the polynomial $f\circ \ell(z)$ at $\omega_i$. If the master obtains the composition $h=f\circ \ell$, it can obtain every $f(X_i)=h(\omega_i)$.

Goal: the master wishes to obtain the polynomial $h(z)=f(\ell(z))$.

Intuition:
- Encoding is performed by evaluating $\ell(z)$ at $\alpha_1,\ldots,\alpha_P\in \mathbb{F}$, with $P>K$ for redundancy.
- Nodes apply $f$ to an evaluation of $\ell$ and obtain an evaluation of $h$.
- The master receives some potentially noisy evaluations and finds $h$.
- The master evaluates $h$ at $\omega_1,\ldots,\omega_K$ to obtain $f(X_1),\ldots,f(X_K)$.

### Encoding for Lagrange coded computing

Need a polynomial $\ell(z)$ such that $X_k=\ell(\omega_k)$ for every $k\in [K]$. Having obtained such an $\ell$, we let $\tilde{X}_i=\ell(\alpha_i)$ for every $i\in [P]$.

Tool: Lagrange interpolation.
- $\ell_k(z)=\prod_{j\neq k} \frac{z-\omega_j}{\omega_k-\omega_j}$.
- $\ell_k(\omega_k)=1$ and $\ell_k(\omega_j)=0$ for every $j\neq k$.
- $\deg \ell_k(z)=K-1$.

Let $\ell(z)=\sum_{k=1}^K X_k\ell_k(z)$.
- $\deg \ell\leq K-1$.
- $\ell(\omega_k)=X_k$ for every $k\in [K]$.

Let $\tilde{X}_i=\ell(\alpha_i)=\sum_{k=1}^K X_k\ell_k(\alpha_i)$. Every $\tilde{X}_i$ is a **linear combination** of $X_1,\ldots,X_K$:
$$
(\tilde{X}_1,\tilde{X}_2,\ldots,\tilde{X}_P)=(X_1,\ldots,X_K)\cdot G=(X_1,\ldots,X_K)\begin{bmatrix} \ell_1(\alpha_1) & \ell_1(\alpha_2) & \cdots & \ell_1(\alpha_P) \\ \ell_2(\alpha_1) & \ell_2(\alpha_2) & \cdots & \ell_2(\alpha_P) \\ \vdots & \vdots & \ddots & \vdots \\ \ell_K(\alpha_1) & \ell_K(\alpha_2) & \cdots & \ell_K(\alpha_P) \end{bmatrix}
$$

This $G$ is called a **Lagrange matrix** with respect to
- $\omega_1,\ldots,\omega_K$ (interpolation points, rows), and
- $\alpha_1,\ldots,\alpha_P$ (evaluation points, columns).

> Basically, a modification of a Reed-Solomon code.
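To make the encoding concrete, here is a minimal sketch over a prime field $\mathbb{F}_q$, assuming scalar datasets for simplicity (matrix datasets are encoded entrywise the same way). The names `lagrange_basis` and `encode`, and all parameters, are illustrative and not from the lecture.

```python
# Minimal LCC encoding sketch over F_q (q prime), scalar datasets.
q = 2**13 - 1  # 8191, a Mersenne prime, so arithmetic mod q is a field

def lagrange_basis(points, k, z):
    """Evaluate the k-th Lagrange basis polynomial ell_k at z over F_q."""
    num, den = 1, 1
    for j, w in enumerate(points):
        if j != k:
            num = num * (z - w) % q
            den = den * (points[k] - w) % q
    return num * pow(den, q - 2, q) % q  # modular inverse via Fermat

def encode(X, omegas, alphas):
    """Shares X~_i = ell(alpha_i) = sum_k X_k * ell_k(alpha_i), i.e. X * G."""
    return [sum(x * lagrange_basis(omegas, k, a) for k, x in enumerate(X)) % q
            for a in alphas]

# Example: K = 3 datasets, P = 6 workers.
X = [17, 42, 99]
omegas = [1, 2, 3]            # interpolation points (one per dataset)
alphas = [4, 5, 6, 7, 8, 9]   # evaluation points (one per worker)
shares = encode(X, omegas, alphas)
```

The vector $(\ell_1(\alpha_i),\ldots,\ell_K(\alpha_i))$ computed inside `encode` is exactly the $i$-th column of the Lagrange matrix $G$ above.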
### Decoding for Lagrange coded computing

Say the system has $S$ stragglers (erasures) and $A$ adversaries (errors). The master receives $P-S$ computation results $f(\tilde{X}_{i_1}),\ldots,f(\tilde{X}_{i_{P-S}})$.
- By design, these are evaluations of $h$: $h(\alpha_{i_1})=f(\ell(\alpha_{i_1})),\ldots,h(\alpha_{i_{P-S}})=f(\ell(\alpha_{i_{P-S}}))$.
- Up to $A$ of these evaluations are noisy (the adversaries' results).
- $\deg h=\deg f\cdot \deg \ell=(K-1)\deg f$.

Which process enables interpolating a polynomial from noisy evaluations? Reed-Solomon (RS) decoding.

Fact: Reed-Solomon decoding succeeds if and only if (number of erasures) $+\ 2\times$ (number of errors) $\leq d-1$, where $d$ is the minimum distance.

Imagine $h$ as the "message" of a Reed-Solomon code with parameters $[P,(K-1)\deg f +1,P-(K-1)\deg f]_q$.
- Interpolating $h$ is possible if and only if $S+2A\leq P-(K-1)\deg f-1$.

Once the master interpolates $h$:
- The evaluations $h(\omega_i)=f(\ell(\omega_i))=f(X_i)$ provide the desired results.

#### Theorem of Lagrange coded computing

Lagrange coded computing enables computing $\{f(X_i)\}_{i=1}^K$ for any polynomial $f$ in the presence of at most $S$ stragglers and at most $A$ adversaries if
$$
(K-1)\deg f+S+2A+1\leq P
$$

> The recovery threshold $(K-1)\deg f+S+2A+1$ does not otherwise depend on $P$ (the number of worker nodes).

### Privacy for Lagrange coded computing

Currently, any size-$K$ group of colluding nodes can reconstruct the entire dataset.

Q: Can an individual node $i$ learn anything about the data?
A: Yes, since $\tilde{X}_i$ is a linear combination of $X_1,\ldots,X_K$ (partial knowledge: a linear combination of the private data).

Can we provide **perfect privacy** when at most $T$ nodes collude?
- That is, $I(X;\tilde{X}_\mathcal{T})=0$ for every $\mathcal{T}\subseteq [P]$ of size at most $T$, where
  - $X=(X_1,\ldots,X_K)$, and
  - $\tilde{X}_\mathcal{T}=(\tilde{X}_{i_1},\ldots,\tilde{X}_{i_{|\mathcal{T}|}})$.

Solution: a slight change to the encoding in LCC. This only applies over $\mathbb{F}=\mathbb{F}_q$ (there is no perfect privacy over $\mathbb{R}$ or $\mathbb{C}$, since no uniform distribution can be defined there).

The master chooses
- $T$ keys $Z_1,\ldots,Z_T$ uniformly at random ($|Z_j|=|X_i|$ for all $i,j$), and
- interpolation points $\omega_1,\ldots,\omega_{K+T}$.

Find the Lagrange polynomial $\ell(z)$ such that
- $\ell(\omega_i)=X_i$ for $i\in [K]$, and
- $\ell(\omega_{K+j})=Z_j$ for $j\in [T]$.

Lagrange interpolation:
$$
\ell(z)=\sum_{i=1}^{K} X_i\ell_i(z)+\sum_{j=1}^{T} Z_j\ell_{K+j}(z)
$$
$$
(\tilde{X}_1,\ldots,\tilde{X}_P)=(X_1,\ldots,X_K,Z_1,\ldots,Z_T)\cdot G
$$
where
$$
G=\begin{bmatrix} \ell_1(\alpha_1) & \ell_1(\alpha_2) & \cdots & \ell_1(\alpha_P) \\ \ell_2(\alpha_1) & \ell_2(\alpha_2) & \cdots & \ell_2(\alpha_P) \\ \vdots & \vdots & \ddots & \vdots \\ \ell_K(\alpha_1) & \ell_K(\alpha_2) & \cdots & \ell_K(\alpha_P) \\ \vdots & \vdots & \ddots & \vdots \\ \ell_{K+T}(\alpha_1) & \ell_{K+T}(\alpha_2) & \cdots & \ell_{K+T}(\alpha_P) \end{bmatrix}
$$

For the analysis, we denote $G=\begin{bmatrix}G^{top}\\G^{bot}\end{bmatrix}$, where $G^{top}\in \mathbb{F}^{K\times P}$ and $G^{bot}\in \mathbb{F}^{T\times P}$.
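Before the proof, a sketch of this modified encoding, reusing `q`, `lagrange_basis`, and `encode` from the encoding sketch above (same illustrative scalar setting; the choice of key interpolation points is an assumption for the example):

```python
# T-private LCC encoding sketch: append T uniform keys and interpolate
# through K + T points, so ell now has degree K + T - 1.
import secrets

def private_encode(X, T, omegas, alphas):
    """Encode (X_1..X_K, Z_1..Z_T) with the (K+T) x P Lagrange matrix G."""
    Z = [secrets.randbelow(q) for _ in range(T)]          # uniform keys
    key_points = [max(omegas) + j + 1 for j in range(T)]  # omega_{K+1..K+T}
    return encode(X + Z, omegas + key_points, alphas)

# Example: K = 3, T = 2 (evaluation points kept disjoint from all omegas).
shares = private_encode([17, 42, 99], 2, [1, 2, 3], [14, 15, 16, 17, 18, 19])
```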
The proof of privacy is almost the same as for the ramp scheme.

Proof: We have $(\tilde{X}_1,\ldots,\tilde{X}_P)=(X_1,\ldots,X_K)\cdot G^{top}+(Z_1,\ldots,Z_T)\cdot G^{bot}$. Without loss of generality, $\mathcal{T}=[T]$ is the colluding set. The nodes in $\mathcal{T}$ hold $(\tilde{X}_1,\ldots,\tilde{X}_T)=(X_1,\ldots,X_K)\cdot G^{top}_\mathcal{T}+(Z_1,\ldots,Z_T)\cdot G^{bot}_\mathcal{T}$.
- $G^{top}_\mathcal{T}$ and $G^{bot}_\mathcal{T}$ contain the first $T$ columns of $G^{top}$ and $G^{bot}$, respectively.

Note that $G^{bot}\in \mathbb{F}_q^{T\times P}$ is MDS, and hence $G^{bot}_\mathcal{T}$ is a $T\times T$ invertible matrix. Since $Z=(Z_1,\ldots,Z_T)$ is chosen uniformly at random, $Z\cdot G^{bot}_\mathcal{T}$ acts as a one-time pad, and the colluding nodes learn nothing about $(X_1,\ldots,X_K)$.

Decoding works as before; the only change is that now $\deg \ell=K+T-1$, so interpolating $h$ requires $(K+T-1)\deg f+1$ correct evaluations.
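To close the loop, here is a decoding sketch for the basic (non-private) scheme with stragglers only; full RS error correction for $A>0$ is not shown. It reuses the helpers above: with clean evaluations, plain Lagrange interpolation of $h$ from $\deg h+1$ points suffices.

```python
# Straggler-only LCC decoding sketch: interpolate h = f ∘ ell from
# deg(h) + 1 received evaluations, then read off f(X_k) = h(omega_k).

def interpolate_eval(points, values, z):
    """Evaluate at z the unique polynomial through (points[i], values[i])."""
    return sum(v * lagrange_basis(points, k, z)
               for k, v in enumerate(values)) % q

f = lambda x: (x * x + 3 * x + 1) % q       # deg f = 2

K, X = 3, [17, 42, 99]
omegas = [1, 2, 3]
alphas = [4, 5, 6, 7, 8, 9]                 # P = 6 workers
answers = [f(s) for s in encode(X, omegas, alphas)]  # workers apply f

need = (K - 1) * 2 + 1                      # deg h + 1 = 5 evaluations
# Suppose the last worker straggles: S = 1, A = 0, and indeed
# (K-1)*deg f + S + 2A + 1 = 4 + 1 + 0 + 1 = 6 <= P.
got = [interpolate_eval(alphas[:need], answers[:need], w) for w in omegas]
assert got == [f(x) for x in X]
```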
## Conclusion

- Theorem: Lagrange coded computing is resilient against $S$ stragglers, $A$ adversaries, and $T$ colluding nodes if
$$
P\geq (K+T-1)\deg f+S+2A+1
$$
- Privacy costs more than straggler and adversary resilience: each colluding node raises the threshold by $\deg f$, whereas each straggler adds $1$ and each adversary adds $2$.
- Caveat: requires finite field arithmetic!
- Some follow-up works analyzed information leakage over the reals.

## Side note for Blockchain

Blockchain: a decentralized system for trust management. A blockchain maintains a chain of blocks.
- A block contains a set of transactions.
- Transaction = value transfer between clients.
- The chain is replicated on each node.

Periodically, a new block is proposed and appended to each local chain.
- The block must not contain invalid transactions.
- Nodes must agree on the proposed block.

Existing systems:
- All nodes perform the same set of tasks.
- Every node must receive every block.
- Performance does not scale with the number of nodes.

### Improving performance of blockchain

The performance of blockchain is inherently limited by its design.
- All nodes perform the same set of tasks.
- Every node must receive every block.

Idea: combine blockchain with distributed computing.
- Node tasks should complement each other.

Sharding (a notion from databases):
- Nodes are partitioned into groups of equal size.
- Each group maintains a local chain.
- More nodes, more groups, more transactions can be processed.
- Better performance.

### Security Problem

Biggest problem in blockchains: adversarial (Byzantine) nodes.
- Malicious actors wish to include invalid transactions.

Solution in traditional blockchains: consensus mechanisms.
- Algorithms for decentralized agreement.
- Tolerate up to $1/3$ Byzantine nodes.

Problem: consensus conflicts with sharding.
- Traditional consensus mechanisms tolerate $\approx 1/3$ Byzantine nodes.
- If we partition $P$ nodes into $K$ groups, we can tolerate only $P/(3K)$ Byzantine nodes per group.
- Down from $P/3$ in non-sharded systems.

Goal: solve the consensus problem in sharded systems.
Tool: coded computing.

### Problem formulation

At epoch $t$ of a sharded blockchain system, we have
- $K$ local chains $Y_1^{t-1},\ldots, Y_K^{t-1}$,
- $K$ new blocks $X_1(t),\ldots,X_K(t)$, and
- a polynomial verification function $f(X_k(t),Y_k^t)$, which validates $X_k(t)$.
Example: a balance-check function $f(X_k(t),Y_k^t)=\sum_\tau Y_k(\tau)-X_k(t)$.

More commonly, a (polynomial) hash function, used to:
- verify the sender's public key, and
- verify the ownership of the transferred funds.
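A tiny sketch of the balance-check idea, in the same illustrative scalar setting as before (the flat representation of a chain and all names are hypothetical):

```python
# Balance-check sketch: f(X, Y) = sum_tau Y(tau) - X is degree 1 in its
# inputs, so composing it with LCC keeps the recovery threshold small.
q = 2**13 - 1

def balance_check(new_block, chain):
    """Residual funds sum(Y) - X over F_q; a degree-1 polynomial in (X, Y)."""
    return (sum(chain) - new_block) % q

chain = [30, 50, 20]   # Y_k: values recorded on the local chain
new_block = 60         # X_k(t): amount the new block tries to spend
residual = balance_check(new_block, chain)
```

Since $\deg f=1$ here, the LCC threshold from the theorem specializes to $P\geq K+T+S+2A$.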
Need: apply a polynomial function to $K$ datasets. Lagrange coded computing!

### Blockchain with Lagrange coded computing

At epoch $t$:
- A leader is elected (using a secure election mechanism).
- The leader receives the new blocks $X_1(t),\ldots,X_K(t)$.
- The leader disperses the encoded blocks $\tilde{X}_1(t),\ldots,\tilde{X}_P(t)$ to the nodes.
  - Needs secure information dispersal mechanisms.

Every node $i\in [P]$:
- locally stores a coded chain $\tilde{Y}_i^t$ (encoded using LCC),
- receives $\tilde{X}_i(t)$, and
- computes $f(\tilde{X}_i(t),\tilde{Y}_i^t)$ and sends it to the leader.

The leader decodes to get $\{f(X_i(t),Y_i^t)\}_{i=1}^K$ and disperses the results securely to the nodes. Node $i$ appends the coded block $\tilde{X}_i(t)$ to its coded chain $\tilde{Y}_i^t$ (zeroing invalid transactions).

Guarantees security if $P\geq (K+T-1)\deg f+S+2A+1$.
- Here $A$ counts adversarial nodes and $\deg f$ is the degree of the verification polynomial.

Sharding without sharding (see the one-epoch sketch below):
- Computations are done on (coded) partial chains/blocks.
  - Good performance!
- Since blocks/chains are coded, they are "dispersed" among many nodes.
  - The security problem of sharding is solved!
- Since the encoding is done (securely) through a leader, there is no need to send every block to all nodes.
  - Reduced communication! (the main bottleneck)

Novelties:
- First decentralized verification system whose communication is below (block size) $\times$ (number of nodes).
- Coded consensus: reach consensus on coded data.
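Finally, a one-epoch simulation sketch of the verification flow, reusing `q`, `encode`, and `interpolate_eval` from the LCC sketches above. It is a toy model under stated assumptions: scalar shards, the degree-1 balance check as $f$, and per-shard chain sums precomputed (valid because `encode` is linear, so encoding the sums equals summing the encoded chains).

```python
# One-epoch sketch: chains and blocks are LCC-encoded with the same points;
# each node evaluates the degree-1 balance check on its coded pair, and the
# leader interpolates h(z) = f(ell_X(z), ell_Y(z)) of degree (K-1)*deg f.

K = 3
omegas = [1, 2, 3]
alphas = [4, 5, 6, 7, 8, 9]                    # P = 6 nodes

blocks = [60, 10, 25]                          # X_k(t), one per shard
chain_sums = [100, 5, 40]                      # precomputed sum_tau Y_k(tau)

coded_blocks = encode(blocks, omegas, alphas)
coded_sums = encode(chain_sums, omegas, alphas)

# Each node i returns f(X~_i, Y~_i); since f is linear, this is h(alpha_i).
replies = [(s - b) % q for b, s in zip(coded_blocks, coded_sums)]

need = (K - 1) * 1 + 1                         # deg h + 1 = K evaluations
results = [interpolate_eval(alphas[:need], replies[:need], w) for w in omegas]
assert results == [(s - b) % q for b, s in zip(blocks, chain_sums)]
```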