From 2a094032b7a54795bef0acdd96daa499f3af410d Mon Sep 17 00:00:00 2001
From: Trance-0 <60459821+Trance-0@users.noreply.github.com>
Date: Tue, 25 Nov 2025 12:46:41 -0600
Subject: [PATCH] update?

---
 content/CSE510/CSE510_L26.md   | 185 +++++++++++++++++++
 content/CSE5313/CSE5313_L24.md |   2 +-
 content/CSE5313/CSE5313_L25.md | 326 +++++++++++++++++++++++++++++++++
 content/CSE5313/_meta.js       |   1 +
 wrangler.toml                  |  15 ++
 5 files changed, 528 insertions(+), 1 deletion(-)
 create mode 100644 content/CSE5313/CSE5313_L25.md

diff --git a/content/CSE510/CSE510_L26.md b/content/CSE510/CSE510_L26.md
index 77c4190..6e05673 100644
--- a/content/CSE510/CSE510_L26.md
+++ b/content/CSE510/CSE510_L26.md
@@ -36,3 +36,188 @@ Here: $\sum_{t=0}^{T-1}\log\frac{a_t|\tau_t,id}{p(a_t|\tau_t)}$ represents the a
 
 $\log \frac{p(o_{t+1}|\tau_t,a_t,id)}{p(o_{t+1}|\tau_t,a_t)}$ represents the observation diversity.
 
+### Summary
+
+- MARL plays a critical role for AI, but is still at an early stage
+- Value factorization enables scalable MARL
+  - Linear factorization is sometimes surprisingly effective
+  - Non-linear factorization shows promise in offline settings
+- Parameter sharing plays an important role for deep MARL
+- Diversity and dynamic parameter sharing can be critical for complex cooperative tasks
+
+## Challenges and open problems in DRL
+
+### Overview of Reinforcement Learning Algorithms
+
+Recall from Lecture 2.
+
+From more sample efficient to less sample efficient:
+
+- Model-based
+- Off-policy/Q-learning
+- Actor-critic
+- On-policy/Policy gradient
+- Evolutionary/Gradient-free
+
+#### Model-Based
+
+- Learn the model of the world, then plan using the model
+- Update the model often
+- Re-plan often
+
+#### Value-Based
+
+- Learn the state or state-action value
+- Act by choosing the best action in each state
+- Exploration is a necessary add-on
+
+#### Policy-based
+
+- Learn a stochastic policy function that maps state to action
+- Act by sampling the policy
+- Exploration is baked in
+
+### Where are we?
+
+Deep RL has achieved impressive results in games, robotics, control, and decision systems.
+
+But it is still far from a general, reliable, and efficient learning paradigm.
+
+Today: what limits deep RL, what's being worked on, and what's still open.
+
+### Outline of challenges
+
+- Offline RL
+- Multi-agent complexity
+- Sample efficiency & data reuse
+- Stability & reproducibility
+- Generalization & distribution shift
+- Scalable model-based RL
+- Safety
+- Theory gaps & evaluation
+
+### Sample inefficiency
+
+Model-free deep RL often needs millions or billions of steps:
+
+- Humans with 15 minutes of learning tend to outperform DDQN trained for 115 hours
+- OpenAI Five for Dota 2: 180 years of playing time per day
+
+Real-world systems can't afford this.
+
+Root causes: high-variance gradients, weak priors, poor credit assignment.
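+
+To make the first root cause concrete, here is a small toy experiment (not from the lecture; the bandit, the policy, and all numbers are illustrative) showing how noisy single-sample REINFORCE gradient estimates are, and how slowly that noise shrinks with batch size.
+
+```python
+# Toy illustration of the "high-variance gradients" root cause: estimate the
+# REINFORCE gradient for a 2-armed bandit with a softmax policy and measure
+# the spread of the estimate across batch sizes.
+import numpy as np
+
+rng = np.random.default_rng(0)
+theta = np.zeros(2)                      # softmax logits over 2 arms
+p_win = np.array([0.6, 0.4])             # Bernoulli reward probability per arm
+
+def policy(theta):
+    e = np.exp(theta - theta.max())
+    return e / e.sum()
+
+def grad_estimate(theta, batch):
+    """Average of r * grad log pi(a) over `batch` sampled arm pulls."""
+    pi = policy(theta)
+    g = np.zeros(2)
+    for _ in range(batch):
+        a = rng.choice(2, p=pi)
+        r = float(rng.random() < p_win[a])
+        glogpi = -pi                     # gradient of log-softmax w.r.t. logits
+        glogpi[a] += 1.0                 # equals e_a - pi for the taken arm
+        g += r * glogpi
+    return g / batch
+
+for batch in (1, 10, 100):
+    estimates = np.array([grad_estimate(theta, batch) for _ in range(500)])
+    print(batch, estimates.std(axis=0))  # spread shrinks only as ~1/sqrt(batch)
+```
+
+In this toy, the standard deviation of the single-sample estimate is several times larger than the true gradient, which is one reason model-free methods need so much data.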
+
+#### Open directions for sample efficiency
+
+- Better data reuse: off-policy learning & replay improvements
+- Self-supervised representation learning for control (learning from interacting with the environment)
+- Hybrid model-based/model-free approaches
+- Transfer & pre-training on large datasets
+  - Knowledge-driven RL: leveraging pre-trained models
+
+#### Knowledge-Driven RL: Motivation
+
+Current LLMs are not good at decision making.
+
+Pros: rich knowledge
+
+Cons: auto-regressive decoding lacks long-term memory
+
+Reinforcement learning in decision making:
+
+Pros: can go beyond human intelligence
+
+Cons: sample inefficiency
+
+### Instability & the deadly triad
+
+Function approximation + bootstrapping + off-policy learning can diverge.
+
+Even algorithms considered stable (e.g., PPO) can be unstable in practice.
+
+#### Open directions for stability
+
+- Better optimization landscapes + regularization
+- Calibration/monitoring tools for RL training
+- Architectures with built-in inductive biases (e.g., equivariance)
+
+### Reproducibility & Evaluation
+
+Results often depend on random seeds, codebase, and compute budget.
+
+Benchmarks can be overfit; comparisons are often apples-to-oranges.
+
+Offline evaluation is especially tricky.
+
+#### Toward Better Evaluation
+
+- Robustness checks and ablations
+- Out-of-distribution test suites
+- Realistic benchmarks beyond games (e.g., science and healthcare)
+
+### Generalization & Distribution Shift
+
+Policies overfit to training environments and fail under small changes.
+
+Sim-to-real gap, sensor noise, morphology changes, domain drift.
+
+Requires learning invariances and robust decision rules.
+
+#### Open directions for generalization
+
+- Domain randomization + system identification
+- Robust / risk-sensitive RL
+- Representation learning for invariance
+- Meta-RL and fast adaptation
+
+### Model-based RL: Promise & Pitfalls
+
+- Learned models enable planning and sample efficiency
+- But distribution mismatch and model exploitation can break policies
+- Long-horizon imagination amplifies errors
+- Model learning itself is challenging
+
+### Safety, alignment, and constraints
+
+Reward mis-specification -> unsafe or unintended behavior.
+
+Need to respect constraints: energy, collisions, ethics, regulation.
+
+Exploration itself may be unsafe.
+
+#### Open directions for safe RL
+
+- Constrained RL (Lagrangians, CBFs, shielding)
+
+### Theory Gaps & Evaluation
+
+Deep RL lacks strong general guarantees.
+
+We don't fully understand when and why it works.
+
+Bridging theory and practice remains an open challenge.
+
+#### Promising theory directions
+
+Optimization theory of RL objectives
+
+Generalization and representation learning bounds
+
+Finite-sample analysis
+
+### Connection to foundation models
+
+- Pre-training on large-scale experience
+- World models as sequence predictors
+- RLHF/preference optimization for alignment
+- Open problems: grounding
+
+### What to expect in the next 3-5 years
+
+Unified model-based, offline, and safe RL stacks
+
+Large pretrained decision models
+
+Deployment in high-stakes domains
\ No newline at end of file
diff --git a/content/CSE5313/CSE5313_L24.md b/content/CSE5313/CSE5313_L24.md
index 9b25a4a..a1241c2 100644
--- a/content/CSE5313/CSE5313_L24.md
+++ b/content/CSE5313/CSE5313_L24.md
@@ -383,7 +383,7 @@ $$
 \ell_1(\alpha_1) & \ell_1(\alpha_2) & \cdots & \ell_1(\alpha_P) \\
 \ell_2(\alpha_1) & \ell_2(\alpha_2) & \cdots & \ell_2(\alpha_P) \\
 \vdots & \vdots & \ddots & \vdots \\
-\ell_P(\alpha_1) & \ell_P(\alpha_2) & \cdots & \ell_P(\alpha_P)
+\ell_K(\alpha_1) & \ell_K(\alpha_2) & \cdots & \ell_K(\alpha_P)
 \end{bmatrix}
 $$
 
diff --git a/content/CSE5313/CSE5313_L25.md b/content/CSE5313/CSE5313_L25.md
new file mode 100644
index 0000000..d728aa5
--- /dev/null
+++ b/content/CSE5313/CSE5313_L25.md
@@ -0,0 +1,326 @@
+# CSE5313 Coding and information theory for data science (Lecture 25)
+
+## Polynomial Evaluation
+
+Problem formulation:
+
+- We have $K$ datasets $X_1,X_2,\ldots,X_K$.
+- Want to compute some polynomial function $f$ of degree $d$ on each dataset.
+  - Want $f(X_1),f(X_2),\ldots,f(X_K)$.
+- Examples:
+  - $X_1,X_2,\ldots,X_K$ are points in $\mathbb{F}^{M\times M}$, and $f(X)=X^8+3X^2+1$.
+  - $X_k=(X_k^{(1)},X_k^{(2)})$, both in $\mathbb{F}^{M\times M}$, and $f(X_k)=X_k^{(1)}X_k^{(2)}$.
+  - Gradient computation.
+
+$P$ worker nodes:
+
+- Some are stragglers, i.e., not responsive.
+- Some are adversaries, i.e., return erroneous results.
+- Privacy: we do not want to expose the datasets to worker nodes.
+
+### Lagrange Coded Computing
+
+Let $\ell(z)$ be a polynomial whose evaluations at $\omega_1,\ldots,\omega_{K}$ are $X_1,\ldots,X_K$.
+
+- That is, $\ell(\omega_i)=X_i$ for every $\omega_i\in \mathbb{F}, i\in [K]$.
+
+Some example constructions:
+
+Given $X_1,\ldots,X_K$ with corresponding $\omega_1,\ldots,\omega_K$:
+
+- $\ell(z)=\sum_{i=1}^K X_i\ell_i(z)$, where $\ell_i(z)=\prod_{j\in[K],j\neq i} \frac{z-\omega_j}{\omega_i-\omega_j}$, so that $\ell_i(\omega_j)=\begin{cases} 1 & \text{if } j=i \\ 0 & \text{if } j\neq i \end{cases}$.
+
+Then every $f(X_i)=f(\ell(\omega_i))$ is an evaluation of the polynomial $f\circ \ell(z)$ at $\omega_i$.
+
+If the master obtains the composition $h=f\circ \ell$, it can obtain every $f(X_i)=h(\omega_i)$.
+
+Goal: the master wishes to obtain the polynomial $h(z)=f(\ell(z))$.
+
+Intuition:
+
+- Encoding is performed by evaluating $\ell(z)$ at $\alpha_1,\ldots,\alpha_P\in \mathbb{F}$, with $P>K$ for redundancy.
+- Nodes apply $f$ to an evaluation of $\ell$ and obtain an evaluation of $h$.
+- The master receives some potentially noisy evaluations, and finds $h$.
+- The master evaluates $h$ at $\omega_1,\ldots,\omega_K$ to obtain $f(X_1),\ldots,f(X_K)$.
+
+### Encoding for Lagrange coded computing
+
+Need a polynomial $\ell(z)$ such that:
+
+- $X_k=\ell(\omega_k)$ for every $k\in [K]$.
+
+Having obtained such an $\ell$, we let $\tilde{X}_i=\ell(\alpha_i)$ for every $i\in [P]$.
+
+$\operatorname{span}\{\tilde{X}_1,\tilde{X}_2,\ldots,\tilde{X}_P\}=\operatorname{span}\{\ell_1(x),\ell_2(x),\ldots,\ell_P(x)\}$.
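+
+As a concrete preview of the interpolation construction developed next, take $K=2$ with illustrative interpolation points $\omega_1=1$, $\omega_2=2$ (any two distinct field elements work):
+
+$$
+\ell_1(z)=\frac{z-\omega_2}{\omega_1-\omega_2}=2-z,\qquad \ell_2(z)=\frac{z-\omega_1}{\omega_2-\omega_1}=z-1,
+$$
+
+so $\ell(z)=X_1(2-z)+X_2(z-1)$ satisfies $\ell(\omega_1)=X_1$ and $\ell(\omega_2)=X_2$, and every encoded share $\tilde{X}_i=\ell(\alpha_i)=(2-\alpha_i)X_1+(\alpha_i-1)X_2$ is a linear combination of the data.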
+
+Want $X_k=\ell(\omega_k)$ for every $k\in [K]$.
+
+Tool: Lagrange interpolation.
+
+- $\ell_k(z)=\prod_{j\neq k} \frac{z-\omega_j}{\omega_k-\omega_j}$.
+- $\ell_k(\omega_k)=1$ and $\ell_k(\omega_j)=0$ for every $j\neq k$.
+- $\deg \ell_k(z)=K-1$.
+
+Let $\ell(z)=\sum_{k=1}^K X_k\ell_k(z)$.
+
+- $\deg \ell\leq K-1$.
+- $\ell(\omega_k)=X_k$ for every $k\in [K]$.
+
+Let $\tilde{X}_i=\ell(\alpha_i)=\sum_{k=1}^K X_k\ell_k(\alpha_i)$.
+
+Every $\tilde{X}_i$ is a **linear combination** of $X_1,\ldots,X_K$.
+
+$$
+(\tilde{X}_1,\tilde{X}_2,\ldots,\tilde{X}_P)=(X_1,\ldots,X_K)\cdot G=(X_1,\ldots,X_K)\begin{bmatrix}
+\ell_1(\alpha_1) & \ell_1(\alpha_2) & \cdots & \ell_1(\alpha_P) \\
+\ell_2(\alpha_1) & \ell_2(\alpha_2) & \cdots & \ell_2(\alpha_P) \\
+\vdots & \vdots & \ddots & \vdots \\
+\ell_K(\alpha_1) & \ell_K(\alpha_2) & \cdots & \ell_K(\alpha_P)
+\end{bmatrix}
+$$
+
+This $G$ is called a **Lagrange matrix** with respect to
+
+- $\omega_1,\ldots,\omega_K$ (interpolation points, rows), and
+- $\alpha_1,\ldots,\alpha_P$ (evaluation points, columns).
+
+> Basically, a modification of the Reed-Solomon code.
+
+### Decoding for Lagrange coded computing
+
+Say the system has $S$ stragglers (erasures) and $A$ adversaries (errors).
+
+The master receives $P-S$ computation results $f(\tilde{X}_{i_1}),\ldots,f(\tilde{X}_{i_{P-S}})$.
+
+- By design, these are evaluations of $h$: $h(\alpha_{i_1})=f(\ell(\alpha_{i_1})),\ldots,h(\alpha_{i_{P-S}})=f(\ell(\alpha_{i_{P-S}}))$.
+- Up to $A$ of these evaluations are erroneous (noisy).
+- $\deg h=\deg f\cdot \deg \ell=(K-1)\deg f$.
+
+Which procedure enables us to interpolate a polynomial from noisy evaluations?
+
+Reed-Solomon (RS) decoding.
+
+Fact: Reed-Solomon decoding succeeds if and only if the number of erasures + 2 $\times$ the number of errors is at most $d-1$, where $d$ is the minimum distance.
+
+Imagine $h$ as the "message" of a Reed-Solomon code with parameters $[P,(K-1)\deg f +1,P-(K-1)\deg f]_q$.
+
+- Interpolating $h$ is possible if and only if $S+2A\leq P-(K-1)\deg f-1$.
+
+Once the master interpolates $h$:
+
+- The evaluations $h(\omega_i)=f(\ell(\omega_i))=f(X_i)$ provide the desired results.
+
+#### Theorem of Lagrange coded computing
+
+Lagrange coded computing enables computing $\{f(X_i)\}_{i=1}^K$ for any polynomial $f$ in the presence of at most $S$ stragglers and at most $A$ adversaries if
+
+$$
+(K-1)\deg f+S+2A+1\leq P
+$$
+
+> Note: the number of evaluations needed to interpolate $h$ is determined by $K$ and $\deg f$, not by the number of worker nodes $P$.
+
+### Privacy for Lagrange coded computing
+
+Currently, any size-$K$ group of colluding nodes can recover the entire dataset.
+
+Q: Can an individual node $i$ learn anything about the data from $\tilde{X}_i$?
+
+A: Yes, since $\tilde{X}_i$ is a linear combination of $X_1,\ldots,X_K$ (partial knowledge: a linear combination of private data).
+
+Can we provide **perfect privacy** given that at most $T$ nodes collude?
+
+- That is, $I(X;\tilde{X}_\mathcal{T})=0$ for every $\mathcal{T}\subseteq [P]$ of size at most $T$, where
+  - $X=(X_1,\ldots,X_K)$, and
+  - $\tilde{X}_\mathcal{T}=(\tilde{X}_{i_1},\ldots,\tilde{X}_{i_{|\mathcal{T}|}})$.
+
+Solution: a slight change of the encoding in LCC.
+
+This only applies to $\mathbb{F}=\mathbb{F}_q$ (no perfect privacy over $\mathbb{R},\mathbb{C}$, where no uniform distribution can be defined).
+
+The master chooses
+
+- $T$ keys $Z_1,\ldots,Z_T$ uniformly at random ($|Z_j|=|X_i|$ for all $i,j$), and
+- interpolation points $\omega_1,\ldots,\omega_{K+T}$.
+
+Find the Lagrange polynomial $\ell(z)$ such that
+
+- $\ell(\omega_i)=X_i$ for $i\in [K]$, and
+- $\ell(\omega_{K+j})=Z_j$ for $j\in [T]$.
+
+Lagrange interpolation:
+
+$$
+\ell(z)=\sum_{i=1}^{K} X_i\ell_i(z)+\sum_{j=1}^{T} Z_j\ell_{K+j}(z)
+$$
+
+$$
+(\tilde{X}_1,\ldots,\tilde{X}_P)=(X_1,\ldots,X_K,Z_1,\ldots,Z_T)\cdot G
+$$
+
+where
+
+$$
+G=\begin{bmatrix}
+\ell_1(\alpha_1) & \ell_1(\alpha_2) & \cdots & \ell_1(\alpha_P) \\
+\ell_2(\alpha_1) & \ell_2(\alpha_2) & \cdots & \ell_2(\alpha_P) \\
+\vdots & \vdots & \ddots & \vdots \\
+\ell_K(\alpha_1) & \ell_K(\alpha_2) & \cdots & \ell_K(\alpha_P) \\
+\vdots & \vdots & \ddots & \vdots \\
+\ell_{K+T}(\alpha_1) & \ell_{K+T}(\alpha_2) & \cdots & \ell_{K+T}(\alpha_P)
+\end{bmatrix}
+$$
+
+For analysis, we denote $G=\begin{bmatrix}G^{top}\\G^{bot}\end{bmatrix}$, where $G^{top}\in \mathbb{F}^{K\times P}$ and $G^{bot}\in \mathbb{F}^{T\times P}$.
+
+The privacy proof is almost the same as for the ramp scheme.
+
+Proof:
+
+We have $(\tilde{X}_1,\ldots,\tilde{X}_P)=(X_1,\ldots,X_K)\cdot G^{top}+(Z_1,\ldots,Z_T)\cdot G^{bot}$.
+
+Without loss of generality, $\mathcal{T}=[T]$ is the colluding set.
+
+The nodes in $\mathcal{T}$ hold $(\tilde{X}_1,\ldots,\tilde{X}_T)=(X_1,\ldots,X_K)\cdot G^{top}_\mathcal{T}+(Z_1,\ldots,Z_T)\cdot G^{bot}_\mathcal{T}$.
+
+- $G^{top}_\mathcal{T}$, $G^{bot}_\mathcal{T}$ contain the first $T$ columns of $G^{top}$, $G^{bot}$, respectively.
+
+Note that $G^{bot}\in \mathbb{F}^{T\times P}_q$ generates an MDS code, and hence $G^{bot}_\mathcal{T}$ is a $T\times T$ invertible matrix.
+
+Since $Z=(Z_1,\ldots,Z_T)$ is chosen uniformly at random, $Z\cdot G^{bot}_\mathcal{T}$ acts as a one-time pad, and the colluding nodes learn nothing about $(X_1,\ldots,X_K)$.
+
+The decoding proof is the same as before, now with $\deg \ell\leq K+T-1$ in place of $K-1$.
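+
+To make the encode-compute-interpolate pipeline concrete, here is a minimal numerical sketch over a small prime field, for scalar data and without adversaries or colluding nodes; the field size, datasets, evaluation points, and the helper `lagrange_eval` are all illustrative choices, not part of the lecture.
+
+```python
+# Minimal Lagrange coded computing sketch over F_97 with scalar "datasets".
+# K = 2 datasets, f(x) = x^2 + 1 (deg f = 2), P = 5 workers, so
+# P >= (K-1)*deg f + S + 2A + 1 holds with S = 2 stragglers and A = 0.
+p = 97
+K, P = 2, 5
+
+def f(x):                     # the polynomial to evaluate on each dataset
+    return (x * x + 1) % p
+
+def lagrange_eval(points, z):
+    """Evaluate at z the unique polynomial through `points` = [(x0, y0), ...], mod p."""
+    total = 0
+    for i, (xi, yi) in enumerate(points):
+        num, den = 1, 1
+        for j, (xj, _) in enumerate(points):
+            if j != i:
+                num = num * (z - xj) % p
+                den = den * (xi - xj) % p
+        total = (total + yi * num * pow(den, p - 2, p)) % p   # den^(-1) by Fermat
+    return total
+
+X = [10, 23]                  # the K datasets (scalars here, matrices in general)
+omega = [1, 2]                # interpolation points omega_1, ..., omega_K
+alpha = [3, 4, 5, 6, 7]       # evaluation points alpha_1, ..., alpha_P
+
+# Encoding: worker i receives X_tilde_i = ell(alpha_i), where ell(omega_k) = X_k.
+X_tilde = [lagrange_eval(list(zip(omega, X)), a) for a in alpha]
+
+# Each worker applies f; the results are evaluations of h = f(ell(z)),
+# a polynomial of degree (K-1)*deg f = 2.
+results = {a: f(xt) for a, xt in zip(alpha, X_tilde)}
+
+# Two workers straggle: any (K-1)*deg f + 1 = 3 of the results suffice
+# to interpolate h (no adversaries here, so no Reed-Solomon error correction).
+received = list(results.items())[:3]
+recovered = [lagrange_eval(received, w) for w in omega]
+assert recovered == [f(x) for x in X]    # the master obtains f(X_1), f(X_2)
+print(recovered)
+```
+
+With adversaries, the final interpolation step would be replaced by Reed-Solomon decoding of the received word; with $T$-collusion privacy, $T$ uniformly random keys would be appended to the data before interpolating $\ell$, exactly as in the proof above.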
+
+## Conclusion
+
+- Theorem: Lagrange coded computing is resilient against $S$ stragglers, $A$ adversaries, and $T$ colluding nodes if
+  $$
+  P\geq (K+T-1)\deg f+S+2A+1
+  $$
+  - Privacy is the costlier requirement: $T$ enters multiplied by $\deg f$, whereas the straggler and adversary terms $S$ and $2A$ enter linearly.
+- Caveat: requires finite field arithmetic!
+- Some follow-up works analyzed the information leakage over the reals.
+
+## Side note on blockchain
+
+Blockchain: a decentralized system for trust management.
+
+A blockchain maintains a chain of blocks.
+
+- A block contains a set of transactions.
+- Transaction = value transfer between clients.
+- The chain is replicated on each node.
+
+Periodically, a new block is proposed and appended to each local chain.
+
+- The block must not contain invalid transactions.
+- Nodes must agree on the proposed block.
+
+Existing systems:
+
+- All nodes perform the same set of tasks.
+- Every node must receive every block.
+
+Performance does not scale with the number of nodes.
+
+### Improving the performance of blockchain
+
+The performance of blockchain is inherently limited by its design.
+
+- All nodes perform the same set of tasks.
+- Every node must receive every block.
+
+Idea: combine blockchain with distributed computing.
+
+- Node tasks should complement each other.
+
+Sharding (a notion from databases):
+
+- Nodes are partitioned into groups of equal size.
+- Each group maintains a local chain.
+- More nodes, more groups, more transactions can be processed.
+- Better performance.
+
+### Security Problem
+
+Biggest problem in blockchains: adversarial (Byzantine) nodes.
+
+- Malicious actors wish to include invalid transactions.
+
+Solution in traditional blockchains: consensus mechanisms.
+
+- Algorithms for decentralized agreement.
+- Tolerate up to $1/3$ Byzantine nodes.
+
+Problem: consensus conflicts with sharding.
+
+- Traditional consensus mechanisms tolerate $\approx 1/3$ Byzantine nodes.
+- If we partition $P$ nodes into $K$ groups, we can tolerate only $P/(3K)$ node failures per group.
+  - Down from $P/3$ in non-sharded systems.
+
+Goal: solve the consensus problem in sharded systems.
+
+Tool: coded computing.
+
+### Problem formulation
+
+At epoch $t$ of a sharded blockchain system, we have
+
+- $K$ local chains $Y_1^{t-1},\ldots, Y_K^{t-1}$,
+- $K$ new blocks $X_1(t),\ldots,X_K(t)$, and
+- a polynomial verification function $f(X_k(t),Y_k^t)$, which validates $X_k(t)$.
+
+Example:
+
+Balance check function $f(X_k(t),Y_k^t)=\sum_\tau Y_k(\tau)-X_k(t)$.
+
+More commonly, a (polynomial) hash function is used to:
+
+- verify the sender's public key, and
+- verify the ownership of the transferred funds.
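+
+As a toy sanity check of the balance-check example above (illustrative only; the account encoding is hypothetical, and real systems use the hash-based checks just mentioned), note that this verification function is degree $1$ in its inputs, which keeps the $(K+T-1)\deg f$ term of the worker bound small:
+
+```python
+# Toy single-account balance check mirroring f(X_k(t), Y_k^t) = sum_tau Y_k(tau) - X_k(t):
+# past credits recorded on the local chain minus the amount the new block spends.
+def balance_check(chain_credits, new_debit):
+    # Degree 1 in its inputs, so LCC needs only P >= (K + T - 1) + S + 2A + 1
+    # workers when this is the verification function.
+    return sum(chain_credits) - new_debit
+
+assert balance_check([5, 3, 4], 10) >= 0   # the spend is covered -> valid
+assert balance_check([5, 3, 4], 13) < 0    # overspend -> flagged as invalid
+```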
+
+Need: apply a polynomial function to $K$ datasets.
+
+Lagrange coded computing!
+
+### Blockchain with Lagrange coded computing
+
+At epoch $t$:
+
+- A leader is elected (using a secure election mechanism).
+- The leader receives the new blocks $X_1(t),\ldots,X_K(t)$.
+- The leader disperses the encoded blocks $\tilde{X}_1(t),\ldots,\tilde{X}_P(t)$ to the nodes.
+  - Needs a secure information dispersal mechanism.
+
+Every node $i\in [P]$:
+
+- Locally stores a coded chain $\tilde{Y}_i^t$ (encoded using LCC).
+- Receives $\tilde{X}_i(t)$.
+- Computes $f(\tilde{X}_i(t),\tilde{Y}_i^t)$ and sends it to the leader.
+
+The leader decodes to get $\{f(X_i(t),Y_i^t)\}_{i=1}^K$ and disperses the results securely to the nodes.
+
+Node $i$ appends the coded block $\tilde{X}_i(t)$ to the coded chain $\tilde{Y}_i^t$ (with invalid transactions zeroed out).
+
+This guarantees security if $P\geq (K+T-1)\deg f+S+2A+1$.
+
+- $A$ adversaries and a degree-$\deg f$ verification polynomial.
+
+Sharding without sharding:
+
+- Computations are done on (coded) partial chains/blocks.
+  - Good performance!
+- Since blocks/chains are coded, they are "dispersed" among many nodes.
+  - The security problem in sharding is solved!
+- Since the encoding is done (securely) through a leader, there is no need to send every block to all nodes.
+  - Reduced communication! (The main bottleneck.)
+
+Novelties:
+
+- The first decentralized verification system whose communication is less than (block size) $\times$ (number of nodes).
+- Coded consensus: reaching consensus on coded data.
diff --git a/content/CSE5313/_meta.js b/content/CSE5313/_meta.js
index 9629909..909a951 100644
--- a/content/CSE5313/_meta.js
+++ b/content/CSE5313/_meta.js
@@ -28,4 +28,5 @@ export default {
   CSE5313_L22: "CSE5313 Coding and information theory for data science (Lecture 22)",
   CSE5313_L23: "CSE5313 Coding and information theory for data science (Lecture 23)",
   CSE5313_L24: "CSE5313 Coding and information theory for data science (Lecture 24)",
+  CSE5313_L25: "CSE5313 Coding and information theory for data science (Lecture 25)",
 }
\ No newline at end of file
diff --git a/wrangler.toml b/wrangler.toml
index 38e73e2..2a6bb2f 100644
--- a/wrangler.toml
+++ b/wrangler.toml
@@ -3,3 +3,18 @@ pages_build_output_dir= ".vercel/output/static"
 name = "notenextra"
 compatibility_date = "2025-02-13"
 compatibility_flags = ["nodejs_compat"]
+
+[observability]
+enabled = false
+head_sampling_rate = 1
+
+[observability.logs]
+enabled = true
+head_sampling_rate = 1
+persist = true
+invocation_logs = true
+
+[observability.traces]
+enabled = false
+persist = true
+head_sampling_rate = 1