update?

2025-11-25 12:46:41 -06:00
parent 34182ff139
commit 2a094032b7
5 changed files with 528 additions and 1 deletions
--- a/content/CSE510/CSE510_L26.md
+++ b/content/CSE510/CSE510_L26.md
@@ -36,3 +36,188 @@ Here: $\sum_{t=0}^{T-1}\log\frac{a_t|\tau_t,id}{p(a_t|\tau_t)}$ represents the a

 $\log \frac{p(o_{t+1}|\tau_t,a_t,id)}{p(o_{t+1}|\tau_t,a_t)}$ represents the observation diversity.

+### Summary
+
+- MARL plays a critical role for AI, but is at the early stage
+- Value factorization enables scalable MARL
+  - Linear factorization sometimes is surprising effective
+  - Non-linear factorization shows promise in offline settings
+- Parameter sharing plays an important role for deep MARL
+- Diversity and dynamic parameter sharing can be critical for complex cooperative tasks
+
+## Challenges and open problems in DRL
+
+### Overview for Reinforcement Learning Algorithms
+
+Recall from lecture 2
+
+Better sample efficiency to less sample efficiency:
+
+- Model-based
+- Off-policy/Q-learning
+- Actor-critic
+- On-policy/Policy gradient
+- Evolutionary/Gradient-free
+
+#### Model-Based
+
+- Learn the model of the world, then pan using the model
+- Update model often
+- Re-plan often
+
+#### Value-Based
+
+- Learn the state or state-action value
+- Act by choosing best action in state
+- Exploration is a necessary add-on
+
+#### Policy-based
+
+- Learn the stochastic policy function that maps state to action
+- Act by sampling policy
+- Exploration is baked in
+
+### Where we are?
+
+Deep RL has achieved impressive results in games, robotics, control, and decision systems.
+
+But it is still far from a general, reliable, and efficient learning paradigm.
+
+Today: what limits Deep RL, what's being worked on, and what's still open.
+
+### Outline of challenges
+
+- Offline RL
+- Multi-Agent complexity
+- Sample efficiency & data reuse
+- Stability & reproducibility
+- Generalization & distribution shift
+- Scalable model-based RL
+- Safety
+- Theory gaps & evaluation
+
+### Sample inefficiency
+
+Model-free Deep RL often need million/billion of steps
+
+- Humans with 15-minute learning tend to outperform DDQN with 115 hours
+- OpenAI Five for Dota 2: 180 years playing time per day
+
+Real-world systems can't afford this
+
+Root causes: high-variance gradients, weak priors, poor credit assignment.
+
+Open direction for sample efficiency
+
+- Better data reuse: off-policy learning & replay improvements
+- Self-supervised representation learning for control (learning from interacting with the environment)
+- Hybrid model-based/model-free approaches
+- Transfer & pre-training on large datasets
+  - Knowledge driving-RL: leveraging pre-trained models
+
+#### Knowledge-Driven RL: Motivation
+
+Current LLMs are not good at decision making
+
+Pros: rich knowledge
+
+Cons: Auto-regressive decoding lack of long turn memory
+
+Reinforcement learning in decision making
+
+Pros: Go beyond human intelligence
+
+Cons: sample inefficiency
+
+### Instability & the Deadly triad
+
+Function approximation + boostraping + off-policy learning can diverge
+
+Even stable algorithms (PPO) can be unstable
+
+#### Open direction for Stability
+
+Better optimization landscapes + regularization
+
+Calibration/monitoring tools for RL training
+
+Architectures with built-in inductive biased (e.g., equivariance)
+
+### Reproducibility & Evaluation
+
+Results often depend on random seeds, codebase, and compute budget
+
+Benchmark can be overfit; comparisons apples-to-oranges
+
+Offline evaluation is especially tricky
+
+#### Toward Better Evaluation
+
+- Robustness checks and ablations
+- Out-of-distribution test suites
+- Realistic benchmarks beyond games (e.g., science and healthcare)
+
+### Generalization & Distribution Shift
+
+Policy overfit to training environments and fail under small challenges
+
+Sim-to-real gap, sensor noise, morphology changes, domain drift.
+
+Requires learning invariance and robust decision rules.
+
+#### Open direction for Generalization
+
+- Domain randomization + system identification
+- Robust/ risk-sensitive RL
+- Representation learning for invariance
+- Meta-RL and fast adaptation
+
+### Model-based RL: Promise & Pitfalls
+
+- Learned models enable planning and sample efficiency
+- But distribution mismatch and model exploitation can break policies
+- Long-horizon imagination amplifies errors
+- Model-learning is challenging
+
+### Safety, alignment, and constraints
+
+Reward mis-specification -> unsafe or unintended behavior
+
+Need to respect constraints: energy, collisions, ethics, regulation
+
+Exploration itself may be unsafe
+
+#### Open direction for Safety RL
+
+- Constraint RL (Lagrangians, CBFs, she)
+
+### Theory Gaps & Evaluation
+
+Deep RL lacks strong general guarantees.
+
+We don't fully understand when/why it works
+
+Bridging theory and 
+
+#### Promising theory directoins
+
+Optimization thoery of RL objectives
+
+Generalization and representation learning bounds
+
+Finite-sample analysis s
+
+### Connection to foundation models
+
+- Pre-training on large scale experience
+- World models as sequence predictors
+- RLHF/preference optimization for alignment
+- Open problems: groundign 
+
+### What to expect in the next 3-5 years
+
+Unified model-based offline + safe RL stacks
+
+Large pretrianed decision models
+
+Deployment in high-stake domains
--- a/content/CSE5313/CSE5313_L24.md
+++ b/content/CSE5313/CSE5313_L24.md
@@ -383,7 +383,7 @@ $$
 \ell_1(\alpha_1) & \ell_1(\alpha_2) & \cdots & \ell_1(\alpha_P) \\
 \ell_2(\alpha_1) & \ell_2(\alpha_2) & \cdots & \ell_2(\alpha_P) \\
 \vdots & \vdots & \ddots & \vdots \\
-\ell_P(\alpha_1) & \ell_P(\alpha_2) & \cdots & \ell_P(\alpha_P)
+\ell_K(\alpha_1) & \ell_K(\alpha_2) & \cdots & \ell_K(\alpha_P)
 \end{bmatrix}
 $$

--- a/content/CSE5313/CSE5313_L25.md
+++ b/content/CSE5313/CSE5313_L25.md
@@ -0,0 +1,326 @@
+# CSE5313 Coding and information theory for data science (Lecture 25)
+
+## Polynomial Evaluation
+
+Problem formulation:
+
+- We have $K$ datasets $X_1,X_2,\ldots,X_K$.
+- Want to compute some polynomial function $f$ of degree $d$ on each dataset.
+  - Want $f(X_1),f(X_2),\ldots,f(X_K)$.
+- Examples:
+  - $X_1,X_2,\ldots,X_K$ are points in $\mathbb{F}^{M\times M}$, and $f(X)=X^8+3X^2+1$.
+  - $X_k=(X_k^{(1)},X_k^{(2)})$, both in $\mathbb{F}^{M\times M}$, and $f(X)=X_k^{(1)}X_k^{(2)}$.
+  - Gradient computation.
+
+$P$ worker nodes:
+
+- Some are stragglers, i.e., not responsive.
+- Some are adversaries, i.e., return erroneous results.
+- Privacy: We do not want to expose datasets to worker nodes.
+
+### Lagrange Coded Computing
+
+Let $\ell(z)$ be a polynomial whose evaluations at $\omega_1,\ldots,\omega_{K}$ are $X_1,\ldots,X_K$.
+
+- That is, $\ell(\omega_i)=X_i$ for every $\omega_i\in \mathbb{F}, i\in [K]$.
+
+Some example constructions:
+
+Given $X_1,\ldots,X_K$ with corresponding $\omega_1,\ldots,\omega_K$
+
+- $\ell(z)=\sum_{i=1}^K X_i\ell_i(z)$, where $\ell_i(z)=\prod_{j\in[K],j\neq i} \frac{z-\omega_j}{\omega_i-\omega_j}=\begin{cases} 0 & \text{if } j\neq i \\ 1 & \text{if } j=i \end{cases}$.
+
+Then every $f(X_i)=f(\ell(\omega_i))$ is an evaluation of polynomial $f\circ \ell(z)$ at $\omega_i$.
+
+If the master obtains the composition $h=f\circ \ell$, it can obtain every $f(X_i)=h(\omega_i)$.
+
+Goal: The master wished to obtain the polynomial $h(z)=f(\ell(z))$.
+
+Intuition:
+
+- Encoding is performed by evaluating $\ell(z)$ at $\alpha_1,\ldots,\alpha_P\in \mathbb{F}$, and $P>K$ for redundancy.
+- Nodes apply $f$ on an evaluation of $\ell$ and obtain an evaluation of $h$.
+- The master receives some potentially noisy evaluations, and finds $h$.
+- The master evaluates $h$ at $\omega_1,\ldots,\omega_K$ to obtain $f(X_1),\ldots,f(X_K)$.
+
+### Encoding for Lagrange coded computing
+
+Need polynomial $\ell(z)$ such that:
+
+- $X_k=\ell(\omega_k)$ for every $k\in [K]$.
+
+Having obtained such $\ell$ we let $\tilde{X}_i=\ell(\alpha_i)$ for every $i\in [P]$.
+
+$span{\tilde{X}_1,\tilde{X}_2,\ldots,\tilde{X}_P}=span{\ell_1(x),\ell_2(x),\ldots,\ell_P(x)}$.
+
+Want $X_k=\ell(\omega_k)$ for every $k\in [K]$.
+
+Tool: Lagrange interpolation.
+
+- $\ell_k(z)=\prod_{i\neq k} \frac{z-\omega_j}{\omega_k-\omega_j}$.
+- $\ell_k(\omega_k)=1$ and $\ell_k(\omega_k)=0$ for every $j\neq k$.
+- $\deg \ell_k(z)=K-1$.
+
+Let $\ell(z)=\sum_{k=1}^K X_k\ell_k(z)$.
+
+- $\deg \ell\leq K-1$.
+- $\ell(\omega_k)=X_k$ for every $k\in [K]$.
+
+Let $\tilde{X}_i=\ell(\alpha_i)=\sum_{k=1}^K X_k\ell_k(\alpha_i)$.
+
+Every $\tilde{X}_i$ is a **linear combination** of $X_1,\ldots,X_K$.
+
+$$
+(\tilde{X}_1,\tilde{X}_2,\ldots,\tilde{X}_P)=(X_1,\ldots,X_K)\cdot G=(X_1,\ldots,X_K)\begin{bmatrix}
+\ell_1(\alpha_1) & \ell_1(\alpha_2) & \cdots & \ell_1(\alpha_P) \\
+\ell_2(\alpha_1) & \ell_2(\alpha_2) & \cdots & \ell_2(\alpha_P) \\
+\vdots & \vdots & \ddots & \vdots \\
+\ell_K(\alpha_1) & \ell_K(\alpha_2) & \cdots & \ell_K(\alpha_P)
+\end{bmatrix}
+$$
+
+This $G$ is called a **Lagrange matrix** with respect to 
+
+- $\omega_1,\ldots,\omega_K$. (interpolation points, rows)
+- $\alpha_1,\ldots,\alpha_P$. (evaluation points, columns)
+
+> Basically, a modification of Reed-Solomon code.
+
+### Decoding for Lagrange coded computing
+
+Say the system has $S$ stragglers (erasures) and $A$ adversaries (errors).
+
+The master receives $P-S$ computation results $f(\tilde{X}_{i_1}),\ldots,f(\tilde{X}_{i_{P-S}})$.
+
+- By design, therese are evaluations of $h: h(a_{i_1})=f(\ell(a_{i_1})),\ldots,h(a_{i_{P-S}})=f(\ell(a_{i_{P-S}}))$
+- A evaluation are noisy
+- $\deg h=\deg f\cdot \deg \ell=(K-1)\deg f$.
+
+Which process enables to interpolate a polynomial from noisy evaluations?
+
+Ree-Solomon (RS) decoding.
+
+Fact: Reed-Solomon decoding succeeds if and only if the number of erasures + 2 $\times$ the number of errors $\leq d-1$.
+
+Imagine $h$ as the "message" in Reed-Solomon code. $[P,(K-1)\deg f +1,P-(K-1)\deg f]_q$.
+
+- Interpolating $h$ is possible if and only if $S+2A\leq (K-1)\deg f-1$.
+
+Once the master interpolates $h$.
+
+- The evaluations $h(\omega_i)=f(\ell(\omega_i))=f(X_i)$ provides the interpolation results.
+
+#### Theorem of Lagrange coded computing
+
+Lagrange coded computing enables to compute $\{f(X_i)\}_{i=1}^K$ for any $f$ at the presence of at most $S$ stragglers and at most $A$ adversaries if
+
+$$
+(K-1)\deg f+S+2A+1\leq P
+$$
+
+> Interpolation of result does not depend on $P$ (number of worker nodes).
+
+### Privacy for Lagrange coded computing
+
+Currently any size-$K$ group of colluding nodes reveals the entire dataset.
+
+Q: Can an individual node $i$ learn anything about $X_i$?
+
+A: Yes, since $\tilde{X}_i$ is a linear combination of $X_1,\ldots,X_K$ (partial knowledge, a linear combination of private data).
+
+Can we provide **perfect privacy** given that at most $T$ nodes collude?
+
+- That is, $I(X:\tilde{X}_i)=0$ for every $\mathcal{T}\subseteq [P]$ of size at most $T$, where
+  - $X=(X_1,\ldots,X_K)$, and 
+  - $\tilde{X}_\mathcal{T}=(\tilde{X}_{i_1},\ldots,\tilde{X}_{i_{|\mathcal{T}|}})$.
+
+Solution: Slight change of encoding in LLC.
+
+This only applied to $\mathbb{F}=\mathbb{F}_q$ (no perfect privacy over $\mathbb{R},\mathbb{C}$. No uniform distribution can be defined).
+
+The master chooses
+
+- $T$ keys $Z_{K+1},\ldots,Z_{K+T}$ uniformly at random ($|Z_i|=|X_i|$ for all $i$)
+- Interpolation points $\omega_1,\ldots,\omega_{K+T}$.
+
+Find the Lagrange polynomial $\ell(z)$ such that
+
+- $\ell(w_i)=X_i$ for $i\in [K]$
+- $\ell(w_{K+j})=Z_j$ for $j\in [T]$.
+
+Lagrange interpolation:
+
+$$
+\ell(z)=\sum_{i=1}^{K} X_i\ell_i(z)+\sum_{j=1}^{T} X_{K+j}ell_{K+j}(z)
+$$
+
+$$
+(\tilde{X}_1,\ldots,\tilde{X}_P)=(X_1,\ldots,X_K,Z_1,\ldots,Z_T)\cdot G
+$$
+
+where
+
+$$
+G=\begin{bmatrix}
+\ell_1(\alpha_1) & \ell_1(\alpha_2) & \cdots & \ell_1(\alpha_P) \\
+\ell_2(\alpha_1) & \ell_2(\alpha_2) & \cdots & \ell_2(\alpha_P) \\
+\vdots & \vdots & \ddots & \vdots \\
+\ell_K)(\alpha_1) & \ell_K(\alpha_2) & \cdots & \ell_K(\alpha_P) \\
+\vdots & \vdots & \ddots & \vdots \\
+\ell_{K+T}(\alpha_1) & \ell_{K+T}(\alpha_2) & \cdots & \ell_{K+T}(\alpha_P)
+$$
+
+For analysis, we denote $G=\begin{bmatrix}G^{top}\\G^{bot}\end{bmatrix}$, where $G^{top}\in \mathbb{F}^{K\times P}$ and $G^{bot}\in \mathbb{F}^{T\times P}$.
+
+The proof for privacy is the almost the same as ramp scheme.
+
+<details>
+<summary>Proof</summary>
+
+We have $(\tilde{X}_1,\ldots \tilde{X}_P)=(X_1,\ldots,X_K)\cdot G^{top}+(Z_1,\ldots,Z_T)\cdot G^{bot}$.
+
+Without loss of generality, $\mathcal{T}=[T]$ is the colluding set.
+
+$\mathcal{T}$ hold $(\tilde{X}_1,\ldots \tilde{X}_P)=(X_1,\ldots,X_K)\cdot G^{top}_\mathcal{T}+(Z_1,\ldots,Z_T)\cdot G^{bot}_\mathcal{top}$.
+
+- $G^{top}_\mathcal{T}$, $G^{bot}_\mathcal{T}$ contain the first $T$ columns of $G^{top}$, $G^{bot}$, respectively.
+
+Note that $G^{top}\in \mathbb{F}^{T\times P}_q$ is MDS, and hence $G^{top}_\mathcal{T}$ is a $T\times T$ invertible matrix.
+
+Since $Z=(Z_1,\ldots,Z_T)$ chosen uniformly random, so $Z\cdot G^{bot}_\mathcal{T}$ is a one-time pad.
+
+Same proof for decoding, we only need $K+1$ item to make the interpolation work.
+
+</details>
+
+## Conclusion
+
+- Theorem: Lagrange Coded Computing is resilient against $S$ stragglers, $A$ adversaries, and $T$ colluding nodes if 
+  $$
+  P\geq (K+T-1)\deg f+S+2A+1
+  $$
+  - Privacy (increase with $\deg f$) cost more than the straggler and adversary (increase linearly).
+- Caveat: Requires finite field arithmetic!
+- Some follow-up works analyzed information leakage over the reals
+
+## Side note for Blockchain
+
+Blockchain: A decentralized system for trust management.
+
+Blockchain maintains a chain of blocks.
+
+- A block contains a set of transactions.
+- Transaction = value transfer between clients.
+- The chain is replicated on each node.
+
+Periodically, a new block is proposed and appended to each local chain.
+
+- The block must not contain invalid transactions.
+- Nodes must agree on proposed block.
+
+Existing systems:
+
+- All nodes perform the same set of tasks.
+- Every node must receive every block.
+
+Performance does not scale with number of node
+
+### Improving performance of blockchain
+
+The performance of blockchain is inherently limited by its design.
+
+- All nodes perform the same set of tasks.
+- Every node must receive every block.
+
+Idea: Combine blockchain with distributed computing.
+
+- Node tasks should complement each other.
+
+Sharding (notion from databases):
+
+- Nodes are partitioned into groups of equal size.
+- Each group maintains a local chain.
+- More nodes, more groups, more transactions can be processed.
+- Better performance.
+
+### Security Problem
+
+Biggest problem in blockchains: Adversarial (Byzantine) nodes.
+
+- Malicious actors wish to include invalid transactions.
+
+Solution in traditional blockchains: Consensus mechanisms.
+
+- Algorithms for decentralized agreement.
+- Tolerates up to $1/3$ Byzantine nodes.
+
+Problem: Consensus conflicts with sharding.
+
+- Traditional consensus mechanisms tolerate $\approx 1/3$ Byzantine nodes.
+- If we partition $P$ nodes into $K$ groups, we can tolerate only $P/3K$ node failures.
+  - Down from $P/3$ in non-shared systems.
+
+Goal: Solve the consensus problem in sharded systems.
+
+Tool: Coded computing.
+
+### Problem formulation
+
+At epoch $t$ of a shared blockchain system, we have
+
+- $K$ local chain $Y_1^{t-1},\ldots, Y_K^{t-1}$.
+- $K$ new blocks $X_1(t),\ldots,X_K(t)$.
+- A polynomial verification function $f(X_k(t),Y_k^t)$, which validates $X_k(t)$.
+
+<details>
+<summary>Proof</summary>
+
+Balance check function $f(X_k(t),Y_k^t)=\sum_\tau Y_k(\tau)-X_k(t)$.
+
+More commonly, a (polynomial) hash function. Used to:
+
+- Verify the sender's public key.
+- Verify the ownership of the transferred funds.
+
+</details>
+
+Need: Apply a polynomial functions on $K$ datasets.
+
+Lagrange coded computing!
+
+### Blockchain with Lagrange coded computing
+
+At epoch $t$:
+
+- A leader is elected (using secure election mechanism).
+- The leader receives new blocks $X_1(t),\ldots,X_K(t)$.
+- The leader disperses the encoded blocks $\tilde{X}_1(t),\ldots,\tilde{X}_P(t)$ to nodes.
+  - Needs secure information dispersal mechanisms.
+
+Every node $i\in [P]$:
+
+- Locally stores a coded chain $\tilde{Y}_i^t$ (encoded using LCC).
+- Receives $\tilde{X}_i(t)$.
+- Computes $f(\tilde{X}_i(t),\tilde{Y}_i^t)$ and sends to the leader.
+
+The leader decodes to get $\{f(X_i(t),Y_i^t)\}_{i=1}^K$ and disperse securely to nodes.
+
+Node $i$ appends coded block $\tilde{X}_i(t)$ to coded chain $\tilde{Y}_i^t$ (zeroing invalid transactions).
+
+Guarantees security if $P\geq (K+T-1)\deg f+S+2A+1$.
+
+- $A$ adversaries, degree $d$ verification polynomial.
+
+Sharding without sharding:
+
+- Computations are done on (coded) partial chains/blocks.
+  - Good performance!
+- Since blocks/chains are coded, they are "dispersed" among many nodes.
+  - Security problem in sharding solved!
+- Since the encoding is done (securely) through a leader, no need to send every block to all nodes.
+  - Reduced communication! (main bottleneck).
+
+Novelties:
+
+- First decentralized verification system with less than size of blocks times the number of nodes communication.
+- Coded consensus – Reach consensus on coded data.
--- a/content/CSE5313/_meta.js
+++ b/content/CSE5313/_meta.js
@@ -28,4 +28,5 @@ export default {
    CSE5313_L22: "CSE5313 Coding and information theory for data science (Lecture 22)",
    CSE5313_L23: "CSE5313 Coding and information theory for data science (Lecture 23)",
    CSE5313_L24: "CSE5313 Coding and information theory for data science (Lecture 24)",
+    CSE5313_L25: "CSE5313 Coding and information theory for data science (Lecture 25)",
 }