From 7137d8aca232c14470018a4258a7bfae2c518f32 Mon Sep 17 00:00:00 2001 From: Trance-0 <60459821+Trance-0@users.noreply.github.com> Date: Thu, 28 Aug 2025 12:51:46 -0500 Subject: [PATCH] update --- content/CSE510/CSE510_L2.md | 187 ++++++++++++++++++ content/CSE510/_meta.js | 1 + content/CSE5313/CSE5313_L2.md | 348 ++++++++++++++++++++++++++++++++++ content/CSE5313/_meta.js | 1 + content/CSE5519/CSE5519_L2.md | 0 content/CSE5519/_meta.js | 1 + 6 files changed, 538 insertions(+) create mode 100644 content/CSE510/CSE510_L2.md create mode 100644 content/CSE5313/CSE5313_L2.md create mode 100644 content/CSE5519/CSE5519_L2.md diff --git a/content/CSE510/CSE510_L2.md b/content/CSE510/CSE510_L2.md new file mode 100644 index 0000000..499e8d0 --- /dev/null +++ b/content/CSE510/CSE510_L2.md @@ -0,0 +1,187 @@ +# CSE510 Deep Reinforcement Learning (Lecture 2) + +Introduction and Markov Decision Processes (MDPs) + +## What is reinforcement learning (RL) + +- A general computational framework for behavior learning through reinforcement/trial and error +- Deep RL: combining deep learning with RL for complex problems +- Showing promise for artificial general intelligence (AGI) + +## What RL can do now + +### Backgammon + +#### Neurogammon + +Developed by Gerald Tesauro at IBM Research in 1989. + +Trained to mimic expert demonstrations using supervised learning. + +Achieved the level of an intermediate human player. + +#### TD-Gammon (Temporal Difference Learning) + +Developed by Gerald Tesauro at IBM Research in 1992. + +A neural network that trains itself to be an evaluation function by playing against itself, starting from random weights. + +Achieved performance close to that of the top human players of its time. + +### DeepMind Atari + +Used deep Q-learning to play Atari games. + +Without human demonstrations, it learned to play many games at a superhuman level.
+ +### AlphaGo + +Combines Monte Carlo Tree Search with learned policy and value function networks (for pruning the search tree), expert demonstrations, self-play, and Google's TPUs. + +### Video Games + +OpenAI Five for Dota 2: + +won a best-of-three series of 5v5 games against top human players. + +DeepMind AlphaStar for StarCraft: + +supervised training followed by league-based competition training. + +### AlphaTensor + +Discovers faster matrix multiplication algorithms with reinforcement learning. + +AlphaTensor: 76 multiplications vs. the previous best of 80 (based on Strassen's algorithm) for multiplying 4x5 by 5x5 matrices. + +### Training LLMs + +For verifiable tasks (coding, math, etc.), RL can be used to train a model to perform the task without human supervision. + +### Robotics + +Unitree Go, Atlas by Boston Dynamics, etc. + +## What are the challenges of RL in real-world applications? + +Beating the human champion is "easier" than physically placing the go stones. + +### State estimation + +Known environments (known entities and dynamics) vs. unknown environments (unknown entities and dynamics). + +Need for behaviors to **transfer/generalize** across environmental variations, since the real world is very diverse. + +> **State estimation** +> +> To be able to act, you first need to be able to **see**: detect the **objects** you interact with, and detect whether you achieved the **goal**. + +Most works fall between two extremes: + +- Assume the world model is known (object locations, shapes, and physical properties obtained via AR tags or manual tuning) and use planners to search for the action sequence that achieves a desired goal. + +- Do not attempt to detect any objects, and instead learn to map RGB images directly to actions. + +Behavior learning is challenging because state estimation is challenging; in other words, because computer vision/perception is challenging. + +Interesting direction: **leveraging DRL and vision-language models** + +### Efficiency + +Cheap vs.
expensive to get experience samples + +#### DRL Sample Efficiency + +Humans after 15 minutes of play tend to outperform DDQN after 115 hours of training. + +#### Reinforcement learning in humans + +Humans appear to learn to act (e.g., walk) from "very few examples" of trial and error. How they do so is an open question. + +Possible answers: + +- Hardware: 230 million years of bipedal movement data +- Imitation learning: observation of other humans walking (e.g., imitation learning, episodic memory, and semantic memory) +- Algorithms: possibly better than backpropagation and stochastic gradient descent + +#### Discrete and continuous action spaces + +Computation is discrete, but the real action space is continuous. + +#### One-goal vs. multi-goal + +Life is a multi-goal problem, involving infinitely many possible games. + +#### Automatic and auto-detected rewards + +Our curiosity is a reward. + +#### And more + +- Transfer learning +- Generalization +- Long-horizon reasoning +- Model-based RL +- Sparse rewards +- Reward design/learning +- Planning/learning +- Lifelong learning +- Safety +- Interpretability +- etc. + +## What is the course about? + +To teach you RL models and algorithms. + +- To be able to tackle real-world problems. + +To excite you about RL. + +- To provide a primer for you to launch advanced studies. + +Schedule: + +- RL model and basic algorithms + - Markov Decision Process (MDP) + - Passive RL: ADP and TD-learning + - Active RL: Q-learning and SARSA +- Deep RL algorithms + - Value-based methods + - Policy gradient methods + - Model-based methods +- Advanced topics + - Offline RL, multi-agent RL, etc.
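Q-learning appears later in the schedule; as a quick preview, here is a minimal tabular Q-learning sketch. The 5-state chain MDP, the hyperparameters, and the episode count below are illustrative assumptions, not from the lecture.

```python
import random

# Minimal tabular Q-learning sketch (illustrative only; the chain MDP and all
# hyperparameters below are assumptions, not from the lecture).
# 5-state chain: action 0 = left, action 1 = right, reward 1 at the right end.
N_STATES, ACTIONS = 5, [0, 1]
GAMMA, ALPHA = 0.9, 0.5

def step(s, a):
    """Deterministic dynamics; the episode ends on reaching the last state."""
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return s2, float(s2 == N_STATES - 1), s2 == N_STATES - 1

random.seed(0)
Q = [[0.0, 0.0] for _ in range(N_STATES)]
for _ in range(500):                    # episodes
    s, done = 0, False
    while not done:
        a = random.choice(ACTIONS)      # uniformly random behavior policy
        s2, r, done = step(s, a)
        # Off-policy Q-learning update: bootstrap on the best next action.
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) * (not done) - Q[s][a])
        s = s2

greedy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
print(greedy)  # learned greedy policy: always move right
```

Because Q-learning is off-policy, it learns the greedy-optimal values even though the behavior policy here is pure random exploration.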
+ +### Reinforcement Learning Algorithms + +#### Model-Based + +- Learn a model of the world, then plan using the model +- Update the model often +- Re-plan often + +#### Value-Based + +- Learn the state or state-action value +- Act by choosing the best action in each state +- Exploration is a necessary add-on + +#### Policy-Based + +- Learn a stochastic policy function that maps states to actions +- Act by sampling from the policy +- Exploration is baked in + +#### From better to worse sample efficiency + +- Model-based +- Off-policy/Q-learning +- Actor-critic +- On-policy/policy gradient +- Evolutionary/gradient-free + +## What is RL? + +## RL model: Markov Decision Process (MDP) \ No newline at end of file diff --git a/content/CSE510/_meta.js b/content/CSE510/_meta.js index 0006ae3..6ca2d87 100644 --- a/content/CSE510/_meta.js +++ b/content/CSE510/_meta.js @@ -4,4 +4,5 @@ export default { type: 'separator' }, CSE510_L1: "CSE510 Deep Reinforcement Learning (Lecture 1)", + CSE510_L2: "CSE510 Deep Reinforcement Learning (Lecture 2)", } \ No newline at end of file diff --git a/content/CSE5313/CSE5313_L2.md b/content/CSE5313/CSE5313_L2.md new file mode 100644 index 0000000..ec8d7c0 --- /dev/null +++ b/content/CSE5313/CSE5313_L2.md @@ -0,0 +1,348 @@ +# CSE5313 Coding and information theory for data science (Lecture 2) + +## Review of channel coding + +Let $F$ be the input alphabet and $\Phi$ be the output alphabet. + +e.g. $F=\{0,1\}$ or $\mathbb{R}$. + +The channel introduces noise, described by $\operatorname{Pr}(c'\text{ received}|c\text{ transmitted})$. + +We use $u$ to denote the information word to be transmitted, + +$c$ to denote the codeword, + +$c'$ to denote the received word, given to the decoder, and + +$u'$ to denote the decoded information word. + +An error occurs if $u' \neq u$. + +Example: + +**Binary symmetric channel (BSC)** + +$F=\Phi=\{0,1\}$ + +Every bit of $c$ is flipped with probability $p$. + +**Binary erasure channel (BEC)** + +$F=\{0,1\}$, $\Phi=\{0,1,*\}$; very common in practice, when we are unsure whether a bit was received.
+ +$c$ is transmitted, $c'$ is received. + +Each entry of $c'$ equals the corresponding entry of $c$ with probability $1-p$, and the erasure symbol $*$ with probability $p$. + +## Encoding + +Encoding $E$ is a function from $F^k$ to $F^n$, + +where $E(u)=c$ is the codeword. + +Assume $n\geq k$; we do not compress the information. + +A code $\mathcal{C}$ is a subset of $F^n$. + +Encoding is a one-to-one mapping from $F^k$ to $\mathcal{C}$. + +In practice, we usually choose $\mathcal{C}\subseteq F^n$ to be of size $|F|^k$. + +## Decoding + +$D$ is a function from $\Phi^n$ to $\mathcal{C}$, + +$D(c')=\hat{c}$. + +The decoder then outputs the unique $u'$ such that $E(u')=\hat{c}$. + +Our aim is to have $u'=u$. + +Decoding error probability: $\operatorname{P}_{err}=\max_{c\in \mathcal{C}}\operatorname{P}_{err}(c)$, + +where $\operatorname{P}_{err}(c)=\sum_{y|D(y)\neq c}\operatorname{Pr}(y\text{ received}|c\text{ transmitted})$. + +Our goal is to construct a decoder $D$ such that $\operatorname{P}_{err}$ is small. + +Example: + +Repetition code in the binary symmetric channel: + +Let $F=\Phi=\{0,1\}$. Every bit of $c$ is flipped with probability $p$. + +Say $k=1$, $n=3$, and let $\mathcal{C}=\{000,111\}$. + +Let the encoder be $E(u)=uuu$. + +The decoder is $D(000)=D(100)=D(010)=D(001)=0$, $D(110)=D(101)=D(011)=D(111)=1$. + +Exercise: Compute the error probability of the repetition code in the binary symmetric channel. +
+**Solution** + +Recall that $P_{err}(c)=\sum_{y|D(y)\neq c}\operatorname{Pr}(y\text{ received}|c\text{ transmitted})$. + +Use a binomial random variable: + +$$ +\begin{aligned} +P_{err}(000)&=\sum_{y|D(y)\neq 000}\operatorname{Pr}(y\text{ received}|000\text{ transmitted})\\ +&=\operatorname{Pr}(2\text{ flips or more})\\ +&=\binom{3}{2}p^2(1-p)+\binom{3}{3}p^3\\ +&=3p^2(1-p)+p^3\\ +\end{aligned} +$$ + +The computation is identical for $111$. + +$P_{err}=\max\{P_{err}(000),P_{err}(111)\}=P_{err}(000)=3p^2(1-p)+p^3$.
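The closed form above can be checked by brute-force enumeration of all $2^3$ received words; the sketch below (with $p=0.1$ as an arbitrary illustrative choice) sums $\operatorname{Pr}(y\mid c)$ over the words that decode incorrectly.

```python
from itertools import product

# Brute-force check of the repetition-code error probability on the BSC.
# p = 0.1 is an arbitrary illustrative choice of crossover probability.
p = 0.1

def p_err(bit, p):
    """Sum Pr(y received | bit,bit,bit transmitted) over all y that decode wrongly."""
    total = 0.0
    for y in product([0, 1], repeat=3):
        flips = sum(yi != bit for yi in y)   # number of flipped positions
        decoded = int(sum(y) >= 2)           # the majority-vote decoder D
        if decoded != bit:
            total += p ** flips * (1 - p) ** (3 - flips)
    return total

formula = 3 * p ** 2 * (1 - p) + p ** 3      # closed form derived above
print(p_err(0, p), p_err(1, p), formula)     # all three agree (about 0.028)
```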
+ +### Maximum likelihood principle + +For $p\leq 1/2$, the decoder in the last example is a maximum likelihood decoder. + +Notice that $\operatorname{Pr}(c'=000|c=000)=(1-p)^3$ and $\operatorname{Pr}(c'=000|c=111)=p^3$. + +- If $p\leq 1/2$, then $(1-p)^3\geq p^3$: given $c'=000$, $c=000$ is more likely to have been transmitted than $c=111$. + +Similarly, $\operatorname{Pr}(c'=001|c=000)=(1-p)^2p$ and $\operatorname{Pr}(c'=001|c=111)=p^2(1-p)$. + +- If $p\leq 1/2$, then $(1-p)^2p\geq p^2(1-p)$: given $c'=001$, $c=000$ is more likely to have been transmitted than $c=111$. + +For $p>1/2$, the inequalities are reversed. + +In general, the maximum likelihood decoder is $D(c')=\arg\max_{c\in \mathcal{C}}\operatorname{Pr}(c'\text{ received}|c\text{ transmitted})$. + +## Defining a "good" code + +Two metrics: + +- How many redundant bits are needed? + - e.g. the repetition code with $k=1$, $n=3$ sends $2$ redundant bits. +- What is the resulting error probability? + - Depends on the decoding function. + - Normally, maximum likelihood decoding is assumed. + - Should go to zero as $n$ grows. + +### Definition: the rate of a code is $\frac{k}{n}$ + +More generally, the rate is $\frac{\log_{|F|}|\mathcal{C}|}{n}$. + +### Definition: information entropy + +Let $X$ be a random variable over a discrete set $\mathcal{X}$. + +- That is, every $x\in \mathcal{X}$ has a probability $\operatorname{Pr}(X=x)$. + +The entropy $H(X)$ of a discrete random variable $X$ is defined as: + +$$ +H(X)=\mathbb{E}_{x\sim X}\left[\log \frac{1}{\operatorname{Pr}(x)}\right]=-\sum_{x\in \mathcal{X}}\operatorname{Pr}(x)\log \operatorname{Pr}(x) +$$ + +When $X\sim \operatorname{Bernoulli}(p)$, we denote $H(X)=H(p)=-p\log p-(1-p)\log (1-p)$. + +A deeper explanation will be given later in the course. + +## Which rates are possible? + +Claude Shannon '48: the coding theorem for the BSC (binary symmetric channel). + +Recall $r=\frac{k}{n}$. + +Let $H(\cdot)$ be the entropy function. + +For every $0\leq r<1-H(p)$, + +- there exists a sequence of codes $\mathcal{C}_1, \mathcal{C}_2,\ldots$ of rates $r_1,r_2,\ldots$ and lengths $n_1,n_2,\ldots$ with $r_i\geq r$,
- that with maximum likelihood decoding satisfies $P_{err}\to 0$ as $i\to \infty$. + +For any $R> 1-H(p)$, + +- for any sequence $\mathcal{C}_1, \mathcal{C}_2,\ldots$ of rates $r_1,r_2,\ldots$ and lengths $n_1,n_2,\ldots$ with $r_i\geq R$, +- and any decoding algorithm, $P_{err}\to 1$ as $i\to \infty$. + +$1-H(p)$ is the capacity of the BSC. + +- Informally, the capacity is the best possible rate of a code (asymptotically). +- A special case of a broader theorem (Shannon's channel coding theorem). +- We will see it later in this course. + +Polar codes give an explicit construction of codes with rates arbitrarily close to capacity. + +### BSC capacity - Intuition + +The capacity of the binary symmetric channel with crossover probability $p$ is $1-H(p)$. + +A correct decoder $c'\to c$ essentially identifies two objects: + +- The codeword $c$ +- The error word $e=c'-c$ (subtraction $\bmod 2$) +- $c$ and $e$ are independent of each other. + +A **typical** $e$ has $\approx np$ $1$'s (law of large numbers), say $n(p\pm \delta)$ of them. + +Exercise: + +Show that $\operatorname{Pr}(e)=p^{n(p\pm \delta)}(1-p)^{n(1-p\mp \delta)}=2^{-n(H(p)+\epsilon)}$ for some $\epsilon$ that goes to zero as $\delta\to 0$. +
+**Intuition** + +There exist $\approx 2^{nH(p)}$ typical error words. + +To index these typical error words, we need $\log_2 (2^{nH(p)})=nH(p)$ bits (up to $O(1)$) to identify the error word $e$. + +To identify the codeword, we need $\log_2 |\mathcal{C}|=k$ bits. + +Since we send only $n$ bits, we need $k+nH(p)+O(1)\leq n$, so $\frac{k}{n}\leq 1-H(p)+o(1)$. + +So the rate cannot exceed $1-H(p)$.
+ +
+**Formal proof** + +$$ +\begin{aligned} +\operatorname{Pr}(e)&=p^{n(p\pm \delta)}(1-p)^{n(1-p\mp \delta)}\\ +&=p^{np}(1-p)^{n(1-p)}p^{\pm n\delta}(1-p)^{\mp n\delta}\\ +\end{aligned} +$$ + +And + +$$ +\begin{aligned} +2^{-n(H(p)+\epsilon)}&=2^{-n(-p\log p-(1-p)\log (1-p)+\epsilon)}\\ +&=2^{np\log p}2^{n(1-p)\log (1-p)}2^{-n\epsilon}\\ +&=p^{np}(1-p)^{n(1-p)}2^{-n\epsilon}\\ +\end{aligned} +$$ + +So we need to find $\epsilon$ such that + +$$ +p^{\pm n\delta}(1-p)^{\mp n\delta}=2^{-n\epsilon} +$$ + +Taking the $+\delta$ branch and solving: + +$$ +\begin{aligned} +p^{n\delta}(1-p)^{-n\delta}&=2^{-n\epsilon}\\ +-n\epsilon&=\delta n\log p-\delta n\log (1-p)\\ +\epsilon&=\delta (\log (1-p)-\log p)\\ +\end{aligned} +$$ + +which goes to zero as $\delta\to 0$.
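A quick numeric sanity check of the typicality argument (the choices $n=1000$ and $p=0.11$ are arbitrary, for illustration only): an error word with exactly $np$ ones has probability $2^{-nH(p)}$, and there are roughly $2^{nH(p)}$ such words.

```python
import math

# Numeric check of the typicality computation; n = 1000 and p = 0.11 are
# arbitrary illustrative choices.
def H(p):
    """Binary entropy in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

n, p = 1000, 0.11
k = round(n * p)                              # ones in a typical error word

# Pr(e) for an e with exactly k ones: p^k (1-p)^(n-k) = 2^{-n H(p)} when k = np.
log2_pr = k * math.log2(p) + (n - k) * math.log2(1 - p)
print(log2_pr / n, -H(p))                     # these two coincide

# The count of such e is C(n, k) ~ 2^{n H(p)} up to lower-order terms.
print(math.log2(math.comb(n, k)) / n, H(p))   # close for large n
```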
+ +## Hamming distance + +How do we quantify the noise in the channel? + +- By the number of flipped bits. + +Definition of Hamming distance: + +- Denote $c=(c_1,c_2,\ldots,c_n)$ and $c'=(c'_1,c'_2,\ldots,c'_n)$. +- $d_H(c,c')=\sum_{i=1}^n [c_i\neq c'_i]$. + +Minimum Hamming distance: + +- Let $\mathcal{C}$ be a code. +- $d_H(\mathcal{C})=\min_{c_1,c_2\in \mathcal{C},c_1\neq c_2}d_H(c_1,c_2)$. + +Hamming distance is a metric: + +- $d_H(x,y)\geq 0$, with equality iff $x=y$. +- $d_H(x,y)=d_H(y,x)$ +- Triangle inequality: $d_H(x,y)\leq d_H(x,z)+d_H(z,y)$ + +### Levels of error handling + +error detection + +erasure correction + +error correction + +Erasure: replacement of an entry by $*\not\in F$. + +Error: substitution of one entry by a different one. + +In the following, suppose $d_H(\mathcal{C})=d$. + +#### Error detection + +Theorem: If $d_H(\mathcal{C})=d$, then there exists $f:F^n\to \mathcal{C}\cup \{\text{"error detected"}\}$ that detects every pattern of $\leq d-1$ errors correctly. + +\* track lost *\ + +Idea: + +Since $d_H(\mathcal{C})=d$, one needs $\geq d$ errors to turn one codeword into another, i.e., to cause "confusion". + +#### Erasure correction + +Theorem: If $d_H(\mathcal{C})=d$, then there exists $f:(F\cup \{*\})^n\to \mathcal{C}\cup \{\text{"failed"}\}$ that recovers every pattern of at most $d-1$ erasures. + +Idea: + +\* track lost *\ + +#### Error correction + +Define the Hamming ball of radius $r$ centered at $c$ as: + +$$ +B_H(c,r)=\{y\in F^n:d_H(c,y)\leq r\} +$$ + +Theorem: If $d_H(\mathcal{C})\geq d$, then there exists $f:F^n\to \mathcal{C}$ that corrects every pattern of at most $\lfloor \frac{d-1}{2}\rfloor$ errors. + +Idea: + +The balls $\{B_H(c,\lfloor \frac{d-1}{2}\rfloor)\mid c\in \mathcal{C}\}$ are disjoint. + +Use closest-neighbor decoding and the triangle inequality. + +## Intro to linear codes + +Summary: a code of minimum Hamming distance $d$ can + +- detect $\leq d-1$ errors, +- correct $\leq d-1$ erasures, +- correct $\leq \lfloor \frac{d-1}{2}\rfloor$ errors.
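The summary above can be verified exhaustively for a small example, here the length-3 repetition code (an illustrative choice): compute $d_H(\mathcal{C})$, then check that closest-neighbor decoding corrects every pattern of at most $\lfloor \frac{d-1}{2}\rfloor$ errors.

```python
from itertools import product

# Exhaustive check of the error-correction summary for a small example:
# the length-3 binary repetition code (illustrative choice).
def d_H(x, y):
    """Hamming distance: number of positions where x and y differ."""
    return sum(a != b for a, b in zip(x, y))

code = [(0, 0, 0), (1, 1, 1)]
d = min(d_H(c1, c2) for c1 in code for c2 in code if c1 != c2)
t = (d - 1) // 2                        # guaranteed-correctable errors

def decode(y):
    """Closest-neighbor decoding over the code."""
    return min(code, key=lambda c: d_H(c, y))

# Every pattern e of at most t errors on every codeword is corrected.
ok = all(
    decode(tuple(ci ^ ei for ci, ei in zip(c, e))) == c
    for c in code
    for e in product([0, 1], repeat=3)
    if sum(e) <= t
)
print(d, t, ok)  # 3 1 True
```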
+ +Problems: + +- How to construct good codes, with both $k/n$ and $d$ large? +- How good can these codes possibly be? +- How to encode? +- How to decode over a noisy channel? + +Tools: + +- Linear algebra over finite fields. + +### Linear codes + +Consider $F^n$ as a vector space, and let $\mathcal{C}\subseteq F^n$ be a subspace. + +Since $F,\Phi$ are finite, we use finite fields (algebraic objects that "imitate" $\mathbb{R}$, $\mathbb{C}$). + +Formally, they satisfy the field axioms. + +Next lectures: + +- Field axioms +- Prime fields ($\mathbb{F}_p$) +- Field extensions (e.g. $\mathbb{F}_{p^t}$) + diff --git a/content/CSE5313/_meta.js b/content/CSE5313/_meta.js index 92ba86b..70a7513 100644 --- a/content/CSE5313/_meta.js +++ b/content/CSE5313/_meta.js @@ -4,4 +4,5 @@ export default { type: 'separator' }, CSE5313_L1: "CSE5313 Coding and information theory for data science (Lecture 1)", + CSE5313_L2: "CSE5313 Coding and information theory for data science (Lecture 2)", } \ No newline at end of file diff --git a/content/CSE5519/CSE5519_L2.md b/content/CSE5519/CSE5519_L2.md new file mode 100644 index 0000000..e69de29 diff --git a/content/CSE5519/_meta.js b/content/CSE5519/_meta.js index 8cdbe3e..2a4e437 100644 --- a/content/CSE5519/_meta.js +++ b/content/CSE5519/_meta.js @@ -4,4 +4,5 @@ export default { type: 'separator' }, CSE5519_L1: "CSE5519 Advances in Computer Vision (Lecture 1)", + CSE5519_L2: "CSE5519 Advances in Computer Vision (Lecture 2)", } \ No newline at end of file