diff --git a/content/CSE510/CSE510_L21.md b/content/CSE510/CSE510_L21.md
new file mode 100644
index 0000000..1c6524e
--- /dev/null
+++ b/content/CSE510/CSE510_L21.md
@@ -0,0 +1,146 @@
+# CSE510 Deep Reinforcement Learning (Lecture 21)
+
+## Exploration in RL
+
+### Information state search
+
+Uncertainty about state transitions or dynamics.
+
+Exploration signal: dynamics prediction error, or information gain for dynamics learning.
+
+#### Computational Curiosity
+
+- "The direct goal of curiosity and boredom is to improve the world model."
+- "Curiosity Unit": reward is a function of the mismatch between the model's current predictions and actuality.
+- There is positive reinforcement whenever the system fails to correctly predict the environment.
+- Thus the usual credit assignment process ... encourages certain past actions in order to repeat situations similar to the mismatch situation (planning to make your internal world model fail).
+
+#### Reward Prediction Error
+
+- Add exploration reward bonuses that encourage policies to visit states that will cause the prediction model to fail.
+
+$$
+R(s,a,s') = r(s,a,s') + \mathcal{B}(\|T(s,a;\theta)-s'\|)
+$$
+
+- where $r(s,a,s')$ is the extrinsic reward, $T(s,a;\theta)$ is the predicted next state, and $\mathcal{B}$ is a bonus function (intrinsic reward bonus).
+- Exploration reward bonuses are non-stationary: as the agent interacts with the environment, what is now new and novel becomes old and known.
+
+[link to the paper](https://arxiv.org/pdf/1507.08750)
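+
+Below is a minimal sketch of how such a prediction-error bonus could be computed; the forward model, layer sizes, and the scale `beta` are illustrative assumptions, not a specific paper's architecture.
+
+```python
+import torch
+import torch.nn as nn
+
+class ForwardModel(nn.Module):
+    """Learned dynamics model T(s, a; theta) predicting the next state."""
+    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
+        super().__init__()
+        self.net = nn.Sequential(
+            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
+            nn.Linear(hidden, state_dim),
+        )
+
+    def forward(self, s: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
+        return self.net(torch.cat([s, a], dim=-1))
+
+def augmented_reward(model, s, a, s_next, r_ext, beta=0.1):
+    """R = r + B(||T(s, a; theta) - s'||), with B taken as beta * error."""
+    with torch.no_grad():
+        pred_error = (model(s, a) - s_next).norm(dim=-1)
+    return r_ext + beta * pred_error
+```
+
+Training the model to minimize this same error on collected transitions is what makes the bonus non-stationary: once a region is well modeled, its bonus decays.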
+
+<details>
+<summary>Example</summary>
+
+Learning Visual Dynamics
+
+- Exploration reward bonuses $\mathcal{B}(s, a, s') = \|T(s, a; \theta) - s'\|$
+  - However, a trivial solution exists: the agent can collect reward by just moving around randomly, since raw next states are hard to predict.
+
+---
+
+- Exploration reward bonuses with autoencoders $\mathcal{B}(s, a, s') = \|T(E(s;\theta),a;\theta)-E(s';\theta)\|$
+  - But this suffers from the problem that the autoencoder reconstruction loss has little to do with our task.
+
+</details>
+
+#### Task Rewards vs. Exploration Rewards
+
+Exploration reward bonuses:
+
+$$
+\mathcal{B}(s, a, s') = \|T(E(s;\theta),a;\theta)-E(s';\theta)\|
+$$
+
+Only task rewards:
+
+$$
+R(s,a,s') = r(s,a,s')
+$$
+
+Task + curiosity rewards:
+
+$$
+R^t(s,a,s') = r(s,a,s') + \mathcal{B}^t(s, a, s')
+$$
+
+Sparse task + curiosity rewards:
+
+$$
+R^t(s,a,s') = r^t(s,a,s') + \mathcal{B}^t(s, a, s')
+$$
+
+Only curiosity rewards:
+
+$$
+R^c(s,a,s') = \mathcal{B}^c(s, a, s')
+$$
+
+#### Intrinsic Reward RL is not New
+
+- Itti, L., Baldi, P.F.: Bayesian surprise attracts human attention. In: NIPS'05. pp. 547–554 (2006)
+- Schmidhuber, J.: Curious model-building control systems. In: IJCNN'91. vol. 2, pp. 1458–1463 (1991)
+- Schmidhuber, J.: Formal theory of creativity, fun, and intrinsic motivation (1990–2010). IEEE Trans. on Autonomous Mental Development 2(3), 230–247 (2010)
+- Singh, S., Barto, A., Chentanez, N.: Intrinsically motivated reinforcement learning. In: NIPS'04 (2004)
+- Storck, J., Hochreiter, S., Schmidhuber, J.: Reinforcement driven information acquisition in non-deterministic environments. In: ICANN'95 (1995)
+- Sun, Y., Gomez, F.J., Schmidhuber, J.: Planning to be surprised: Optimal Bayesian exploration in dynamic environments (2011), http://arxiv.org/abs/1103.5708
+
+#### Limitation of Prediction Errors
+
+- The agent is rewarded even where the model cannot improve.
+- So it will focus on parts of the environment that are inherently unpredictable or stochastic.
+- Example: the noisy-TV problem
+  - The agent is attracted forever to the noisiest states, whose outcomes are unpredictable.
+
+#### Random Network Distillation
+
+Original idea: predict the output of a fixed and randomly initialized neural network on the next state, given the current state and action.
+
+New idea: predict the output of a fixed and randomly initialized neural network on the next state, given the **next state itself.**
+
+- The target network is a neural network with fixed, randomized weights, which is never trained.
+- The prediction network is trained to predict the target network's output.
+
+> The more you visit a state, the smaller this prediction loss becomes, so the bonus fades for familiar states.
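+
+A minimal sketch of the RND bonus follows; the layer widths, embedding size, and learning rate are illustrative assumptions rather than the published settings.
+
+```python
+import torch
+import torch.nn as nn
+
+obs_dim = 8  # hypothetical observation size
+
+def make_net(in_dim: int, out_dim: int) -> nn.Module:
+    return nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, out_dim))
+
+target = make_net(obs_dim, 64)      # fixed random target: never trained
+for p in target.parameters():
+    p.requires_grad_(False)
+predictor = make_net(obs_dim, 64)   # trained to imitate the target
+opt = torch.optim.Adam(predictor.parameters(), lr=1e-4)
+
+def rnd_bonus(s_next: torch.Tensor) -> torch.Tensor:
+    """Novelty = predictor's error on the frozen random features of s'.
+    Wrap in torch.no_grad() when using this as a reward."""
+    return (predictor(s_next) - target(s_next)).pow(2).mean(dim=-1)
+
+def rnd_update(batch_s_next: torch.Tensor) -> None:
+    loss = rnd_bonus(batch_s_next).mean()  # shrinks for frequently visited states
+    opt.zero_grad()
+    loss.backward()
+    opt.step()
+```
+
+Because the target is a deterministic function of the input, the prediction problem is learnable everywhere, which is what sidesteps the noisy-TV failure of forward-dynamics errors.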
+
+### Posterior Sampling
+
+Uncertainty about Q-value functions or policies.
+
+Select actions according to the probability that they are best under the current model.
+
+#### Exploration with Action Value Information
+
+Count-based and curiosity-driven methods do not take action-value information into account.
+
+![Action Value Information](https://notenextra.trance-0.com/CSE510/Action-value-information.png)
+
+> In this case, the optimal choice is action 1, but we will explore action 3 because it has the highest uncertainty. And it takes a long time to distinguish actions 1 and 2, since they have similar values.
+
+#### Exploration via Posterior Sampling of Q Functions
+
+- Represent a posterior distribution over Q functions, instead of a point estimate.
+  1. Sample a Q function from the posterior: $Q \sim P(Q)$.
+  2. Choose actions according to this $Q$ for one episode: $a=\arg\max_{a} Q(s,a)$.
+  3. Update $P(Q)$ based on the sampled $Q$ and the collected experience tuples $(s,a,r,s')$.
+- Then we do not need $\epsilon$-greedy for exploration! Better exploration by representing uncertainty over Q.
+- But how can we learn a distribution over Q functions $P(Q)$ if the Q function is a deep neural network?
+
+#### Bootstrap Ensemble
+
+- Neural network ensembles: train multiple Q-function approximations, each on a different subset of the data.
+  - Computationally expensive.
+- Neural network ensembles with a shared backbone: only the heads are trained on different subsets of the data.
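+
+A sketch of the shared-backbone variant is below; the head count and layer sizes are illustrative assumptions. Sampling a head at the start of an episode plays the role of sampling $Q \sim P(Q)$.
+
+```python
+import torch
+import torch.nn as nn
+
+class BootstrappedQ(nn.Module):
+    """Shared torso with K independent Q-heads, one per bootstrap sample."""
+    def __init__(self, obs_dim: int, n_actions: int, n_heads: int = 10, hidden: int = 128):
+        super().__init__()
+        self.torso = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
+        self.heads = nn.ModuleList(
+            nn.Linear(hidden, n_actions) for _ in range(n_heads)
+        )
+
+    def forward(self, s: torch.Tensor, k: int) -> torch.Tensor:
+        return self.heads[k](self.torso(s))
+
+q = BootstrappedQ(obs_dim=8, n_actions=4)        # hypothetical sizes
+k = int(torch.randint(len(q.heads), ()).item())  # "sample Q ~ P(Q)" for this episode
+a = q(torch.randn(8), k).argmax().item()         # act greedily w.r.t. the sampled head
+```
+
+During training, each head would only see transitions selected by its own bootstrap mask, so the heads disagree exactly where data is scarce.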
+
+### Questions
+
+- Why do PG methods implicitly support exploration?
+- Is it sufficient? How can we improve their implicit exploration?
+- What are the limitations of entropy regularization?
+- How can we improve exploration for PG methods?
+  - Intrinsic-motivation bonuses (e.g., RND)
+  - Explicitly optimize per-state entropy in the return (e.g., SAC)
+  - Hierarchical RL
+  - Goal-conditional RL
+- What are potentially more effective exploration methods?
+  - Knowledge-driven
+  - Model-based exploration
\ No newline at end of file
diff --git a/content/CSE510/_meta.js b/content/CSE510/_meta.js
index 43bbfc8..780b495 100644
--- a/content/CSE510/_meta.js
+++ b/content/CSE510/_meta.js
@@ -23,4 +23,5 @@ export default {
   CSE510_L18: "CSE510 Deep Reinforcement Learning (Lecture 18)",
   CSE510_L19: "CSE510 Deep Reinforcement Learning (Lecture 19)",
   CSE510_L20: "CSE510 Deep Reinforcement Learning (Lecture 20)",
+  CSE510_L21: "CSE510 Deep Reinforcement Learning (Lecture 21)",
 }
\ No newline at end of file
diff --git a/content/CSE5313/CSE5313_L19.md b/content/CSE5313/CSE5313_L19.md
index 98ba394..2764b2a 100644
--- a/content/CSE5313/CSE5313_L19.md
+++ b/content/CSE5313/CSE5313_L19.md
@@ -230,3 +230,4 @@ $$
 > [!TIP]
 >
 > error + known location $\implies$ erasure. $d = 2 \implies$ 1 erasure is correctable.
+
diff --git a/content/CSE5313/CSE5313_L20.md b/content/CSE5313/CSE5313_L20.md
new file mode 100644
index 0000000..37e94c8
--- /dev/null
+++ b/content/CSE5313/CSE5313_L20.md
@@ -0,0 +1,252 @@
+# CSE5313 Coding and information theory for data science (Lecture 20)
+
+## Review of Private Information Retrieval
+
+### PIR from replicated databases
+
+For 2 replicated databases, we have the following protocol:
+
+- User has $i \sim U_{[m]}$.
+- User chooses $r_1, r_2 \sim U_{\mathbb{F}_2^m}$.
+- Two queries to each server:
+  - $q_{1, 1} = r_1 + e_i$, $q_{1, 2} = r_2$.
+  - $q_{2, 1} = r_1$, $q_{2, 2} = r_2 + e_i$.
+- Server $j$ responds with $q_{j, 1} c_j^\top$ and $q_{j, 2} c_j^\top$.
+- Decoding?
+  - $q_{1, 1} c_1^\top + q_{2, 1} c_2^\top = r_1 (c_1 + c_2)^\top + e_i c_1^\top = r_1 \cdot 0^\top + x_{i, 1} = x_{i, 1}$.
+  - $q_{1, 2} c_1^\top + q_{2, 2} c_2^\top = r_2 (c_1 + c_2)^\top + e_i c_2^\top = x_{i, 2}$.
+  - (With replication $c_1 = c_2$, so $(c_1 + c_2)^\top = 0^\top$ over $\mathbb{F}_2$.)
+
+PIR-rate is $\frac{k}{2k} = \frac{1}{2}$.
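+
+A minimal numpy sketch of the 2-server idea, simplified to one-bit files so each file is a single entry of $x$ (the sizes and the index below are illustrative):
+
+```python
+import numpy as np
+
+rng = np.random.default_rng(0)
+m = 8                          # number of one-bit files (simplification)
+x = rng.integers(0, 2, m)      # database, replicated on both servers
+i = 5                          # desired index, to be kept private
+
+e_i = np.zeros(m, dtype=int)
+e_i[i] = 1
+r = rng.integers(0, 2, m)      # uniform mask: each query alone looks uniform
+
+q1 = (r + e_i) % 2             # query to server 1
+q2 = r                         # query to server 2
+
+a1 = int(q1 @ x) % 2           # server 1's answer
+a2 = int(q2 @ x) % 2           # server 2's answer
+
+assert (a1 + a2) % 2 == x[i]   # masks cancel, leaving x_i
+```
+
+Each server individually sees a uniformly random vector, so neither learns $i$; privacy breaks only if the two servers collude.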
+
+### PIR from coded parity-check databases
+
+For 3 databases coded with a single parity check (so that $c_1 + c_2 + c_3 = 0$), we have the following protocol:
+
+- User has $i \sim U_{[m]}$.
+- User chooses $r_1, r_2, r_3 \sim U_{\mathbb{F}_2^m}$.
+- Three queries to each server:
+  - $q_{1, 1} = r_1 + e_i$, $q_{1, 2} = r_2$, $q_{1, 3} = r_3$.
+  - $q_{2, 1} = r_1$, $q_{2, 2} = r_2 + e_i$, $q_{2, 3} = r_3$.
+  - $q_{3, 1} = r_1$, $q_{3, 2} = r_2$, $q_{3, 3} = r_3 + e_i$.
+- Server $j$ responds with $q_{j, 1} c_j^\top, q_{j, 2} c_j^\top, q_{j, 3} c_j^\top$.
+- Decoding?
+  - $q_{1, 1} c_1^\top + q_{2, 1} c_2^\top + q_{3, 1} c_3^\top = r_1 (c_1 + c_2 + c_3)^\top + e_i c_1^\top = r_1 \cdot 0^\top + x_{i, 1} = x_{i, 1}$.
+  - $q_{1, 2} c_1^\top + q_{2, 2} c_2^\top + q_{3, 2} c_3^\top = r_2 (c_1 + c_2 + c_3)^\top + e_i c_2^\top = x_{i, 2}$.
+  - $q_{1, 3} c_1^\top + q_{2, 3} c_2^\top + q_{3, 3} c_3^\top = r_3 (c_1 + c_2 + c_3)^\top + e_i c_3^\top = x_{i, 3}$.
+
+PIR-rate is $\frac{k}{3k} = \frac{1}{3}$.
+
+## Beyond z=1
+
+### Star-product scheme
+
+Given $x=(x_j)_{j\in [n]}$ and $y=(y_j)_{j\in [n]}$ over $\mathbb{F}_q$, the star-product is defined as:
+
+$$
+x \star y = (x_1 y_1, \ldots, x_n y_n)
+$$
+
+Given two linear codes $C,D\subseteq \mathbb{F}_q^n$, the star-product code is defined as:
+
+$$
+C \star D = \mathrm{span}_{\mathbb{F}_q} \{x \star y \mid x \in C, y \in D\}
+$$
+
+Singleton-type bound for the star-product:
+
+$$
+d_{C \star D} \leq n-\dim C-\dim D+2
+$$
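+
+As a small sanity check, the sketch below builds two Reed-Solomon codes over GF(7) (the field, length, and dimensions are illustrative assumptions), forms all pairwise star products, and confirms that the span has dimension 3, so the bound $d_{C \star D} \leq n - \dim C - \dim D + 2 = 4$ is met with equality, $C \star D$ being the $[6,3]$ RS code.
+
+```python
+import numpy as np
+from itertools import product
+
+q, n = 7, 6
+xs = np.arange(1, n + 1)                     # distinct evaluation points in GF(7)
+
+def rs_codewords(k):
+    """All codewords of the [n, k] Reed-Solomon code: evals of deg<k polys."""
+    return [np.polyval(c, xs) % q for c in product(range(q), repeat=k)]
+
+C, D = rs_codewords(2), rs_codewords(2)
+star = [(c * d) % q for c in C for d in D]   # all pairwise star products
+
+def rank_gf(rows):
+    """Gaussian elimination over GF(q) to get the dimension of the span."""
+    M, r = [list(map(int, v)) for v in rows], 0
+    for col in range(n):
+        piv = next((i for i in range(r, len(M)) if M[i][col]), None)
+        if piv is None:
+            continue
+        M[r], M[piv] = M[piv], M[r]
+        inv = pow(M[r][col], q - 2, q)       # modular inverse of the pivot
+        M[r] = [(v * inv) % q for v in M[r]]
+        for i in range(len(M)):
+            if i != r and M[i][col]:
+                M[i] = [(a - M[i][col] * b) % q for a, b in zip(M[i], M[r])]
+        r += 1
+    return r
+
+# Products of deg<=1 polynomials span the deg<=2 polynomials: dimension 3,
+# so d_{C*D} = n - 3 + 1 = 4 = n - dim C - dim D + 2 (the bound, with equality).
+assert rank_gf(star) == 3
+```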
+
+### PIR from a database coded with any MDS code and z>1
+
+To generalize the previous scheme to $z > 1$, we need to encode multiple $r$'s together.
+
+- As in the ramp scheme.
+
+> Recall from the ramp scheme that we use $r_1, \ldots, r_z \sim U_{\mathbb{F}_q^k}$ as our key vectors to avoid leakage to colluding servers.
+
+In the star-product scheme:
+
+- Files are coded with an MDS code $C$.
+- The multiple $r$'s are coded with an MDS code $D$.
+- The scheme is based on the minimum distance of $C \star D$.
+
+To code the data:
+
+- Let $C \subseteq \mathbb{F}_q^n$ be an MDS code of dimension $k$.
+- For all $j \in [m]$, encode file $x_j = (x_{j, 1}, \ldots, x_{j, k})$ using $G_C$:
+
+$$
+\begin{pmatrix}
+x_{1, 1} & x_{1, 2} & \cdots & x_{1, k}\\
+x_{2, 1} & x_{2, 2} & \cdots & x_{2, k}\\
+\vdots & \vdots & \ddots & \vdots\\
+x_{m, 1} & x_{m, 2} & \cdots & x_{m, k}
+\end{pmatrix} \cdot G_C = \begin{pmatrix}
+c_{1, 1} & c_{1, 2} & \cdots & c_{1, n}\\
+c_{2, 1} & c_{2, 2} & \cdots & c_{2, n}\\
+\vdots & \vdots & \ddots & \vdots\\
+c_{m, 1} & c_{m, 2} & \cdots & c_{m, n}
+\end{pmatrix}
+$$
+
+- For all $j \in [n]$, store $c_j = (c_{1, j}, c_{2, j}, \ldots, c_{m, j})$ (a column of the above matrix) in server $j$.
+
+Let $r_1, \ldots, r_z \sim U_{\mathbb{F}_q^m}$.
+
+To code the queries:
+
+- Let $D \subseteq \mathbb{F}_q^n$ be an MDS code of dimension $z$, with $G_D=[g_1^\top, \ldots, g_n^\top]$.
+- Encode the $r_j$'s using $G_D$:
+
+$$
+(r_1^\top, \ldots, r_z^\top) \cdot G_D = \begin{pmatrix}
+r_{1, 1} & r_{2, 1} & \cdots & r_{z, 1}\\
+r_{1, 2} & r_{2, 2} & \cdots & r_{z, 2}\\
+\vdots & \vdots & \ddots & \vdots\\
+r_{1, m} & r_{2, m} & \cdots & r_{z, m}
+\end{pmatrix} \cdot G_D = \left((r_1^\top,\ldots, r_z^\top)g_1^\top,\ldots, (r_1^\top,\ldots, r_z^\top)g_n^\top \right)
+$$
+
+To introduce the "errors in known locations" to the encoded $r_j$'s:
+
+- Let $W \in \{0, 1\}^{m \times n}$ with some $d_{C \star D} - 1$ entries in its $i$-th row equal to 1, and all other entries 0.
+- These are the entries we will retrieve.
+
+For every server $j \in [n]$ send $q_j = (r_1^\top, \ldots, r_z^\top) g_j^\top + w_j$, where $w_j$ is the $j$-th column of $W$.
+
+- This is similar to the ramp scheme, where $w_j$ plays the role of the "message".
+- Privacy against collusion of $z$ servers.
+
+Response from server: $a_j = q_j c_j^\top$.
+
+Decoding? Let $Q \in \mathbb{F}_q^{m \times n}$ be the matrix whose columns are the $q_j$'s:
+
+$$
+Q = \begin{pmatrix}
+r_1^\top & \cdots & r_z^\top
+\end{pmatrix} \cdot G_D + W
+$$
+
+- The user has
+
+$$
+\begin{aligned}
+(q_1 c_1^\top, \ldots, q_n c_n^\top) &= \left(\sum_{j \in [m]} q_{1, j} c_{j, 1}, \ldots, \sum_{j \in [m]} q_{n, j} c_{j, n}\right) \\
+&=\sum_{j \in [m]} (q_{1,j}c_{j, 1}, \ldots, q_{n,j}c_{j, n}) \\
+&=\sum_{j \in [m]} q^j \star c^j
+\end{aligned}
+$$
+
+where $q^j$ is the $j$-th row of $Q$ and $c^j$ is a codeword in $C$ (an $[n, k]_q$ MDS code).
+
+We have:
+
+- $Q=(r_1^\top, \ldots, r_z^\top) \cdot G_D + W$.
+- $W\in \{0, 1\}^{m \times n}$ with some $d_{C \star D} - 1$ entries in its $i$-th row equal to 1.
+- The combined response is $\sum_{j \in [m]} q^j \star c^j$.
+- Each $q^j$ is a row of $Q$:
+  - For $j \neq i$, $q^j$ is a codeword $d^j$ in $D$.
+  - $q^i = d^i + w^i$, where $w^i$ is the $i$-th row of $W$.
+- Therefore:
+
+$$
+\begin{aligned}
+\sum_{j \in [m]} q^j \star c^j &= \sum_{j \neq i} (d^j \star c^j) + ((d^i + w^i) \star c^i) \\
+&= \sum_{j \in [m]} (d^j \star c^j) + w^i \star c^i \\
+&= (\text{codeword in } C \star D )+( \text{noise of Hamming weight } \leq d_{C \star D} - 1)
+\end{aligned}
+$$
+
+Multiply by $H_{C \star D}$ to remove the codeword and solve for the noise values, i.e., the $d_{C \star D} - 1$ marked elements of $c^i$ (errors at known locations).
+
+- Recall that $c^i = x_i \cdot G_C$.
+- Repeat $\lceil k / (d_{C \star D} - 1) \rceil$ times to obtain $k$ elements of $c^i$.
+  - This suffices to obtain $x_i$, since $C$ is an $[n, k]_q$ MDS code.
+
+PIR-rate:
+
+- $\frac{k}{\#\text{ downloaded elements}} = \frac{k}{\frac{k}{d_{C \star D} - 1} \cdot n} = \frac{d_{C \star D} - 1}{n}$.
+- Singleton-type bound for the star-product: $d_{C \star D} \leq n - \dim C - \dim D + 2$.
+- Achieved with equality if $C$ and $D$ are Reed-Solomon codes.
+- PIR-rate $= \frac{n - \dim C - \dim D + 1}{n} = \frac{n - k - z + 1}{n}$.
+- Intuition:
+  - "paying" $k$ for "reconstruction from any $k$".
+  - "paying" $z$ for "protection against colluding sets of size $z$".
+- Capacity unknown! (as of 2022)
+  - Known for special cases, e.g., $k = 1$, $z = 1$, certain types of schemes, etc.
+
+### PIR over graphs
+
+Graph-based replication:
+
+- Every file is replicated twice on two separate servers.
+- Every two servers have at most one file in common.
+- "File" = "granularity" of data, i.e., the smallest information unit shared by any two servers.
+
+A server that stores $(x_{i, j})_{j=1}^d$ receives $(q_{i, j})_{j=1}^d$ and replies with $\sum_{j=1}^d q_{i, j} \cdot x_{i, j}$.
+
+The idea:
+
+- Consider a 2-server replicated PIR and "split" the queries between the servers.
+- Sum the responses; unwanted files "cancel out", while $x_i$ does not.
+
+Problem: collusion.
+
+Solution: add per-server randomness.
+
+Good for any graph and any $q \geq 3$ (for simplicity assume $2 \mid q$, i.e., characteristic 2, so that replicated terms cancel).
+
+The protocol:
+
+- Choose random $\gamma \in (\mathbb{F}_q \setminus \{0\})^n$, $\nu \in (\mathbb{F}_q \setminus \{0\})^m$, and $h \in \mathbb{F}_q \setminus \{0, 1\}$.
+- Queries:
+  - If node $j$ is incident with edge $\ell$ (i.e., if server $j$ stores file $\ell$), send $q_{j, \ell} = \gamma_j \cdot \nu_\ell$ to node $j$.
+  - Except one node $j_0$ that stores $x_i$, which gets $q_{j_0, i} = h \cdot \gamma_{j_0} \cdot \nu_i$.
+- Server $j$ responds with $a_j = \sum_{\ell=1}^{d} q_{j, i_\ell} \cdot x_{i_\ell}$, where $x_{i_1}, \ldots, x_{i_d}$ are the files adjacent to it.
+
+<details>
+<summary>Example</summary>
+
+- Consider the following graph.
+- $n = 5$, $m = 7$, and $i = 3$.
+- $q_3 = \gamma_3 \cdot (\nu_2, \nu_3, \nu_6)$ and $a_3 = x_2 \cdot \gamma_3 \nu_2 + x_3 \cdot \gamma_3 \nu_3 + x_6 \cdot \gamma_3 \nu_6$.
+- $q_2 = \gamma_2 \cdot (\nu_1, h \nu_3, \nu_4)$ and $a_2 = x_1 \cdot \gamma_2 \nu_1 + x_3 \cdot h \gamma_2 \nu_3 + x_4 \cdot \gamma_2 \nu_4$.
+
+![Example of PIR over graphs](https://notenextra.trance-0.com/CSE5313/PIR_over_graphs.png)
+
+</details>
+
+Correctness:
+
+- $\sum_{j=1}^5 \gamma_j^{-1} a_j = ( h + 1 )\nu_3 x_3$, since every other file appears in exactly two answers with equal terms, which cancel in characteristic 2.
+- $h \neq 1$, $\nu_3 \neq 0 \implies$ recover $x_3$.
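+
+The figure's exact edge set is not reproduced here, so the sketch below checks the correctness identity on a hypothetical 5-server, 7-file graph chosen to be consistent with the two queries above (server 2 storing files $\{1,3,4\}$, server 3 storing $\{2,3,6\}$), working over GF(4), the smallest field with $2 \mid q$ and $q \geq 3$.
+
+```python
+import random
+
+# GF(4) = {0, 1, w, w+1} encoded as 0..3: addition is XOR (char. 2, so t+t=0),
+# multiplication via a lookup table; inverses: 1->1, 2->3, 3->2.
+MUL = [[0, 0, 0, 0], [0, 1, 2, 3], [0, 2, 3, 1], [0, 3, 1, 2]]
+def gadd(a, b): return a ^ b
+def gmul(a, b): return MUL[a][b]
+def ginv(a): return [0, 1, 3, 2][a]
+
+# Hypothetical edge set: file l is an edge between two of the 5 servers.
+edges = {1: (1, 2), 2: (1, 3), 3: (2, 3), 4: (2, 5), 5: (4, 5), 6: (3, 4), 7: (1, 4)}
+
+rnd = random.Random(0)
+x = {l: rnd.randrange(4) for l in edges}               # one GF(4) symbol per file
+gamma = {j: rnd.randrange(1, 4) for j in range(1, 6)}  # nonzero per-server masks
+nu = {l: rnd.randrange(1, 4) for l in edges}           # nonzero per-file masks
+i, j0, h = 3, 2, 2                                     # want x_3; server 2 gets the factor h
+
+answers = {}
+for j in range(1, 6):
+    a = 0
+    for l, (u, v) in edges.items():
+        if j in (u, v):                                # server j stores file l
+            coeff = gmul(gamma[j], nu[l])
+            if l == i and j == j0:
+                coeff = gmul(h, coeff)                 # the single tweaked query
+            a = gadd(a, gmul(coeff, x[l]))
+    answers[j] = a
+
+# Every file other than x_i appears in exactly two answers; after scaling by
+# gamma_j^{-1} its two equal terms cancel in characteristic 2.
+total = 0
+for j in range(1, 6):
+    total = gadd(total, gmul(ginv(gamma[j]), answers[j]))
+assert total == gmul(gmul(gadd(h, 1), nu[i]), x[i])    # = (h+1) * nu_3 * x_3
+```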
+
+Parameters:
+
+- Storage overhead 2 (for any graph).
+- Download $n \cdot k$.
+- PIR rate $1/n$.
+
+Collusion resistance:
+
+1-privacy: each node sees an entirely random vector.
+
+2-privacy:
+
+- If the two nodes share no edge: as for 1-privacy.
+- If an edge exists, e.g.:
+  - $\gamma_3 \nu_6$ and $\gamma_4 \nu_6$ are independent.
+  - $\gamma_3 \nu_3$ and $h \cdot \gamma_2 \nu_3$ are independent.
+
+S-privacy:
+
+- Let $S \subseteq [n]$ (e.g., $S = \{2,3,5\}$), and consider the query matrix of their mutual files:
+
+$$
+Q_S = \mathrm{diag}(\gamma_3, \gamma_2, \gamma_5) \begin{pmatrix} 1 &\\ h & 1 \\ & 1\end{pmatrix} \mathrm{diag}(\nu_3, \nu_4)
+$$
+
+- It can be shown that $\Pr(Q_S)=\frac{1}{(q-1)^4}$, regardless of $i \implies$ perfect privacy.
diff --git a/content/CSE5313/_meta.js b/content/CSE5313/_meta.js
index dd6b483..bcc1380 100644
--- a/content/CSE5313/_meta.js
+++ b/content/CSE5313/_meta.js
@@ -22,5 +22,6 @@ export default {
   CSE5313_L16: "CSE5313 Coding and information theory for data science (Exam Review)",
   CSE5313_L17: "CSE5313 Coding and information theory for data science (Lecture 17)",
   CSE5313_L18: "CSE5313 Coding and information theory for data science (Lecture 18)",
-  CSE5313_L19: "CSE5313 Coding and information theory for data science (Exam Review)",
+  CSE5313_L19: "CSE5313 Coding and information theory for data science (Lecture 19)",
+  CSE5313_L20: "CSE5313 Coding and information theory for data science (Lecture 20)",
 }
\ No newline at end of file
diff --git a/content/CSE5519/CSE5519_C4.md b/content/CSE5519/CSE5519_C4.md
index 6177990..2422611 100644
--- a/content/CSE5519/CSE5519_C4.md
+++ b/content/CSE5519/CSE5519_C4.md
@@ -1,2 +1,13 @@
 # CSE5519 Advances in Computer Vision (Topic C: 2024 - 2025: Neural Rendering)
 
+## COLMAP-Free 3D Gaussian Splatting
+
+[link to the paper](https://arxiv.org/pdf/2312.07504)
+
+The paper proposes a novel 3D Gaussian Splatting (3DGS) framework that eliminates the need for COLMAP for camera pose estimation and bundle adjustment.
+
+> [!TIP]
+>
+> This paper presents a novel 3D Gaussian Splatting framework that eliminates the need for COLMAP for camera pose estimation and bundle adjustment.
+>
+> Inspired by point map construction, the authors use Gaussian splatting to reconstruct the 3D scene. I wonder how this method might contribute to higher-resolution reconstruction or further improvements. Can we use the original COLMAP on traditional NeRF methods for comparable results?
diff --git a/content/CSE5519/CSE5519_F5.md b/content/CSE5519/CSE5519_F5.md
index d77dc4d..bcb9191 100644
--- a/content/CSE5519/CSE5519_F5.md
+++ b/content/CSE5519/CSE5519_F5.md
@@ -1,2 +1,17 @@
 # CSE5519 Advances in Computer Vision (Topic F: 2025: Representation Learning)
 
+## Can Generative Models Improve Self-Supervised Representation Learning?
+
+[link to the paper](https://arxiv.org/pdf/2403.05966)
+
+### Novelty in SSL with Generative Models
+
+- Use generative models to produce synthetic data for training self-supervised representation learning models.
+- Use generative augmentation to create new views of the original data with a generative model (e.g., with Gaussian noise or other perturbations); a schematic sketch follows below.
+- Combining standard augmentations such as flipping, cropping, and color jittering with generative augmentation can further improve the performance of self-supervised representation learning models.
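+
+As a rough schematic of the generative-augmentation idea (the function names, noise scale, and encoder/decoder interface here are hypothetical, not the paper's API):
+
+```python
+import torch
+
+def generative_augment(x, encoder, decoder, sigma=0.1):
+    """Hypothetical sketch: perturb a latent code with Gaussian noise and
+    decode a semantically similar synthetic view of the input."""
+    with torch.no_grad():
+        z = encoder(x)                       # map image batch to latents
+        z = z + sigma * torch.randn_like(z)  # Gaussian noise in latent space
+        return decoder(z)                    # synthetic positive view
+
+def two_views(x, encoder, decoder, standard_aug):
+    # Pair a standard augmentation (flip/crop/jitter) with a generated view;
+    # the paper reports that combining the two helps SSL objectives.
+    return standard_aug(x), generative_augment(x, encoder, decoder)
+```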
+
+> [!TIP]
+>
+> This paper shows that using generative models to produce synthetic data can improve the performance of self-supervised representation learning models. The key seems to be generative augmentation: creating new views of the original data with a generative model.
+>
+> However, both representation learning and generative modeling suffer from hallucinations. I wonder whether these hallucinations would be reinforced, and whether the bias of the generative model would propagate into the representation learning model through generative augmentation.
diff --git a/public/CSE510/Action-value-information.png b/public/CSE510/Action-value-information.png
new file mode 100644
index 0000000..d1a07f2
Binary files /dev/null and b/public/CSE510/Action-value-information.png differ