updates today

This commit is contained in:
Trance-0
2025-11-06 13:59:31 -06:00
parent 51b34be077
commit 74364283fe
8 changed files with 428 additions and 1 deletions

View File

@@ -0,0 +1,146 @@
# CSE510 Deep Reinforcement Learning (Lecture 21)
## Exploration in RL
### Information state search
- Uncertainty about state transitions or dynamics.
- Use the dynamics prediction error or the information gain for dynamics learning as the exploration signal.
#### Computational Curiosity
- "The direct goal of curiosity and boredom is to improve the world model."
- "Curiosity Unit": reward is a function of the mismatch between model's current predictions and actuality.
- There is positive reinforcement whenever the system fails to correctly predict the environment.
- Thus the usual credit assignment process ... encourages certain past actions in order to repeat situations similar to the mismatch situation (planning to make your (internal) world model fail).
#### Reward Prediction Error
- Add exploration reward bonuses that encourage policies to visit states that will cause the prediction model to fail.
$$
R(s,a,s') = r(s,a,s') + \mathcal{B}(\|T(s,a;\theta)-s'\|)
$$
- where $r(s,a,s')$ is the extrinsic reward, $T(s,a;\theta)$ is the predicted next state, and $\mathcal{B}$ is a bonus function (intrinsic reward bonus).
- Exploration reward bonuses are non-stationary: as the agent interacts with the environment, what is now new and novel, becomes old and known.
[link to the paper](https://arxiv.org/pdf/1507.08750)
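A minimal sketch of this bonus, assuming a small learned forward model and an intrinsic-reward scale `beta` (both illustrative, not taken from the paper):

```python
import torch
import torch.nn as nn

# Sketch: forward-dynamics model T(s, a; theta) plus the prediction-error
# exploration bonus B = ||T(s, a; theta) - s'||. Dimensions and `beta`
# are illustrative assumptions.
class ForwardModel(nn.Module):
    def __init__(self, state_dim=4, action_dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def augmented_reward(model, s, a, s_next, r_extrinsic, beta=0.1):
    """R(s,a,s') = r(s,a,s') + beta * ||T(s,a;theta) - s'||."""
    with torch.no_grad():
        pred_error = (model(s, a) - s_next).norm(dim=-1)
    return r_extrinsic + beta * pred_error

# The same transitions also train the model, so the bonus shrinks as the
# dynamics become predictable (this is the non-stationarity noted above).
model = ForwardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
s, a, s_next = torch.randn(32, 4), torch.randn(32, 2), torch.randn(32, 4)
loss = ((model(s, a) - s_next) ** 2).mean()
opt.zero_grad(); loss.backward(); opt.step()
print(augmented_reward(model, s, a, s_next, r_extrinsic=torch.zeros(32)).shape)
```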
<details>
<summary>Example</summary>
Learning Visual Dynamics
- Exploration reward bonuses $\mathcal{B}(s, a, s') = \|T(s, a; \theta) - s'\|$
- However, a trivial solution exists: the agent could keep collecting reward by just moving around randomly.
---
- Exploration reward bonuses with autoencoders $\mathcal{B}(s, a, s') = \|T(E(s;\theta),a;\theta)-E(s';\theta)\|$
- But this suffers from the problem that the autoencoder reconstruction loss may have little to do with our task.
</details>
#### Task Rewards vs. Exploration Rewards
Exploration reward bonuses:
$$
\mathcal{B}(s, a, s') = \|T(E(s;\theta),a;\theta)-E(s';\theta)\|
$$
Only task rewards:
$$
R(s,a,s') = r(s,a,s')
$$
Task+curiosity rewards:
$$
R^t(s,a,s') = r(s,a,s') + \mathcal{B}^t(s, a, s')
$$
Sparse task + curiosity rewards:
$$
R^t(s,a,s') = r^t(s,a,s') + \mathcal{B}^t(s, a, s')
$$
Only curiosity rewards:
$$
R^c(s,a,s') = \mathcal{B}^c(s, a, s')
$$
#### Intrinsic Reward RL is not New
- Itti, L., Baldi, P.F.: Bayesian surprise attracts human attention. In: NIPS 2005, pp. 547-554 (2006)
- Schmidhuber, J.: Curious model-building control systems. In: IJCNN 1991, vol. 2, pp. 1458-1463 (1991)
- Schmidhuber, J.: Formal theory of creativity, fun, and intrinsic motivation (1990-2010). IEEE Transactions on Autonomous Mental Development 2(3), 230-247 (2010)
- Singh, S., Barto, A., Chentanez, N.: Intrinsically motivated reinforcement learning. In: NIPS 2004 (2004)
- Storck, J., Hochreiter, S., Schmidhuber, J.: Reinforcement driven information acquisition in non-deterministic environments. In: ICANN 1995 (1995)
- Sun, Y., Gomez, F.J., Schmidhuber, J.: Planning to be surprised: Optimal Bayesian exploration in dynamic environments (2011), http://arxiv.org/abs/1103.5708
#### Limitation of Prediction Errors
- Agent will be rewarded even though the model cannot improve.
- So it will focus on parts of environment that are inherently unpredictable or stochastic.
- Example: the noisy-TV problem
- The agent is attracted forever in the most noisy states, with unpredictable outcomes.
#### Random Network Distillation
Original idea: Predicting the output of a fixed and randomly initialized neural network on the next state, given the current state and action.
New idea: Predicting the output of a fixed and randomly initialized neural network on the next state, given the **next state itself.**
- The target network is a neural network with fixed, randomized weights, which is never trained.
- The prediction network is trained to predict the target network's output.
> The more often a state is visited, the smaller the prediction loss (and hence the bonus) becomes.
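A minimal sketch of the RND bonus, with small illustrative MLPs standing in for the fixed target and the trained predictor:

```python
import torch
import torch.nn as nn

# Random Network Distillation sketch: a fixed random target network f(s')
# and a trained predictor f_hat(s'); the bonus is the predictor's error on
# the next state itself. Sizes are illustrative assumptions.
def mlp(in_dim, out_dim, hidden=64):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

state_dim, embed_dim = 4, 16
target = mlp(state_dim, embed_dim)
for p in target.parameters():            # fixed, never trained
    p.requires_grad_(False)
predictor = mlp(state_dim, embed_dim)
opt = torch.optim.Adam(predictor.parameters(), lr=1e-3)

def rnd_bonus(s_next):
    # Frequently visited (and trained-on) states give small errors,
    # so the bonus decays exactly as described above.
    with torch.no_grad():
        return ((predictor(s_next) - target(s_next)) ** 2).mean(dim=-1)

def rnd_update(s_next):
    loss = ((predictor(s_next) - target(s_next)) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

s_next = torch.randn(32, state_dim)
print(rnd_bonus(s_next).shape)           # per-state exploration bonus
rnd_update(s_next)
```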
### Posterior Sampling
- Uncertainty about Q-value functions or policies.
- Select actions according to the probability that they are best under the current model.
#### Exploration with Action Value Information
Count-based and curiosity-driven methods do not take the action-value information into account.
![Action Value Information](https://notenextra.trance-0.com/CSE510/Action_Value_Information.png)
> In this case, the optimal action is action 1, but we keep exploring action 3 because it has the highest uncertainty. It also takes a long time to distinguish actions 1 and 2, since they have similar values.
#### Exploration via Posterior Sampling of Q Functions
- Represent a posterior distribution of Q functions, instead of a point estimate.
1. Sample $Q \sim P(Q)$
2. Choose actions according to this $Q$ for one episode $a=\arg\max_{a} Q(s,a)$
3. Update $P(Q)$ based on the sampled $Q$ and collected experience tuples $(s,a,r,s')$
- Then we do not need $\epsilon$-greedy for exploration! Better exploration by representing uncertainty over Q.
- But how can we learn a distribution of Q functions $P(Q)$ if Q function is a deep neural network?
#### Bootstrap Ensemble
- Neural network ensembles: train multiple Q-function approximations, each on a different subset of the data.
- Computationally expensive
- Neural network ensembles with a shared backbone: only the heads are trained on different subsets of the data.
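A minimal sketch of the shared-backbone ensemble; the number of heads and the bootstrap-masking scheme are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Bootstrapped Q-ensemble with a shared backbone: K heads approximate
# samples from P(Q). At the start of each episode one head is sampled and
# followed greedily, so no epsilon-greedy is needed.
class BootstrappedQ(nn.Module):
    def __init__(self, state_dim=4, n_actions=3, n_heads=5, hidden=64):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, n_actions) for _ in range(n_heads)])

    def forward(self, s, head):
        return self.heads[head](self.backbone(s))

q = BootstrappedQ()
head = torch.randint(len(q.heads), ()).item()   # "sample Q ~ P(Q)" per episode
s = torch.randn(1, 4)
action = q(s, head).argmax(dim=-1)              # act greedily w.r.t. sampled head
print(action)

# During training, each head would be updated only on its own bootstrap mask
# of the replay data, e.g. a Bernoulli(0.5) mask per (transition, head) pair.
```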
### Questions
- Why do PG methods implicitly support exploration?
- Is it sufficient? How can we improve its implicit exploration?
- What are limitations of entropy regularization?
- How can we improve exploration for PG methods?
- Intrinsic-motivated bonuses (e.g., RND)
- Explicitly optimize per-state entropy in the return (e.g., SAC)
- Hierarchical RL
- Goal-conditional RL
- What are potentially more effective exploration methods?
- Knowledge-driven
- Model-based exploration

View File

@@ -23,4 +23,5 @@ export default {
CSE510_L18: "CSE510 Deep Reinforcement Learning (Lecture 18)",
CSE510_L19: "CSE510 Deep Reinforcement Learning (Lecture 19)",
CSE510_L20: "CSE510 Deep Reinforcement Learning (Lecture 20)",
CSE510_L21: "CSE510 Deep Reinforcement Learning (Lecture 21)",
}

View File

@@ -230,3 +230,4 @@ $$
> [!TIP]
>
> error + known location $\implies$ erasure. $d = 2 \implies$ 1 erasure is correctable.

View File

@@ -0,0 +1,252 @@
# CSE5313 Coding and information theory for data science (Lecture 20)
## Review for Private Information Retrieval
### PIR from replicated databases
For 2 replicated databases, we have the following protocol:
- User has $i \sim U_{m}$.
- User chooses $r_1, r_2 \sim U_{\mathbb{F}_2^m}$.
- Two queries to each server:
- $q_{1, 1} = r_1 + e_i$, $q_{1, 2} = r_2$.
- $q_{2, 1} = r_1$, $q_{2, 2} = r_2 + e_i$.
- Server $j$ responds with $q_{j, 1} c_j^\top$ and $q_{j, 2} c_j^\top$.
- Decoding?
- $q_{1, 1} c_1^\top + q_{2, 1} c_2^\top = r_1 (c_1 + c_2)^\top + e_i c_1^\top = r_1 \cdot 0^\top + x_{i, 1} = x_{i, 1}$.
- $q_{1, 2} c_1^\top + q_{2, 2} c_2^\top = r_2 (c_1 + c_2)^\top + e_i c_2^\top = x_{i, 2}$.
PIR-rate is $\frac{k}{2k} = \frac{1}{2}$.
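A small numerical sketch of the 2-server scheme, in a simplified variant where each server stores the full $m \times k$ database and a single random mask is reused for all $k$ symbols (the rate $\frac{k}{2k} = \frac{1}{2}$ is unchanged; sizes are illustrative):

```python
import numpy as np

# 2-server replicated PIR over F_2 (simplified variant, all arithmetic mod 2).
rng = np.random.default_rng(0)
m, k = 8, 4
X = rng.integers(0, 2, size=(m, k))      # m files of k bits each, on both servers
i = 3                                    # index of the file the user wants
e_i = np.zeros(m, dtype=int); e_i[i] = 1

r = rng.integers(0, 2, size=m)           # uniformly random mask
q1, q2 = (r + e_i) % 2, r                # queries to server 1 and server 2

a1 = (q1 @ X) % 2                        # each server returns k bits
a2 = (q2 @ X) % 2

x_i = (a1 + a2) % 2                      # the random parts cancel: e_i @ X = x_i
assert np.array_equal(x_i, X[i])
print(x_i)
# Each server sees a uniformly random query, so it learns nothing about i;
# the user downloads 2k bits to retrieve k bits, i.e. PIR-rate 1/2.
```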
### PIR from coded parity-check databases
For 3 coded parity-check databases, we have the following protocol:
- User has $i \sim U_{m}$.
- User chooses $r_1, r_2, r_3 \sim U_{\mathbb{F}_2^m}$.
- Three queries to each server:
- $q_{1, 1} = r_1 + e_i$, $q_{1, 2} = r_2$, $q_{1, 3} = r_3$.
- $q_{2, 1} = r_1$, $q_{2, 2} = r_2 + e_i$, $q_{2, 3} = r_3$.
- $q_{3, 1} = r_1$, $q_{3, 2} = r_2$, $q_{3, 3} = r_3 + e_i$.
- Server $j$ responds with $q_{j, 1} c_j^\top, q_{j, 2} c_j^\top, q_{j, 3} c_j^\top$.
- Decoding?
- $q_{1, 1} c_1^\top + q_{2, 1} c_2^\top + q_{3, 1} c_3^\top = r_1 (c_1 + c_2 + c_3)^\top + e_i c_1^\top = r_1 \cdot 0^\top + x_{i, 1} = x_{i, 1}$.
- $q_{1, 2} c_1^\top + q_{2, 2} c_2^\top + q_{3, 2} c_3^\top = r_2 (c_1 + c_2 + c_3)^\top + e_i c_2^\top = x_{i, 2}$.
- $q_{1, 3} c_1^\top + q_{2, 3} c_2^\top + q_{3, 3} c_3^\top = r_3 (c_1 + c_2 + c_3)^\top + e_i c_3^\top = x_{i, 3}$.
PIR-rate is $\frac{k}{3k} = \frac{1}{3}$.
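A matching sketch for the coded case, assuming the database is encoded with a $[3,2]$ single parity-check code so that $c_1 + c_2 + c_3 = 0$ holds for every row (sizes and the target index are illustrative):

```python
import numpy as np

# 3-server PIR where server j stores column c_j of a [3,2] parity-check encoding.
rng = np.random.default_rng(1)
m = 8
X = rng.integers(0, 2, size=(m, 2))                    # m files, 2 bits each
C = np.column_stack([X[:, 0], X[:, 1], (X[:, 0] + X[:, 1]) % 2])
i = 5
e_i = np.zeros(m, dtype=int); e_i[i] = 1
r = rng.integers(0, 2, size=(3, m))                    # r_1, r_2, r_3

# Query t to server j masks e_i exactly when t == j (as in the protocol above).
q = np.array([[(r[t] + (e_i if t == j else 0)) % 2 for t in range(3)]
              for j in range(3)])                      # shape: (server, query, m)
a = np.einsum('jtm,mj->jt', q, C) % 2                  # a[j, t] = q_{j,t} . c_j

# Decoding: for each t the r_t terms hit c_1 + c_2 + c_3 = 0 and cancel.
recovered = a.sum(axis=0) % 2                          # (c_{i,1}, c_{i,2}, c_{i,3})
assert np.array_equal(recovered, C[i])
print(recovered)
```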
## Beyond $z = 1$
### Star-product theme
Given $x=(x_1, \ldots, x_n)$ and $y=(y_1, \ldots, y_n)$ over $\mathbb{F}_q$, the star-product is defined as:
$$
x \star y = (x_1 y_1, \ldots, x_n y_n)
$$
Given two linear codes, $C,D\subseteq \mathbb{F}_q^n$, the star-product code is defined as:
$$
C \star D = \operatorname{span}_{\mathbb{F}_q} \{x \star y \mid x \in C, y \in D\}
$$
Singleton bound for star-product:
$$
d_{C \star D} \leq n-\dim C-\dim D+2
$$
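A small numerical sketch of the star product over $\mathbb{F}_2$: it enumerates entrywise products of codeword pairs and measures the dimension of their span (the generator matrices are illustrative examples):

```python
import numpy as np
from itertools import product

def codewords(G):
    """All codewords of the binary code generated by G."""
    k = G.shape[0]
    return [np.array(msg) @ G % 2 for msg in product([0, 1], repeat=k)]

def gf2_rank(M):
    """Rank over F_2 by Gaussian elimination."""
    M = M.copy() % 2
    rank = 0
    for col in range(M.shape[1]):
        pivots = np.nonzero(M[rank:, col])[0]
        if len(pivots) == 0:
            continue
        pivot = rank + pivots[0]
        M[[rank, pivot]] = M[[pivot, rank]]
        for row in range(M.shape[0]):
            if row != rank and M[row, col]:
                M[row] = (M[row] + M[rank]) % 2
        rank += 1
        if rank == M.shape[0]:
            break
    return rank

G_C = np.array([[1, 0, 0, 1], [0, 1, 0, 1]])   # a [4, 2] code C
G_D = np.array([[1, 1, 1, 1]])                  # the [4, 1] repetition code D

star = np.array([c * d % 2 for c in codewords(G_C) for d in codewords(G_D)])
print("dim(C * D) =", gf2_rank(star))           # here C * D = C, so dimension 2
# The bound above gives d_{C*D} <= 4 - 2 - 1 + 2 = 3 for this pair.
```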
### PIR from a database coded with any MDS code and $z>1$
To generalize the previous scheme to $z > 1$, we need to encode multiple $r$'s together.
- As in the ramp scheme.
> Recall that in the ramp scheme, we use $r_1, \ldots, r_z \sim U_{\mathbb{F}_q^k}$ as our key vectors to protect against collusion of the servers.
In the star-product scheme:
- Files are coded with an MDS code $C$.
- The multiple $r$'s are coded with an MDS code $D$.
- The scheme is based on the minimum distance of $C \star D$.
To code the data:
- Let $C \subseteq \mathbb{F}_q^n$ be an MDS code of dimension $k$.
- For all $j \in [m]$, encode file $x_j = (x_{j, 1}, \ldots, x_{j, k})$ using $G_C$:
$$
\begin{pmatrix}
x_{1, 1} & x_{1, 2} & \cdots & x_{1, k}\\
x_{2, 1} & x_{2, 2} & \cdots & x_{2, k}\\
\vdots & \vdots & \ddots & \vdots\\
x_{m, 1} & x_{m, 2} & \cdots & x_{m, k}
\end{pmatrix} \cdot G_C = \begin{pmatrix}
c_{1, 1} & c_{1, 2} & \cdots & c_{1, n}\\
c_{2, 1} & c_{2, 2} & \cdots & c_{2, n}\\
\vdots & \vdots & \ddots & \vdots\\
c_{m, 1} & c_{m, 2} & \cdots & c_{m, n}
\end{pmatrix}
$$
- For all $j \in [n]$, store $c_j = (c_{1, j}, c_{2, j}, \ldots, c_{m, j})$ (a column of the above matrix) in server $j$.
Let $r_1, \ldots, r_z \sim U_{\mathbb{F}_q^m}$.
To code the queries:
- Let $D \subseteq \mathbb{F}_q^n$ be an MDS code of dimension $z$.
- Encode the $r_j$'s using $G_D=[g_1^\top, \ldots, g_n^\top]$:
$$
(r_1^\top, \ldots, r_z^\top) \cdot G_D = \begin{pmatrix}
r_{1, 1} & r_{2, 1} & \cdots & r_{z, 1}\\
r_{1, 2} & r_{2, 2} & \cdots & r_{z, 2}\\
\vdots & \vdots & \ddots & \vdots\\
r_{1, m} & r_{2, m} & \cdots & r_{z, m}
\end{pmatrix}
\cdot G_D=\left((r_1^\top,\ldots, r_z^\top)g_1^\top,\ldots, (r_1^\top,\ldots, r_z^\top)g_n^\top \right)
$$
To introduce the "errors in known locations" to the encoded $r_j$'s:
- Let $W \in \{0, 1\}^{m \times n}$ with some $d_{C \star D} - 1$ entries in its $i$-th row equal to 1.
- These are the entries we will retrieve.
For every server $j \in [n]$, send $q_j = (r_1^\top, \ldots, r_z^\top) g_j^\top + w_j$, where $w_j$ is the $j$-th column of $W$.
- This is similar to ramp scheme, where $w_j$ is the "message".
- Privacy against collusion of $z$ servers.
Response from server: $a_j = q_j c_j^\top$.
Decoding? Let $Q \in \mathbb{F}_q^{m \times n}$ be a matrix whose columns are the $q_j$'s.
$$
Q = \begin{pmatrix}
r_1^\top & \cdots & r_z^\top
\end{pmatrix} \cdot G_D + W
$$
- The user has
$$
\begin{aligned}
(q_1 c_1^\top, \ldots, q_n c_n^\top) &= \left(\sum_{j \in [m]} q_{1, j} c_{j, 1}, \ldots, \sum_{j \in [m]} q_{n, j} c_{j, n}\right) \\
&=\sum_{j \in [m]} (q_{1,j}c_{j, 1}, \ldots, q_{n,j}c_{j, n}) \\
&=\sum_{j \in [m]} q^j \star c^j
\end{aligned}
$$
where $q^j$ is a row of $Q$ and $c^j$ is a codeword in $C$ (an $[n, k]_q$ MDS code).
We have:
- $Q=(r_1^\top, \ldots, r_z^\top) \cdot G_D + W$
- $W\in \{0, 1\}^{m \times n}$ with some $d_{C \star D} - 1$ entries in its $i$-th row equal to 1.
- The combined responses equal $\sum_{j \in [m]} q^j \star c^j$
- Each $q^j$ is a row of $Q$
- For $j \neq i$, $q^j$ is a codeword in $D$
- $q^i = d^i + w^i$
- Therefore:
$$
\begin{aligned}
\sum_{j \in [m]} q^j \star c^j &= \sum_{j \neq i} (d^j \star c^j) + ((d^i + w^i) \star c^i) \\
&= \sum_{j \neq i} (d^j \star c^j) + w^i \star c^i \\
&= (\text{codeword in } C \star D )+( \text{noise of Hamming weight } \leq d_{C \star D} - 1)
\end{aligned}
$$
Multiply by $H_{C \star D}$ and get $d_{C \star D} - 1$ elements of $c^i$.
- Recall that $c^i = x_i \cdot G_C$
- Repeat $\frac{k}{d_{C \star D} - 1}$ times to obtain $k$ elements of $c^i$.
- This suffices to obtain $x_i$, since $C$ is an $[n, k]_q$ MDS code.
PIR-rate:
- $\frac{k}{\#\text{ downloaded elements}} = \frac{k}{\frac{k}{d_{C \star D} - 1} \cdot n} = \frac{d_{C \star D} - 1}{n}$
- Singleton bound for star-product: $d_{C \star D} \leq n - \dim C - \dim D + 2$.
- Achieved with equality if $C$ and $D$ are Reed-Solomon codes.
- PIR-rate = $\frac{n - \dim C - \dim D + 1}{n} = \frac{n - k - z + 1}{n}$.
- Intuition:
- "paying" $k$ for "reconstruction from any $k$".
- "paying" $z$ for "protection against colluding sets of size $z$".
- Capacity unknown! (as of 2022).
- Known for special cases, e.g., $k = 1, z = 1$, certain types of schemes, etc.
### PIR over graphs
Graph-based replication:
- Every file is replicated twice on two separate servers.
- Every two servers have at most one file in common.
- "file" = "granularity" of data, i.e., the smallest information unit shared by any two servers.
A server that stores $(x_{i, j})_{j=1}^d$ receives $(q_{i, j})_{j=1}^d$, and replies with $\sum_{j=1}^d q_{i, j} \cdot x_{i, j}$.
The idea:
- Consider a 2-server replicated PIR and "split" the queries between the servers.
- Sum the responses, unwanted files "cancel out", while $x_i$ does not.
Problem: Collusion.
Solution: Add per server randomness.
Good for any graph, and any $q \geq 3$ (for simplicity assume $2 | q$).
The protocol:
- Choose random $\gamma \in (\mathbb{F}_q \setminus \{0\})^n$, $\nu \in \mathbb{F}_q^m$, and $h \in \mathbb{F}_q \setminus \{0, 1\}$.
- Queries:
- If node $j$ is incident with edge $\ell$, send $q_{j, \ell} = \gamma_j \cdot \nu_\ell$ to node $j$.
- I.e., if server $j$ stores file $\ell$.
- Except one node $j_0$ that stores $x_i$, which gets $q_{j_0, i} = h \cdot \gamma_{j_0} \cdot \nu_i$.
- Server $j$ responds with $a_j = \sum_{\ell=1}^d q_{j, \ell} \cdot x_{j, \ell}$.
- Where $x_{j, 1}, \ldots, x_{j, d}$ are the files adjacent to it.
<details>
<summary>Example</summary>
- Consider the following graph.
- $n = 5$, $m = 7$, and $i = 3$.
- $q_3 = \gamma_3 \cdot (v_2, v_3, v_6)$ and $a_3 = x_2 \cdot \gamma_3 v_2 + x_3 \cdot \gamma_3 v_3 + x_6 \cdot \gamma_3 v_6$.
- $q_2 = \gamma_2 \cdot (v_1, h v_3, v_4)$ and $a_2 = x_1 \cdot \gamma_2 v_1 + x_3 \cdot h \gamma_2 v_3 + x_4 \cdot \gamma_2 v_4$.
![Example of PIR over graphs](https://notenextra.trance-0.com/CSE5313/PIR_over_graphs.png)
</details>
Correctness:
- $\sum_{j=1}^5 \gamma_j^{-1} a_j = (h + 1) v_3 x_3$
- $h \neq 1, v_3 \neq 0 \implies$ find $x_3$.
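A runnable sketch of the protocol over $\mathbb{F}_4$ (characteristic 2, so an element $h \notin \{0, 1\}$ exists); the graph, the file values, and the restriction of $\nu$ to nonzero entries are illustrative simplifications:

```python
import random

# GF(4): elements 0..3 encode polynomials over F_2 modulo x^2 + x + 1.
def gmul(a, b):
    p = 0
    if b & 1: p ^= a
    if b & 2: p ^= a << 1
    if p & 4: p ^= 0b111                 # reduce x^2 -> x + 1
    return p

def ginv(a):
    return next(b for b in range(1, 4) if gmul(a, b) == 1)

edges = [(0, 1), (0, 4), (1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]  # n = 5, m = 7
n, m = 5, len(edges)
x = [random.randrange(4) for _ in range(m)]     # one GF(4) symbol per file
i = 3                                           # index of the wanted file
j0 = edges[i][0]                                # one of the two servers holding x_i

gamma = [random.randrange(1, 4) for _ in range(n)]  # nonzero, hence invertible
nu = [random.randrange(1, 4) for _ in range(m)]     # nonzero, so nu[i] != 0
h = random.choice([2, 3])                           # h not in {0, 1}

def query(j, l):
    # Server j gets gamma_j * nu_l for each incident edge l,
    # except server j0 gets h * gamma_{j0} * nu_i for the target edge.
    q = gmul(gamma[j], nu[l])
    return gmul(h, q) if (j, l) == (j0, i) else q

answers = [0] * n                               # a_j = sum of q_{j,l} * x_l
for l, (u, v) in enumerate(edges):
    answers[u] ^= gmul(query(u, l), x[l])
    answers[v] ^= gmul(query(v, l), x[l])

# Decoding: in characteristic 2 every non-target file cancels (it appears twice),
# leaving (h + 1) * nu_i * x_i.
s = 0
for j in range(n):
    s ^= gmul(ginv(gamma[j]), answers[j])
x_i = gmul(s, ginv(gmul(h ^ 1, nu[i])))
assert x_i == x[i]
print(x_i)
```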
Parameters:
- Storage overhead 2 (for any graph).
- Download $n \cdot k$.
- PIR rate 1/n.
Collusion resistance:
1-privacy: Each node sees an entirely random vector.
2-privacy:
- If the two nodes share no edge, the argument is the same as for 1-privacy.
- If an edge exists, e.g.:
  - $\gamma_3 v_6$ and $\gamma_4 v_6$ are independent.
  - $\gamma_3 v_3$ and $h \cdot \gamma_2 v_3$ are independent.
S-privacy:
- Let $S \subseteq [n]$ (e.g., $S = \{2, 3, 5\}$), and consider the query matrix of their mutual files:
$$
Q_S = \operatorname{diag}(\gamma_3, \gamma_2, \gamma_5) \begin{pmatrix} 1 &\\ h & 1 \\ & 1\end{pmatrix} \operatorname{diag}(v_3, v_4)
$$
- It can be shown that $\Pr(Q_S)=\frac{1}{(q-1)^4}$, regardless of $i \implies$ perfect privacy.

View File

@@ -22,5 +22,6 @@ export default {
CSE5313_L16: "CSE5313 Coding and information theory for data science (Exam Review)",
CSE5313_L17: "CSE5313 Coding and information theory for data science (Lecture 17)",
CSE5313_L18: "CSE5313 Coding and information theory for data science (Lecture 18)",
CSE5313_L19: "CSE5313 Coding and information theory for data science (Exam Review)", CSE5313_L19: "CSE5313 Coding and information theory for data science (Lecture 19)",
CSE5313_L20: "CSE5313 Coding and information theory for data science (Lecture 20)",
}

View File

@@ -1,2 +1,13 @@
# CSE5519 Advances in Computer Vision (Topic C: 2024 - 2025: Neural Rendering)
## COLMAP-Free 3D Gaussian Splatting
[link to the paper](https://arxiv.org/pdf/2312.07504)
The paper proposes a novel 3D Gaussian Splatting (3DGS) framework that eliminates the need for COLMAP for camera pose estimation and bundle adjustment.
> [!TIP]
>
> This paper presents a novel 3D Gaussian Splatting framework that eliminates the need for COLMAP for camera pose estimation and bundle adjustment.
>
> Inspired by point map construction, the authors use Gaussian splatting to reconstruct the 3D scene. I wonder how this method might contribute to higher-resolution reconstruction or further improvements. Can we use the original COLMAP with traditional NeRF methods for comparable results?

View File

@@ -1,2 +1,17 @@
# CSE5519 Advances in Computer Vision (Topic F: 2025: Representation Learning)
## Can Generative Models Improve Self-Supervised Representation Learning?
[link to the paper](https://arxiv.org/pdf/2403.05966)
### Novelty in SSL with Generative Models
- Use generative models to generate synthetic data to train self-supervised representation learning models.
- Use generative augmentation to generate new data from the original data using a generative model (with Gaussian noise or other data augmentation techniques).
- Combining standard augmentation techniques like flipping, cropping, and color jittering with generative augmentation can further improve the performance of the self-supervised representation learning models.
> [!TIP]
>
> This paper shows that using generative models to generate synthetic data can improve the performance of self-supervised representation learning models. The key seems to be the use of generative augmentation to generate new data from the original data using a generative model.
>
> However, both representation learning and generative modeling suffer from some hallucinations. I wonder whether these kinds of hallucinations would be reinforced, and whether the bias in the generative model would propagate to the representation learning model in the process of generative augmentation.

Binary file not shown.
