update notations
@@ -27,7 +27,7 @@ $\theta_{new}=\theta_{old}+d$
 First order Taylor expansion for the loss and second order for the KL:

 $$
-\approx \arg\max_{d} J(\theta_{old})+\nabla_\theta J(\theta)\mid_{\theta=\theta_{old}}d-\frac{1}{2}\lambda(d^T\nabla_\theta^2 D_{KL}\left[\pi_{\theta_{old}}||\pi_{\theta}\right]\mid_{\theta=\theta_{old}}d)+\lambda \delta
+\approx \arg\max_{d} J(\theta_{old})+\nabla_\theta J(\theta)\mid_{\theta=\theta_{old}}d-\frac{1}{2}\lambda(d^\top\nabla_\theta^2 D_{KL}\left[\pi_{\theta_{old}}||\pi_{\theta}\right]\mid_{\theta=\theta_{old}}d)+\lambda \delta
 $$

 If you are really interested, try filling in the "Solving the KL Constrained Problem" section.
@@ -38,7 +38,7 @@ Setting the gradient to zero:

 $$
 \begin{aligned}
-0&=\frac{\partial}{\partial d}\left(-\nabla_\theta J(\theta)\mid_{\theta=\theta_{old}}d+\frac{1}{2}\lambda(d^T F(\theta_{old})d\right)\\
+0&=\frac{\partial}{\partial d}\left(-\nabla_\theta J(\theta)\mid_{\theta=\theta_{old}}d+\frac{1}{2}\lambda(d^\top F(\theta_{old})d\right)\\
 &=-\nabla_\theta J(\theta)\mid_{\theta=\theta_{old}}+\frac{1}{2}\lambda F(\theta_{old})d
 \end{aligned}
 $$
@@ -58,15 +58,15 @@ $$
 $$

 $$
-D_{KL}(\pi_{\theta_{old}}||\pi_{\theta})\approx \frac{1}{2}(\theta-\theta_{old})^T F(\theta_{old})(\theta-\theta_{old})
+D_{KL}(\pi_{\theta_{old}}||\pi_{\theta})\approx \frac{1}{2}(\theta-\theta_{old})^\top F(\theta_{old})(\theta-\theta_{old})
 $$

 $$
-\frac{1}{2}(\alpha g_N)^T F(\alpha g_N)=\delta
+\frac{1}{2}(\alpha g_N)^\top F(\alpha g_N)=\delta
 $$

 $$
-\alpha=\sqrt{\frac{2\delta}{g_N^T F g_N}}
+\alpha=\sqrt{\frac{2\delta}{g_N^\top F g_N}}
 $$

 However, due to the quadratic approximation, the KL constraint may be violated.
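
As a quick sanity check on the step-size formula $\alpha=\sqrt{2\delta/(g_N^\top F g_N)}$ in the last hunk, the sketch below plugs in toy numbers and confirms that the quadratic KL approximation $\frac{1}{2}(\alpha g_N)^\top F(\alpha g_N)$ comes out equal to $\delta$. The helper names `quad_form` and `step_size` and the values of `F`, `g_N`, and `delta` are made up for illustration and do not come from the note. It only checks the quadratic approximation, not the exact KL divergence.

```python
import math


def quad_form(v, F):
    """Return v^T F v for a vector v and a square matrix F (nested lists)."""
    Fv = [sum(F_ij * v_j for F_ij, v_j in zip(row, v)) for row in F]
    return sum(v_i * Fv_i for v_i, Fv_i in zip(v, Fv))


def step_size(g_N, F, delta):
    """Return alpha = sqrt(2 * delta / (g_N^T F g_N))."""
    return math.sqrt(2.0 * delta / quad_form(g_N, F))


# Hypothetical toy values: F is symmetric positive definite, g_N is an
# arbitrary update direction, delta is the KL budget.
F = [[2.0, 0.5], [0.5, 1.0]]
g_N = [0.3, -0.7]
delta = 0.01

alpha = step_size(g_N, F, delta)
step = [alpha * x for x in g_N]

# Quadratic approximation of the KL divergence for the scaled step.
approx_kl = 0.5 * quad_form(step, F)
print(alpha, approx_kl)
```

Because `alpha` is chosen to satisfy the quadratic model exactly, `approx_kl` matches `delta` up to floating-point error. The true KL divergence of the updated policy can still differ from this value.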