part2 complete
61
result.tex
@@ -159,6 +159,67 @@ What is not used:
\end{itemize}
\end{enumerate}

\newpage
\item [2.3] Deliverables
\begin{enumerate}
\item [2.3.1] Plot a learning curve for the baseline loss. (5 pts)

\begin{figure}[H]
\centering
\includegraphics[width=0.8\textwidth]{images/p231.png}
\caption{Learning Curve for Baseline Loss for Batch Size of 5000}
\end{figure}

\item [2.3.2] Plot a learning curve for the evaluation return. You should expect to converge to the maximum reward of 500. (15 pts)

\begin{figure}[H]
\centering
\includegraphics[width=0.8\textwidth]{images/p232.png}
\caption{Learning Curve for Evaluation Return for Batch Size of 5000}
\end{figure}

\item [2.3.3]
Run another experiment with a decreased number of baseline gradient steps (-bgs in command line) and/or baseline learning rate (-blr in command line). How does this affect (a) the baseline learning curve and (b) the performance of the policy? (15 pts)

\begin{figure}[H]
\centering
\includegraphics[width=0.8\textwidth]{images/p2331.png}
\caption{Learning Curve for Baseline Loss for Batch Size of 5000 with Decreased Baseline Gradient Steps and/or Baseline Learning Rate}
\end{figure}

(a) In this experiment, the baseline learning curve is more stable when the number of baseline gradient steps and/or the baseline learning rate is decreased.

\begin{figure}[H]
\centering
\includegraphics[width=0.8\textwidth]{images/p2332.png}
\caption{Learning Curve for Average Return for Batch Size of 5000 with Decreased Baseline Gradient Steps and/or Baseline Learning Rate}
\end{figure}

(b) In this experiment, the policy performs better when the number of baseline gradient steps and/or the baseline learning rate is decreased.
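
For reference, the quantities in this comparison can be written out as follows. This is only a sketch: it assumes the standard setup in which the baseline is a learned value function $V_\phi$ fit by regression onto the reward-to-go $\hat{Q}_t$. The symbols $V_\phi$, $\hat{Q}_t$, $\hat{A}_t$, $\alpha_b$, and $N$ are notation introduced here and are not taken from the assignment.
\[
L(\phi) = \frac{1}{N}\sum_{t=1}^{N}\bigl(V_\phi(s_t) - \hat{Q}_t\bigr)^2,
\qquad
\hat{A}_t = \hat{Q}_t - V_\phi(s_t).
\]
Under this assumption, each baseline gradient step is schematically an update $\phi \leftarrow \phi - \alpha_b \nabla_\phi L(\phi)$ (the actual optimizer may differ), so -bgs would control how many such updates are taken and -blr would set $\alpha_b$. A state-dependent baseline changes the variance of the policy gradient estimate but not its expectation, so a less accurately fit baseline does not by itself bias the policy update.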

\item [2.3.4]
How does the command line argument -na influence the performance? Why is that the case? (5 pts)

The policy performs better when the command line argument -na is used.

The -na argument normalizes the advantages, rescaling them to zero mean and unit standard deviation within each batch. This keeps the scale of the policy gradient consistent across iterations, which makes learning more stable and faster.
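
As a sketch of what advantage normalization computes, assuming the common convention of standardizing the advantage estimates $\hat{A}_i$ within a batch of size $N$ (the symbols $\mu$, $\sigma$, $\epsilon$, and $\tilde{A}_i$ are notation introduced here, and the small constant $\epsilon$ is an assumed safeguard against division by zero):
\[
\mu = \frac{1}{N}\sum_{i=1}^{N} \hat{A}_i,
\quad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\bigl(\hat{A}_i - \mu\bigr)^2},
\quad
\tilde{A}_i = \frac{\hat{A}_i - \mu}{\sigma + \epsilon}.
\]
For example, a batch with advantages $(10, 20, 30)$ has $\mu = 20$ and $\sigma \approx 8.16$, so the normalized values are approximately $(-1.22,\ 0,\ 1.22)$. Multiplying every reward by a constant would leave these normalized values essentially unchanged.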

\begin{figure}[H]
\centering
\includegraphics[width=0.8\textwidth]{images/p234.png}
\caption{Learning Curve for Average Return for Batch Size of 5000 with Command Line Argument -na}
\end{figure}

\end{enumerate}
\newpage
\item [2.4] Bonus (20 pts)

% \begin{figure}[H]
% \centering
% \includegraphics[width=0.8\textwidth]{images/p241.png}
% \caption{Learning Curve for Average Return for HalfCheetah with Berkeley Parameters}
% \end{figure}

\end{enumerate}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%