part2 complete

Zheyuan Wu
2025-11-02 12:55:49 -06:00
parent ac986ec69a
commit 248051db0d
36 changed files with 680 additions and 953 deletions

@@ -159,6 +159,67 @@ What is not used:
\end{itemize}
\end{enumerate}
\newpage
\item [2.3] Deliverables
\begin{enumerate}
\item [2.3.1] Plot a learning curve for the baseline loss. (5 pts)
\begin{figure}[H]
\centering
\includegraphics[width=0.8\textwidth]{images/p231.png}
\caption{Learning Curve for Baseline Loss for Batch Size of 5000}
\end{figure}
\item [2.3.2] Plot a learning curve for the evaluation return. You should expect to converge to the maximum reward of 500. (15 pts)
\begin{figure}[H]
\centering
\includegraphics[width=0.8\textwidth]{images/p232.png}
\caption{Learning Curve for Evaluation Return for Batch Size of 5000}
\end{figure}
\item [2.3.3]
Run another experiment with a decreased number of baseline gradient steps (-bgs in command line) and/or baseline learning rate (-blr in command line). How does this affect (a) the baseline learning curve and (b) the performance of the policy? (15 pts)
\begin{figure}[H]
\centering
\includegraphics[width=0.8\textwidth]{images/p2331.png}
\caption{Learning Curve for Baseline Loss for Batch Size of 5000 with Decreased Baseline Gradient Steps and/or Baseline Learning Rate}
\end{figure}
For (a), the baseline loss curve is more stable when the number of baseline gradient steps and/or the baseline learning rate is decreased.
\begin{figure}[H]
\centering
\includegraphics[width=0.8\textwidth]{images/p2332.png}
\caption{Learning Curve for Average Return for Batch Size of 5000 with Decreased Baseline Gradient Steps and/or Baseline Learning Rate}
\end{figure}
For (b), the policy performs better when the number of baseline gradient steps and/or the baseline learning rate is decreased, as the average return curve above shows.
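A sketch of why these two settings matter, under the assumption (not stated in this report) that the baseline $V_\phi$ is fit by mean-squared-error regression onto the observed returns-to-go $\hat{Q}_t$ over a batch of $N$ samples:
\[
L(\phi) = \frac{1}{N}\sum_{t=1}^{N}\bigl(V_\phi(s_t) - \hat{Q}_t\bigr)^2,
\qquad
\phi \leftarrow \phi - \alpha\,\nabla_\phi L(\phi).
\]
In this sketch the update is repeated $K$ times per policy iteration, where $K$ would correspond to -bgs and $\alpha$ to -blr. The symbols $\phi$, $N$, $K$, and $\alpha$ are notation introduced here for illustration, and the actual optimizer may differ from plain gradient descent. A smaller $K$ or $\alpha$ moves $V_\phi$ less per iteration, so the baseline tracks the changing returns more slowly.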
\item [2.3.4]
How does the command line argument -na influence the performance? Why is that the case? (5 pts)
The policy performs better when the command line argument -na is used.
This argument normalizes the advantages within each batch, which keeps the scale of the policy gradient consistent across iterations, so the policy learns faster and more stably.
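As a sketch of what advantage normalization typically means (an assumed form; the exact implementation behind -na is not shown in this report), take the advantages $A_t$ of a batch of $N$ samples:
\[
\mu_A = \frac{1}{N}\sum_{t=1}^{N} A_t,
\qquad
\sigma_A = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\bigl(A_t - \mu_A\bigr)^2},
\qquad
\tilde{A}_t = \frac{A_t - \mu_A}{\sigma_A + \epsilon}.
\]
The small constant $\epsilon$ is an assumption added to avoid division by zero. Under this form the normalized advantages $\tilde{A}_t$ have zero mean and roughly unit variance regardless of the reward scale, which is one plausible reason the gradient steps are better conditioned.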
\begin{figure}[H]
\centering
\includegraphics[width=0.8\textwidth]{images/p234.png}
\caption{Learning Curve for Average Return for Batch Size of 5000 with Command Line Argument -na}
\end{figure}
\end{enumerate}
\newpage
\item [2.4] Bonus (20pt)
% \begin{figure}[H]
% \centering
% \includegraphics[width=0.8\textwidth]{images/p241.png}
% \caption{Learning Curve for Average Return for HalfCheetah with Berkeley Parameters}
% \end{figure}
\end{enumerate}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%