part2 complete

2025-11-02 12:55:49 -06:00
parent ac986ec69a
commit 248051db0d
36 changed files with 680 additions and 953 deletions
--- a/result.tex
+++ b/result.tex
@@ -159,6 +159,67 @@ What is not used:
        \end{itemize}
    \end{enumerate}

+    \newpage
+    \item [2.3] Deliveries
+    \begin{enumerate}
+        \item [2.3.1] Plot a learning curve for the baseline loss. (5 pts)
+
+        \begin{figure}[H]
+            \centering
+            \includegraphics[width=0.8\textwidth]{images/p231.png}
+            \caption{Learning Curve for Baseline Loss for Batch Size of 5000}
+        \end{figure}
+        
+        \item [2.3.2] Plot a learning curve for the evaluation return. You should expect to converge to the maximum reward of 500. (15 pts)
+        
+        \begin{figure}[H]
+            \centering
+            \includegraphics[width=0.8\textwidth]{images/p232.png}
+            \caption{Learning Curve for Evaluation Return for Batch Size of 5000}
+        \end{figure}
+
+        \item [2.3.3]
+        Run another experiment with a decreased number of baseline gradient steps (\texttt{-bgs} on the command line) and/or baseline learning rate (\texttt{-blr} on the command line). How does this affect (a) the baseline learning curve and (b) the performance of the policy? (15 pts)
+
+        \begin{figure}[H]
+            \centering
+            \includegraphics[width=0.8\textwidth]{images/p2331.png}
+            \caption{Learning Curve for Baseline Loss for Batch Size of 5000 with Decreased Baseline Gradient Steps and/or Baseline Learning Rate}
+        \end{figure}
+
+        (a) In this experiment, the baseline loss curve is more stable when the number of baseline gradient steps and/or the baseline learning rate is decreased.
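+
+        As a sketch of what the two flags control, assume the baseline is a state-value network $V_\phi$ fit by mean-squared error to the reward-to-go targets $\hat{Q}_i$ over the $N$ samples in a batch, and that each iteration takes $K$ gradient steps (\texttt{-bgs}) with learning rate $\alpha_b$ (\texttt{-blr}). The symbols $\phi$, $K$, and $\alpha_b$ are our own notation, and the update is written as plain gradient descent for simplicity, although the actual optimizer may differ (e.g.\ Adam):
+        \[
+            L(\phi) = \frac{1}{N}\sum_{i=1}^{N}\left(V_\phi(s_i) - \hat{Q}_i\right)^2,
+            \qquad
+            \phi \leftarrow \phi - \alpha_b \nabla_\phi L(\phi) \quad \mbox{(repeated $K$ times per iteration)}
+        \]
+        Lowering $K$ or $\alpha_b$ limits how far $V_\phi$ moves toward the current targets in a single iteration, which can smooth the baseline loss curve at the cost of a less closely fitted baseline at each step.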
+
+        \begin{figure}[H]
+            \centering
+            \includegraphics[width=0.8\textwidth]{images/p2332.png}
+            \caption{Learning Curve for Average Return for Batch Size of 5000 with Decreased Baseline Gradient Steps and/or Baseline Learning Rate}
+        \end{figure}
+
+        (b) The policy also performs better with the decreased baseline gradient steps and/or baseline learning rate than with the original settings.
+        
+        \item [2.3.4]
+        How does the command-line argument \texttt{-na} influence the performance? Why is that the case? (5 pts)
+
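+        The policy performs better when advantage normalization (\texttt{-na}) is enabled.
+
+        The flag \texttt{-na} normalizes the advantages to zero mean and unit standard deviation within each batch. This keeps the scale of the policy-gradient estimate consistent across iterations, independent of how large the raw returns become as they approach the maximum of 500, and it reduces the variance of the updates. As a result, the policy learns faster and more stably.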
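+
+        For reference, a common form of per-batch advantage normalization is shown below, where $\hat{A}_i$ is the advantage of sample $i$ among the $N$ samples in a batch (e.g.\ reward-to-go minus the baseline prediction), $\mu_A$ and $\sigma_A$ are the batch mean and standard deviation, and $\epsilon$ is a small constant for numerical stability. This notation is ours, and whether the implementation adds $\epsilon$ is an assumption:
+        \[
+            \tilde{A}_i = \frac{\hat{A}_i - \mu_A}{\sigma_A + \epsilon},
+            \qquad
+            \mu_A = \frac{1}{N}\sum_{j=1}^{N}\hat{A}_j,
+            \qquad
+            \sigma_A = \sqrt{\frac{1}{N}\sum_{j=1}^{N}\left(\hat{A}_j - \mu_A\right)^2}
+        \]
+        The normalized advantages then weight the policy-gradient estimate:
+        \[
+            \nabla_\theta J(\theta) \approx \frac{1}{N}\sum_{i=1}^{N}\nabla_\theta \log \pi_\theta(a_i \mid s_i)\,\tilde{A}_i
+        \]
+        Because $\tilde{A}_i$ is centered, each term of the sum pushes the log-probability of actions with above-average advantage ($\tilde{A}_i > 0$) up and that of below-average ones ($\tilde{A}_i < 0$) down.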
+
+        \begin{figure}[H]
+            \centering
+            \includegraphics[width=0.8\textwidth]{images/p234.png}
+            \caption{Learning Curve for Average Return for Batch Size of 5000 with the Command-Line Argument \texttt{-na}}
+        \end{figure}
+
+    \end{enumerate}
+    \newpage
+    \item [2.4] Bonus (20 pts)
+    
+    % \begin{figure}[H]
+    %     \centering
+    %     \includegraphics[width=0.8\textwidth]{images/p241.png}
+    %     \caption{Learning Curve for Average Return for HalfCheetah with Berkeley Parameters}
+    % \end{figure}
+
 \end{enumerate}

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%