From 535b9329f01cde6b316a0f6a90226ecfe997efb8 Mon Sep 17 00:00:00 2001
From: Trance-0 <60459821+Trance-0@users.noreply.github.com>
Date: Fri, 3 Oct 2025 10:54:02 -0500
Subject: [PATCH 1/4] updates
---
content/CSE510/CSE510_L12.md | 204 +++++++++++++++++++++++++++++++
content/CSE510/_meta.js | 1 +
content/CSE5313/CSE5313_L11.md | 166 +++++++++++++++++++++++++
content/CSE5313/_meta.js | 1 +
content/Math4201/Math4201_L17.md | 2 +
content/Math4201/_meta.js | 1 +
6 files changed, 375 insertions(+)
create mode 100644 content/CSE510/CSE510_L12.md
create mode 100644 content/CSE5313/CSE5313_L11.md
create mode 100644 content/Math4201/Math4201_L17.md
diff --git a/content/CSE510/CSE510_L12.md b/content/CSE510/CSE510_L12.md
new file mode 100644
index 0000000..8ad0405
--- /dev/null
+++ b/content/CSE510/CSE510_L12.md
@@ -0,0 +1,204 @@
+# CSE510 Deep Reinforcement Learning (Lecture 12)
+
+## Policy Gradient Theorem
+
+For any differentiable policy $\pi_\theta(s,a)$ and for any of the policy objective functions $J=J_1$, $J_{avR}$, or $\frac{1}{1-\gamma} J_{avV}$,
+
+The policy gradient is
+
+$$
+\nabla_{\theta}J(\theta)=\mathbb{E}_{\pi_{\theta}}\left[\nabla_\theta \log \pi_\theta(s,a)Q^{\pi_\theta}(s,a)\right]
+$$
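+
+As a concrete illustration (not from the lecture), here is a minimal NumPy sketch of the score-function form of this gradient for a linear softmax policy over discrete actions, with $Q^{\pi_\theta}(s,a)$ approximated by Monte-Carlo returns-to-go; the feature representation, helper names, and trajectory format are assumptions made for the sketch.
+
+```python
+import numpy as np
+
+def softmax_policy(theta, phi_s):
+    """pi_theta(a|s) for a linear softmax policy with logits phi(s,a)^T theta."""
+    logits = phi_s @ theta                # one logit per action; phi_s has shape (n_actions, d)
+    p = np.exp(logits - logits.max())
+    return p / p.sum()
+
+def grad_log_softmax(theta, phi_s, a):
+    """nabla_theta log pi_theta(a|s): chosen-action feature minus expected feature."""
+    p = softmax_policy(theta, phi_s)
+    return phi_s[a] - p @ phi_s
+
+def policy_gradient_estimate(theta, trajectories):
+    """Monte-Carlo estimate of E[nabla log pi(s,a) Q(s,a)], using returns-to-go G in place of Q."""
+    grad = np.zeros_like(theta)
+    count = 0
+    for traj in trajectories:             # each traj is a list of (phi_s, action, return_to_go)
+        for phi_s, a, G in traj:
+            grad += grad_log_softmax(theta, phi_s, a) * G
+            count += 1
+    return grad / count
+```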
+
+## Policy Gradient Methods
+
+Advantages of Policy-Based RL
+
+Advantages:
+
+- Better convergence properties
+- Effective in high-dimensional or continuous action spaces
+- Can learn stochastic policies
+
+Disadvantages:
+
+- Typically converge to a local rather than global optimum
+- Evaluating a policy is typically inefficient and high variance
+
+### Actor-Critic Methods
+
+#### Q Actor-Critic
+
+Reducing Variance Using a Critic
+
+Monte-Carlo Policy Gradient still has high variance.
+
+We use a critic to estimate the action-value function $Q_w(s,a)\approx Q^{\pi_\theta}(s,a)$.
+
+Actor-critic algorithms maintain two sets of parameters:
+
+Critic: updates action-value function parameters $w$
+
+Actor: updates policy parameters $\theta$ in the direction suggested by the critic.
+
+Actor-critic algorithms follow an approximate policy gradient:
+
+$$
+\nabla_\theta J(\theta) \approx \mathbb{E}_{\pi_{\theta}}\left[\nabla_\theta \log \pi_\theta(s,a)Q_w(s,a)\right]
+$$
+$$
+\Delta \theta = \alpha \nabla_\theta \log \pi_\theta(s,a)Q_w(s,a)
+$$
+
+Action-Value Actor-Critic
+
+- Simple actor-critic algorithm based on action-value critic
+- Using linear value function approximation $Q_w(s,a)=\phi(s,a)^T w$
+
+Critic: updates $w$ by linear $TD(0)$
+Actor: updates $\theta$ by policy gradient
+
+```python
+def q_actor_critic(s, theta, w, num_steps, alpha, beta, gamma):
+    # Sample the initial action a ~ pi_theta(s, .)
+    a = sample_action(s, theta)
+    for _ in range(num_steps):
+        # Sample reward r and next state s' from the environment
+        r = sample_reward(s, a)
+        s_next = sample_transition(s, a)
+        # Sample the next action a' ~ pi_theta(s', .)
+        a_next = sample_action(s_next, theta)
+        # TD(0) error for the critic, with Q_w(s, a) = phi(s, a)^T w
+        delta = r + gamma * Q_w(s_next, a_next, w) - Q_w(s, a, w)
+        # Actor: policy-gradient step using the critic's estimate
+        theta = theta + alpha * grad_log_pi(s, a, theta) * Q_w(s, a, w)
+        # Critic: linear TD(0) update
+        w = w + beta * delta * phi(s, a)
+        s, a = s_next, a_next
+    return theta, w
+```
+
+#### Advantage Actor-Critic
+
+Reducing variance using a baseline
+
+- We subtract a baseline function $B(s)$ from the policy gradient
+- This can reduce the variance without changing expectation
+
+$$
+\begin{aligned}
+\mathbb{E}_{\pi_\theta}\left[\nabla_\theta\log \pi_\theta(s,a)B(s)\right]&=\sum_{s\in S}d^{\pi_\theta}(s)\sum_{a\in A}\nabla_{\theta}\pi_\theta(s,a)B(s)\\
+&=\sum_{s\in S}d^{\pi_\theta}(s)B(s)\nabla_\theta\sum_{a\in A}\pi_\theta(s,a)\\
+&=\sum_{s\in S}d^{\pi_\theta}(s)B(s)\nabla_\theta 1\\
+&=0
+\end{aligned}
+$$
+
+A good baseline is the state value function $B(s)=V^{\pi_\theta}(s)$
+
+So we can rewrite the policy gradient using the advantage function $A^{\pi_\theta}(s,a)=Q^{\pi_\theta}(s,a)-V^{\pi_\theta}(s)$:
+
+$$
+\nabla_\theta J(\theta)=\mathbb{E}_{\pi_\theta}\left[\nabla_\theta \log \pi_\theta(s,a) A^{\pi_\theta}(s,a)\right]
+$$
+
+##### Estimating the Advantage function
+
+**Method 1:** direct estimation
+
+> May increase the variance
+
+The advantage function can significantly reduce variance of policy gradient
+
+So the critic should really estimate the advantage function
+
+For example, by estimating both $V^{\pi_\theta}(s)$ and $Q^{\pi_\theta}(s,a)$
+
+Using two function approximators and two parameter vectors,
+
+$$
+\begin{aligned}
+V_v(s)&\approx V^{\pi_\theta}(s)\\
+Q_w(s,a)&\approx Q^{\pi_\theta}(s,a)\\
+A(s,a)&=Q_w(s,a)-V_v(s)
+\end{aligned}
+$$
+
+And we update both value functions by, e.g., TD learning.
+
+**Method 2:** using the TD error
+
+> We can prove that the TD error is an unbiased estimate of the advantage function.
+
+For the true value function $V^{\pi_\theta}(s)$, the TD error $\delta^{\pi_\theta}$
+
+$$
+\delta^{\pi_\theta} = r + \gamma V^{\pi_\theta}(s') - V^{\pi_\theta}(s)
+$$
+
+is an unbiased estimate of the advantage function
+
+$$
+\begin{aligned}
+\mathbb{E}_{\pi_\theta}[\delta^{\pi_\theta}| s,a]&=\mathbb{E}_{\pi_\theta}[r + \gamma V^{\pi_\theta}(s') |s,a]-V^{\pi_\theta}(s)\\
+&=Q^{\pi_\theta}(s,a)-V^{\pi_\theta}(s)\\
+&=A^{\pi_\theta}(s,a)
+\end{aligned}
+$$
+
+So we can use the TD error to compute the policy gradient
+
+$$
+\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}[\nabla_\theta \log \pi_\theta(s,a) \delta^{\pi_\theta}]
+$$
+
+In practice, we can use an approximate TD error $\delta_v=r+\gamma V_v(s')-V_v(s)$ to compute the policy gradient
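+
+As an illustration (a minimal sketch, not the lecture's own code), one such update with a linear value critic $V_v(s)=\phi(s)^T v$ might look as follows; the function takes the state features $\phi(s)$, $\phi(s')$, the reward, and a precomputed policy score $\nabla_\theta\log\pi_\theta(a|s)$, all assumed to be NumPy arrays.
+
+```python
+import numpy as np
+
+def td_actor_critic_step(phi_s, phi_s_next, r, score, theta, v, alpha, beta, gamma):
+    """One TD actor-critic update with a linear value critic V_v(s) = phi(s)^T v.
+    `score` is the precomputed policy score nabla_theta log pi_theta(a|s)."""
+    # Approximate TD error delta_v = r + gamma * V_v(s') - V_v(s), used as the advantage estimate
+    delta = r + gamma * phi_s_next @ v - phi_s @ v
+    # Actor: stochastic gradient ascent on J(theta), with delta in place of A(s, a)
+    theta = theta + alpha * score * delta
+    # Critic: TD(0) update of the value-function parameters v
+    v = v + beta * delta * phi_s
+    return theta, v
+```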
+
+### Summary of policy gradient algorithms
+
+The policy gradient has many equivalent forms.
+
+$$
+\begin{aligned}
+\nabla_\theta J(\theta) &= \mathbb{E}_{\pi_\theta}[\nabla_\theta \log \pi_\theta(s,a) v_t] \text{ REINFORCE} \\
+&= \mathbb{E}_{\pi_\theta}[\nabla_\theta \log \pi_\theta(s,a) Q_w(s,a)] \text{ Q Actor-Critic} \\
+&= \mathbb{E}_{\pi_\theta}[\nabla_\theta \log \pi_\theta(s,a) A^{\pi_\theta}(s,a)] \text{ Advantage Actor-Critic} \\
+&= \mathbb{E}_{\pi_\theta}[\nabla_\theta \log \pi_\theta(s,a) \delta^{\pi_\theta}] \text{ TD Actor-Critic}
+\end{aligned}
+$$
+
+Each leads to a stochastic gradient ascent algorithm.
+
+The critic uses policy evaluation to estimate $Q^\pi(s,a)$, $A^\pi(s,a)$, or $V^\pi(s)$.
+
+## Compatible Function Approximation
+
+If the following two conditions are satisfied:
+
+1. Value function approximation is compatible with the policy
+ $$
+ \nabla_w Q_w(s,a) = \nabla_\theta \log \pi_\theta(s,a)
+ $$
+2. Value function parameters $w$ minimize the MSE
+ $$
+ \epsilon = \mathbb{E}_{\pi_\theta}[(Q^{\pi_\theta}(s,a)-Q_w(s,a))^2]
+ $$
+   Note that $\epsilon$ need not be zero; it just needs to be minimized.
+
+Then the policy gradient is exact
+
+$$
+\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}[\nabla_\theta \log \pi_\theta(s,a) Q_w(s,a)]
+$$
+
+Remember:
+
+$$
+\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}[\nabla_\theta \log \pi_\theta(s,a) Q^{\pi_\theta}(s,a)]
+$$
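+
+For completeness, here is the standard one-line argument (reconstructed here, not copied from the slides) for why the two conditions make the approximate gradient exact: at a minimum of $\epsilon$ we have $\nabla_w \epsilon = 0$, so
+
+$$
+\begin{aligned}
+\mathbb{E}_{\pi_\theta}\left[(Q^{\pi_\theta}(s,a)-Q_w(s,a))\nabla_w Q_w(s,a)\right]&=0\\
+\mathbb{E}_{\pi_\theta}\left[(Q^{\pi_\theta}(s,a)-Q_w(s,a))\nabla_\theta \log \pi_\theta(s,a)\right]&=0 \quad \text{(compatibility)}\\
+\mathbb{E}_{\pi_\theta}\left[\nabla_\theta \log \pi_\theta(s,a)Q_w(s,a)\right]&=\mathbb{E}_{\pi_\theta}\left[\nabla_\theta \log \pi_\theta(s,a)Q^{\pi_\theta}(s,a)\right]=\nabla_\theta J(\theta)
+\end{aligned}
+$$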
+
+### Challenges with Policy Gradient Methods
+
+- Data inefficiency
+  - On-policy method: for each new policy, we need to generate a completely new trajectory
+  - The data is thrown out after just one gradient update
+  - Since complex neural networks need many updates, this makes the training process very slow
+- Unstable updates: the step size is very important
+  - If the step size is too large:
+    - Large step -> bad policy
+    - The next batch is generated from the current bad policy -> collect bad samples
+    - Bad samples -> worse policy (compare this to supervised learning, where the correct labels and data in the following batches may correct it)
+  - If the step size is too small: the learning process is slow
+
diff --git a/content/CSE510/_meta.js b/content/CSE510/_meta.js
index aa4ef7e..be90a9e 100644
--- a/content/CSE510/_meta.js
+++ b/content/CSE510/_meta.js
@@ -14,4 +14,5 @@ export default {
CSE510_L9: "CSE510 Deep Reinforcement Learning (Lecture 9)",
CSE510_L10: "CSE510 Deep Reinforcement Learning (Lecture 10)",
CSE510_L11: "CSE510 Deep Reinforcement Learning (Lecture 11)",
+ CSE510_L12: "CSE510 Deep Reinforcement Learning (Lecture 12)"
}
\ No newline at end of file
diff --git a/content/CSE5313/CSE5313_L11.md b/content/CSE5313/CSE5313_L11.md
new file mode 100644
index 0000000..287eefc
--- /dev/null
+++ b/content/CSE5313/CSE5313_L11.md
@@ -0,0 +1,166 @@
+# CSE5313 Coding and information theory for data science (Recitation 11)
+
+## Question 5
+
+Prove that the minimum distance of the Reed-Muller code $\operatorname{RM}(r,m)$ is $2^{m-r}$.
+
+Here $n=2^m$.
+
+Recall that the definition of RM code is:
+
+$$
+\operatorname{RM}(r,m)=\left\{(f(\alpha_1),\ldots,f(\alpha_{2^m}))\mid\alpha_i\in \mathbb{F}_2^m,\deg f\leq r\right\}
+$$
+
+
+Example of RM code
+
+Let $r=0$, it is the repetition code.
+
+$\dim \operatorname{RM}(r,m)=\sum_{i=0}^{r}\binom{m}{i}$.
+
+Here $r=0$, so $\dim \operatorname{RM}(0,m)=1$.
+
+So the minimum distance of $RM(0,m)$ is $2^{m-0}=n$.
+
+---
+
+Let $r=m$,
+
+then $\dim \operatorname{RM}(m,m)=\sum_{i=0}^{m}\binom{m}{i}=2^m$ (binomial theorem).
+
+So the generator matrix is $n\times n$, i.e., the code is all of $\mathbb{F}_2^n$.
+
+So the minimum distance of $RM(m,m)$ is $2^{m-m}=1$.
+
+
+Then we can do induction on $m$.
+
+Assume the minimum distance of $\operatorname{RM}(r',m')$ is $2^{m'-r'}$ for all $0\leq r'\leq r$ and $r'\leq m'<m$.
+
+Proof
+
+Recall that the polynomial $p(x_1,x_2,\ldots,x_m)$ can be written as $p(x_1,x_2,\ldots,x_m)=\sum_{S\subseteq [m],|S|\leq r}f_S X_S$, where $f_S\in \mathbb{F}_2$ and the monomial $X_S=\prod_{i\in S}x_i$.
+
+Every such polynomial $p(x_1,x_2,\ldots,x_m)$ can be written as
+
+$$
+\begin{aligned}
+p(x_1,x_2,\ldots,x_m)&=\sum_{S\subseteq [m],|S|\leq r}f_s X_s\\
+&=g(x_1,x_2,\ldots,x_{m-1})+x_m h(x_1,x_2,\ldots,x_{m-1})\\
+\end{aligned}
+$$
+
+So $g(x_1,x_2,\ldots,x_{m-1})$ has degree at most $r$ and does not contain $x_m$.
+
+And $h(x_1,x_2,\ldots,x_{m-1})$ has degree at most $r-1$, so $x_m h$ collects all the terms containing $x_m$.
+
+Note that a codeword of $\operatorname{RM}(r,m)$ is the truth table of some polynomial $f$ of degree at most $r$, evaluated at all $2^m$ points $\alpha_i\in \mathbb{F}_2^m$.
+
+Since the code is linear, the minimum distance of $\operatorname{RM}(r,m)$ is the minimum Hamming weight of a nonzero codeword, i.e., the minimum number of $\alpha_i$ such that $f(\alpha_i)=1$ over nonzero $f$.
+
+Then we can define $\operatorname{wt}(f)$ to be the set of all $\alpha_i$ such that $f(\alpha_i)=1$:
+
+$$
+\operatorname{wt}(f)=\{\alpha_i|f(\alpha_i)=1\}
+$$
+
+Note that $g(x_1,x_2,\ldots,x_{m-1})$ gives a codeword of $\operatorname{RM}(r,m-1)$ and $h(x_1,x_2,\ldots,x_{m-1})$ gives a codeword of $\operatorname{RM}(r-1,m-1)$.
+
+If $x_m=0$, then $f(\alpha_i)=g(\alpha_i)$.
+If $x_m=1$, then $f(\alpha_i)=g(\alpha_i)+h(\alpha_i)$.
+
+So $\operatorname{wt}(f)=\operatorname{wt}(g)\sqcup\operatorname{wt}(g+h)$, a disjoint union over the two halves $x_m=0$ and $x_m=1$.
+
+Note that $\operatorname{wt}(g+h)$ is the set of $\alpha_i$ such that $g(\alpha_i)+h(\alpha_i)=1$, where the addition is `XOR` in the binary field.
+
+So $\operatorname{wt}(g+h)=(\operatorname{wt}(g)\setminus\operatorname{wt}(h))\cup (\operatorname{wt}(h)\setminus\operatorname{wt}(g))$.
+
+So
+
+$$
+\begin{aligned}
+|\operatorname{wt}(f)|&=|\operatorname{wt}(g)|+|\operatorname{wt}(g+h)|\\
+&=|\operatorname{wt}(g)|+|\operatorname{wt}(g)\setminus\operatorname{wt}(h)|+|\operatorname{wt}(h)\setminus\operatorname{wt}(g)|\\
+&=|\operatorname{wt}(h)|+2|\operatorname{wt}(g)\setminus\operatorname{wt}(h)|\\
+\end{aligned}
+$$
+
+Note that $h\in\operatorname{RM}(r-1,m-1)$. If $h\neq 0$, the induction hypothesis gives $|\operatorname{wt}(h)|\geq 2^{(m-1)-(r-1)}=2^{m-r}$, so $|\operatorname{wt}(f)|\geq 2^{m-r}$. If $h=0$, then $f=g\in\operatorname{RM}(r,m-1)$ appears identically on both halves, so $|\operatorname{wt}(f)|=2|\operatorname{wt}(g)|\geq 2\cdot 2^{(m-1)-r}=2^{m-r}$.
+
+Equality is attained, e.g., by the monomial $x_1x_2\cdots x_r$, whose weight is exactly $2^{m-r}$, so the minimum distance of $\operatorname{RM}(r,m)$ is $2^{m-r}$.
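+
+As a sanity check (my own sketch, not part of the recitation), a brute-force script can verify $d(\operatorname{RM}(r,m))=2^{m-r}$ for small parameters by building the generator matrix from monomial evaluations:
+
+```python
+import itertools
+import numpy as np
+
+def rm_generator_matrix(r, m):
+    """Rows are evaluations of the monomials prod_{i in S} x_i, |S| <= r, at all points of F_2^m."""
+    points = list(itertools.product([0, 1], repeat=m))
+    rows = []
+    for size in range(r + 1):
+        for S in itertools.combinations(range(m), size):
+            rows.append([int(all(p[i] for i in S)) for p in points])  # empty S gives the all-ones row
+    return np.array(rows, dtype=int)
+
+def min_distance(G):
+    """Minimum Hamming weight over all nonzero codewords (brute force, small codes only)."""
+    k, n = G.shape
+    best = n
+    for msg in itertools.product([0, 1], repeat=k):
+        if any(msg):
+            codeword = np.array(msg) @ G % 2
+            best = min(best, int(codeword.sum()))
+    return best
+
+# Check d(RM(r, m)) = 2^(m - r) for a few small parameters
+for m in range(1, 5):
+    for r in range(m + 1):
+        assert min_distance(rm_generator_matrix(r, m)) == 2 ** (m - r)
+```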
+
+
+
+## Theorem for Reed-Muller code
+
+$$
+\operatorname{RM}(r,m)^\perp=\operatorname{RM}(m-r-1,m)
+$$
+
+Let $\mathcal{C}$ be an $[n,k,d]_q$ code.
+
+The dual code of $\mathcal{C}$ is $\mathcal{C}^\perp=\{x\in \mathbb{F}^n_q|xc^T=0\text{ for all }c\in \mathcal{C}\}$.
+
+
+Example
+
+$\operatorname{RM}(0,m)^\perp=\operatorname{RM}(m-1,m)$.
+
+and $\operatorname{RM}(0,m)$ is the repetition code.
+
+which is the dual of the parity code $\operatorname{RM}(m-1,m)$.
+
+
+
+### Lemma for sum of binary product
+
+For $A\subseteq [m]=\{1,2,\ldots,m\}$, let $X^A=\prod_{i\in A}x_i$. We can define the inner product $\langle X^A,X^B\rangle=\sum_{x\in \{0,1\}^m}\prod_{i\in A}x_i\prod_{i\in B}x_i=\sum_{x\in \{0,1\}^m}\prod_{i\in A\cup B}x_i$ (over $\mathbb{F}_2$, using $x_i^2=x_i$).
+
+So $\langle X^A,X^B\rangle=\begin{cases}
+1 & \text{if }A\cup B=[m]\\
+0 & \text{otherwise}
+\end{cases}$
+
+because $\prod_{i\in A\cup B}x_i=1$ if and only if every coordinate indexed by $A\cup B$ equals 1.
+
+So the number of such $x\in \{0,1\}^m$ is $2^{m-|A\cup B|}$.
+
+Since the sum is taken over $\mathbb{F}_2$, this implies that $\langle X^A,X^B\rangle=1$ if and only if $2^{m-|A\cup B|}$ is odd, i.e., $|A\cup B|=m$.
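+
+For instance (a quick check, not from the recitation), take $m=2$: for $A=\{1\}$ and $B=\{2\}$ we have $A\cup B=[2]$ and $\langle x_1,x_2\rangle=0+0+0+1=1$, while for $A=B=\{1\}$ we have $A\cup B=\{1\}\neq[2]$ and the sum is $0+0+1+1\equiv 0\pmod 2$.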
+
+Recall that $\operatorname{RM}(r,m)$ is the set of evaluations of $f=\sum_{B\subseteq [m],|B|\leq r}\beta_B X^B$, with $\beta_B\in\mathbb{F}_2$, at all $x\in \{0,1\}^m$.
+
+$\operatorname{RM}(m-r-1,m)$ is the set of evaluations of $h=\sum_{A\subseteq [m],|A|\leq m-r-1}\alpha_A X^A$, with $\alpha_A\in\mathbb{F}_2$, at all $x \in \{0,1\}^m$.
+
+By linearity of inner product, we have
+
+$$
+\begin{aligned}
+\langle f,h\rangle&=\left\langle \sum_{B\subseteq [m],|B|\leq r}\beta_B X^B,\sum_{A\subseteq [m],|A|\leq m-r-1}\alpha_A X^A\right\rangle\\
+&=\sum_{B\subseteq [m],|B|\leq r}\sum_{A\subseteq [m],|A|\leq m-r-1}\beta_B\alpha_A\langle X^B,X^A\rangle\\
+\end{aligned}
+$$
+
+Because $|A\cup B|\leq |A|+|B|\leq m-r-1+r=m-1$.
+
+So $\langle X^B,X^A\rangle=0$ for every pair, since $|A\cup B|\leq m-1<m$; hence $\langle f,h\rangle=0$.
+
+Proof for the theorem
+
+Recall that the dual code is $\operatorname{RM}(r,m)^\perp=\{x\in \mathbb{F}_2^{n}\mid xc^T=0\text{ for all }c\in \operatorname{RM}(r,m)\}$, where $n=2^m$.
+
+So, by the lemma, $\operatorname{RM}(m-r-1,m)\subseteq \operatorname{RM}(r,m)^\perp$.
+
+So the last step is the dimension check.
+
+Since $\dim \operatorname{RM}(r,m)=\sum_{i=0}^{r}\binom{m}{i}$, the dimension of the dual code is $2^m-\dim \operatorname{RM}(r,m)=\sum_{i=0}^{m}\binom{m}{i}-\sum_{i=0}^{r}\binom{m}{i}=\sum_{i=r+1}^{m}\binom{m}{i}$.
+
+Since $\binom{m}{i}=\binom{m}{m-i}$, we have $\sum_{i=r+1}^{m}\binom{m}{i}=\sum_{i=r+1}^{m}\binom{m}{m-i}=\sum_{i=0}^{m-r-1}\binom{m}{i}$.
+
+This is exactly the dimension of $\operatorname{RM}(m-r-1,m)$, so the inclusion is an equality and $\operatorname{RM}(r,m)^\perp=\operatorname{RM}(m-r-1,m)$.
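+
+As a quick example (not from the recitation): for $m=3$ and $r=1$, $\dim\operatorname{RM}(1,3)=\binom{3}{0}+\binom{3}{1}=4$ and $m-r-1=1$, so $\operatorname{RM}(1,3)^\perp=\operatorname{RM}(1,3)$; this is the self-dual $[8,4,4]$ extended Hamming code.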
+
+
\ No newline at end of file
diff --git a/content/CSE5313/_meta.js b/content/CSE5313/_meta.js
index 3ca12dc..b2a3452 100644
--- a/content/CSE5313/_meta.js
+++ b/content/CSE5313/_meta.js
@@ -13,4 +13,5 @@ export default {
CSE5313_L8: "CSE5313 Coding and information theory for data science (Lecture 8)",
CSE5313_L9: "CSE5313 Coding and information theory for data science (Lecture 9)",
CSE5313_L10: "CSE5313 Coding and information theory for data science (Recitation 10)",
+ CSE5313_L11: "CSE5313 Coding and information theory for data science (Recitation 11)",
}
\ No newline at end of file
diff --git a/content/Math4201/Math4201_L17.md b/content/Math4201/Math4201_L17.md
new file mode 100644
index 0000000..9f376c2
--- /dev/null
+++ b/content/Math4201/Math4201_L17.md
@@ -0,0 +1,2 @@
+# Math4201 Topology I (Lecture 17)
+
diff --git a/content/Math4201/_meta.js b/content/Math4201/_meta.js
index bf60a58..c209da5 100644
--- a/content/Math4201/_meta.js
+++ b/content/Math4201/_meta.js
@@ -19,4 +19,5 @@ export default {
Math4201_L14: "Topology I (Lecture 14)",
Math4201_L15: "Topology I (Lecture 15)",
Math4201_L16: "Topology I (Lecture 16)",
+ Math4201_L17: "Topology I (Lecture 17)",
}
From 3483618c073645304791bf8907eaa68b9a211369 Mon Sep 17 00:00:00 2001
From: Trance-0 <60459821+Trance-0@users.noreply.github.com>
Date: Fri, 3 Oct 2025 11:52:56 -0500
Subject: [PATCH 2/4] Update Math4201_L17.md
---
content/Math4201/Math4201_L17.md | 85 ++++++++++++++++++++++++++++++++
1 file changed, 85 insertions(+)
diff --git a/content/Math4201/Math4201_L17.md b/content/Math4201/Math4201_L17.md
index 9f376c2..fa181d8 100644
--- a/content/Math4201/Math4201_L17.md
+++ b/content/Math4201/Math4201_L17.md
@@ -1,2 +1,87 @@
# Math4201 Topology I (Lecture 17)
+## Quotient topology
+
+How can we define a topology on the space obtained by identifying points of a topological space?
+
+### Quotient map
+
+Let $(X,\mathcal{T})$ be a topological space, let $X^*$ be a set, and let $q:X\to X^*$ be a surjective map.
+
+The quotient topology on $X^*$ is defined as follows:
+
+$$
+\mathcal{T}^* = \{U\subseteq X^*\mid q^{-1}(U)\in \mathcal{T}\}
+$$
+
+$U\subseteq X^*$ is open if and only if $q^{-1}(U)$ is open in $X$.
+
+In particular, $q$ is a continuous map.
+
+#### Definition of quotient map
+
+$q:X\to X^*$ defined above is called a **quotient map**.
+
+#### Definition of quotient space
+
+$(X^*,\mathcal{T}^*)$ is called the **quotient space** of $X$ by $q$.
+
+### Typical way of constructing a surjective map
+
+#### Equivalence relation
+
+$\sim$ is a subset of $X\times X$ satisfying:
+
+- reflexive: $\forall x\in X, x\sim x$
+- symmetric: $\forall x,y\in X, x\sim y\implies y\sim x$
+- transitive: $\forall x,y,z\in X, x\sim y\text{ and } y\sim z\implies x\sim z$
+
+#### Equivalence classes
+
+Given an equivalence relation $\sim$ on $X$:
+
+For $x\in X$, the equivalence class of $x$ is denoted as $[x]\coloneqq \{y\in X\mid y\sim x\}$.
+
+$X^*$ is the set of all equivalence classes on $X$.
+
+The map $q:X\to X^*$ defined by $q(x)=[x]$ is then surjective.
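+
+As a concrete (and entirely illustrative) sketch, for a *finite* space one can compute the quotient topology directly from the definition: a subset of $X^*$ is open iff its preimage under $q$ is open in $X$. The helper names below are my own, not standard library functions.
+
+```python
+from itertools import chain, combinations
+
+def powerset(s):
+    """All subsets of s, as frozensets."""
+    s = list(s)
+    return [frozenset(c) for c in chain.from_iterable(combinations(s, k) for k in range(len(s) + 1))]
+
+def quotient_topology(X, topology, q):
+    """Open sets of X* = q(X): U is open iff q^{-1}(U) is open in X.
+    `topology` is a set of frozensets (the open sets of X); q is a surjection onto X*."""
+    X_star = {q(x) for x in X}
+    opens = [U for U in powerset(X_star)
+             if frozenset(x for x in X if q(x) in U) in topology]
+    return X_star, opens
+
+# Toy example: X = {0, 1, 2} with open sets {}, {0}, {0,1}, X, and q identifies 1 with 2.
+X = {0, 1, 2}
+T = {frozenset(), frozenset({0}), frozenset({0, 1}), frozenset({0, 1, 2})}
+q = lambda x: "a" if x == 0 else "b"
+print(quotient_topology(X, T, q))
+# {"b"} is not open in X*, because q^{-1}({"b"}) = {1, 2} is not open in X.
+```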
+
+
+Example of surjective maps and their quotient spaces
+
+Let $X=\mathbb{R}^2$ and $(s,t)\sim (s',t')$ if and only if $s-s'$ and $t-t'$ are both integers.
+
+The resulting quotient space is homeomorphic to the torus.
+
+---
+
+Let $X=\{(s,t)\in \mathbb{R}^2\mid s^2+t^2\leq 1\}$ with the subspace topology from $\mathbb{R}^2$, and let $(s,t)\sim (s',t')$ if and only if $(s,t)=(s',t')$ or $s^2+t^2=s'^2+t'^2=1$ (i.e., all boundary points are identified to a single point).
+
+The resulting quotient space is homeomorphic to the sphere $S^2$.
+
+
+
+We will show that the quotient topology is a topology on $X^*$.
+
+
+Proof
+
+We need to check the three axioms of a topology for $\mathcal{T}^*$.
+
+1. $\emptyset$ and $X^*$ are open in $X^*$.
+
+This holds because $q^{-1}(\emptyset)=\emptyset$ and $q^{-1}(X^*)=X$, both of which are open in $X$.
+
+2. $\mathcal{T}^*$ is closed under arbitrary unions.
+
+$$
+q^{-1}\left(\bigcup_{\alpha \in I} U_\alpha\right)=\bigcup_{\alpha \in I} q^{-1}(U_\alpha),
+$$
+
+so if each $q^{-1}(U_\alpha)$ is open in $X$, then so is the preimage of the union.
+
+3. $\mathcal{T}^*$ is closed under finite intersections.
+
+$$
+q^{-1}\left(\bigcap_{\alpha \in I} U_\alpha\right)=\bigcap_{\alpha \in I} q^{-1}(U_\alpha),
+$$
+
+so for a finite index set $I$, if each $q^{-1}(U_\alpha)$ is open in $X$, then so is the preimage of the intersection.
+
+
\ No newline at end of file
From 21aa31a960a90ff6ee3a8ac116c8ac1dc73d1409 Mon Sep 17 00:00:00 2001
From: Trance-0 <60459821+Trance-0@users.noreply.github.com>
Date: Sun, 5 Oct 2025 17:09:12 -0400
Subject: [PATCH 3/4] updates
---
content/Math4111/Exam_reviews/Math4111_E3.md | 2 +-
content/Math4111/index.md | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/content/Math4111/Exam_reviews/Math4111_E3.md b/content/Math4111/Exam_reviews/Math4111_E3.md
index d6174dc..8bb12b8 100644
--- a/content/Math4111/Exam_reviews/Math4111_E3.md
+++ b/content/Math4111/Exam_reviews/Math4111_E3.md
@@ -1,4 +1,4 @@
-# Exam 3 Review session
+# Math 4111 Exam 3 review
## Relations between series and topology (compactness, closure, etc.)
diff --git a/content/Math4111/index.md b/content/Math4111/index.md
index 8ecfe39..da44270 100644
--- a/content/Math4111/index.md
+++ b/content/Math4111/index.md
@@ -12,10 +12,10 @@ Topics include:
The course is taught by [Alan Chang](https://math.wustl.edu/pQEDle/alan-chang).
-It is easy in my semester perhaps, it is the first course I got 3 perfect scores in exams. (Unfortunately, I did not get the extra credit for the third midterm exam.)
-