This commit is contained in:
Zheyuan Wu
2025-10-08 20:06:05 -05:00
9 changed files with 578 additions and 3 deletions

View File

@@ -0,0 +1,204 @@
# CSE510 Deep Reinforcement Learning (Lecture 12)
## Policy Gradient Theorem
For any differentiable policy $\pi_\theta(s,a)$ and any of the policy objective functions $J=J_1$, $J_{avR}$, or $\frac{1}{1-\gamma} J_{avV}$:
The policy gradient is
$$
\nabla_{\theta}J(\theta)=\mathbb{E}_{\pi_{\theta}}\left[\nabla_\theta \log \pi_\theta(s,a)Q^{\pi_\theta}(s,a)\right]
$$
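As a minimal sketch (illustrative names, not from the lecture), the score function $\nabla_\theta \log \pi_\theta(s,a)$ for a linear softmax policy can be computed in closed form and combined with a critic estimate to form a single-sample gradient estimate:
```python
import numpy as np

def softmax_policy(theta, s_feat):
    """pi_theta(.|s) for a linear softmax policy: one logit per action."""
    logits = theta @ s_feat
    p = np.exp(logits - logits.max())
    return p / p.sum()

def grad_log_pi(theta, s_feat, a):
    """Gradient of log pi_theta(a|s) w.r.t. theta (rows indexed by action)."""
    p = softmax_policy(theta, s_feat)
    g = -np.outer(p, s_feat)   # -E_pi[features] term, one row per action
    g[a] += s_feat             # + features of the action actually taken
    return g

theta = np.zeros((3, 4))                  # 3 actions, 4 state features
s_feat = np.array([1.0, 0.0, 0.5, -0.5])
a = int(np.random.choice(3, p=softmax_policy(theta, s_feat)))
q_hat = 1.7                               # placeholder for Q^{pi_theta}(s,a)
grad_estimate = grad_log_pi(theta, s_feat, a) * q_hat  # one-sample estimate
```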
## Policy Gradient Methods
Advantages of Policy-Based RL
Advantages:
- Better convergence properties
- Effective in high-dimensional or continuous action spaces
- Can learn stochastic policies
Disadvantages:
- Typically converge to a local rather than global optimum
- Evaluating a policy is typically inefficient and high variance
### Actor-Critic Methods
#### Q Actor-Critic
Reducing Variance Using a Critic
Monte-Carlo Policy Gradient still has high variance.
We use a critic to estimate the action-value function $Q_w(s,a)\approx Q^{\pi_\theta}(s,a)$.
Actor-critic algorithms maintain two sets of parameters:
- Critic: updates action-value function parameters $w$
- Actor: updates policy parameters $\theta$, in the direction suggested by the critic
Actor-critic algorithms follow an approximate policy gradient:
$$
\nabla_\theta J(\theta) \approx \mathbb{E}_{\pi_{\theta}}\left[\nabla_\theta \log \pi_\theta(s,a)Q_w(s,a)\right]
$$
$$
\Delta \theta = \alpha \nabla_\theta \log \pi_\theta(s,a)Q_w(s,a)
$$
Action-Value Actor-Critic
- Simple actor-critic algorithm based on action-value critic
- Using linear value function approximation $Q_w(s,a)=\phi(s,a)^T w$
- Critic: updates $w$ by linear $TD(0)$
- Actor: updates $\theta$ by policy gradient
```python
def q_actor_critic(s, theta, w, num_steps, alpha, beta, gamma):
    # Assumed helpers: sample_action, sample_reward, sample_transition,
    # Q_w(s, a) = phi(s, a) @ w (linear critic), grad_log_pi, phi.
    a = sample_action(s, theta)                # a ~ pi_theta(s, .)
    for _ in range(num_steps):
        r = sample_reward(s, a)                # observe reward
        s_next = sample_transition(s, a)       # observe next state
        a_next = sample_action(s_next, theta)  # a' ~ pi_theta(s', .)
        delta = r + gamma * Q_w(s_next, a_next) - Q_w(s, a)  # TD(0) error
        # Actor: policy-gradient step in the direction suggested by the critic
        theta = theta + alpha * grad_log_pi(theta, s, a) * Q_w(s, a)
        w = w + beta * delta * phi(s, a)       # Critic: linear TD(0) update
        s, a = s_next, a_next
    return theta, w
```
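Note that the actor step size $\alpha$ and the critic step size $\beta$ are separate; in practice the critic is usually updated on a faster timescale than the actor so that $Q_w$ can track the changing policy.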
#### Advantage Actor-Critic
Reducing variance using a baseline
- We subtract a baseline function $B(s)$ from the policy gradient
- This can reduce the variance without changing the expectation
$$
\begin{aligned}
\mathbb{E}_{\pi_\theta}\left[\nabla_\theta\log \pi_\theta(s,a)B(s)\right]&=\sum_{s\in S}d^{\pi_\theta}(s)\sum_{a\in A}\nabla_{\theta}\pi_\theta(s,a)B(s)\\
&=\sum_{s\in S}d^{\pi_\theta}(s)B(s)\nabla_\theta\sum_{a\in A}\pi_\theta(s,a)\\
&=0
\end{aligned}
$$
A good baseline is the state value function $B(s)=V^{\pi_\theta}(s)$
So we can rewrite the policy gradient using the advantage function $A^{\pi_\theta}(s,a)=Q^{\pi_\theta}(s,a)-V^{\pi_\theta}(s)$
$$
\nabla_\theta J(\theta)=\mathbb{E}_{\pi_\theta}\left[\nabla_\theta \log \pi_\theta(s,a) A^{\pi_\theta}(s,a)\right]
$$
##### Estimating the Advantage function
**Method 1:** direct estimation
> May increase the variance
The advantage function can significantly reduce the variance of the policy gradient,
so the critic should really estimate the advantage function,
for example by estimating both $V^{\pi_\theta}(s)$ and $Q^{\pi_\theta}(s,a)$
using two function approximators and two parameter vectors:
$$
V_v(s)\approx V^{\pi_\theta}(s)\\
Q_w(s,a)\approx Q^{\pi_\theta}(s,a)\\
A(s,a)=Q_w(s,a)-V_v(s)
$$
And update both value functions by, e.g., TD learning, as in the sketch below.
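A minimal sketch of Method 1 (illustrative names and one-hot features, not from the lecture): both critics are updated by TD(0), and the advantage estimate is their difference.
```python
import numpy as np

N_S, N_A = 5, 2                     # toy discrete MDP sizes (assumed)

def phi_state(s):                   # one-hot state features for V_v
    e = np.zeros(N_S); e[s] = 1.0
    return e

def phi_sa(s, a):                   # one-hot state-action features for Q_w
    e = np.zeros(N_S * N_A); e[s * N_A + a] = 1.0
    return e

def method1_step(v, w, s, a, r, s_next, a_next, gamma=0.99, beta=0.1):
    # TD(0) updates for both value functions
    delta_v = r + gamma * phi_state(s_next) @ v - phi_state(s) @ v
    delta_q = r + gamma * phi_sa(s_next, a_next) @ w - phi_sa(s, a) @ w
    v = v + beta * delta_v * phi_state(s)
    w = w + beta * delta_q * phi_sa(s, a)
    advantage = phi_sa(s, a) @ w - phi_state(s) @ v  # A(s,a) = Q_w - V_v
    return v, w, advantage
```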
**Method 2:** using the TD error
> We can prove that the TD error is an unbiased estimate of the advantage function
For the true value function $V^{\pi_\theta}(s)$, the TD error $\delta^{\pi_\theta}$
$$
\delta^{\pi_\theta} = r + \gamma V^{\pi_\theta}(s') - V^{\pi_\theta}(s)
$$
is an unbiased estimate of the advantage function
$$
\begin{aligned}
\mathbb{E}_{\pi_\theta}[\delta^{\pi_\theta}| s,a]&=\mathbb{E}_{\pi_\theta}[r + \gamma V^{\pi_\theta}(s') |s,a]-V^{\pi_\theta}(s)\\
&=Q^{\pi_\theta}(s,a)-V^{\pi_\theta}(s)\\
&=A^{\pi_\theta}(s,a)
\end{aligned}
$$
So we can use the TD error to compute the policy gradient
$$
\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}[\nabla_\theta \log \pi_\theta(s,a) \delta^{\pi_\theta}]
$$
In practice, we can use an approximate TD error $\delta_v=r+\gamma V_v(s')-V_v(s)$ to compute the policy gradient
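In code, one TD actor-critic step might look like the following sketch (reusing the `phi_state` and `grad_log_pi` helpers assumed above; the approximate TD error replaces $Q_w(s,a)$ in the update):
```python
def td_actor_critic_step(theta, v, s, a, r, s_next,
                         alpha=0.01, beta=0.1, gamma=0.99):
    # Approximate TD error: delta_v = r + gamma * V_v(s') - V_v(s)
    delta = r + gamma * phi_state(s_next) @ v - phi_state(s) @ v
    # Actor: delta is an unbiased stand-in for the advantage
    theta = theta + alpha * delta * grad_log_pi(theta, phi_state(s), a)
    # Critic: TD(0) update of V_v
    v = v + beta * delta * phi_state(s)
    return theta, v
```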
### Summary of policy gradient algorithms
The policy gradient has many equivalent forms:
$$
\begin{aligned}
\nabla_\theta J(\theta) &= \mathbb{E}_{\pi_\theta}[\nabla_\theta \log \pi_\theta(s,a) v_t] \text{ REINFORCE} \\
&= \mathbb{E}_{\pi_\theta}[\nabla_\theta \log \pi_\theta(s,a) Q_w(s,a)] \text{ Q Actor-Critic} \\
&= \mathbb{E}_{\pi_\theta}[\nabla_\theta \log \pi_\theta(s,a) A^{\pi_\theta}(s,a)] \text{ Advantage Actor-Critic} \\
&= \mathbb{E}_{\pi_\theta}[\nabla_\theta \log \pi_\theta(s,a) \delta^{\pi_\theta}] \text{ TD Actor-Critic}
\end{aligned}
$$
Each leads to a stochastic gradient ascent algorithm.
The critic uses policy evaluation to estimate $Q^\pi(s,a)$, $A^\pi(s,a)$, or $V^\pi(s)$.
## Compatible Function Approximation
If the following two conditions are satisfied:
1. Value function approximation is compatible with the policy
$$
\nabla_w Q_w(s,a) = \nabla_\theta \log \pi_\theta(s,a)
$$
2. Value function parameters $w$ minimize the MSE
$$
\epsilon = \mathbb{E}_{\pi_\theta}[(Q^{\pi_\theta}(s,a)-Q_w(s,a))^2]
$$
Note that $\epsilon$ need not be zero; it only needs to be minimized.
Then the policy gradient is exact
$$
\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}[\nabla_\theta \log \pi_\theta(s,a) Q_w(s,a)]
$$
Remember:
$$
\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}[\nabla_\theta \log \pi_\theta(s,a) Q^{\pi_\theta}(s,a)]
$$
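Why minimizing $\epsilon$ makes the approximate gradient exact: setting $\nabla_w \epsilon=0$ and applying condition 1,
$$
\begin{aligned}
0=\nabla_w \epsilon &= -2\,\mathbb{E}_{\pi_\theta}\left[(Q^{\pi_\theta}(s,a)-Q_w(s,a))\nabla_w Q_w(s,a)\right]\\
&= -2\,\mathbb{E}_{\pi_\theta}\left[(Q^{\pi_\theta}(s,a)-Q_w(s,a))\nabla_\theta \log \pi_\theta(s,a)\right]
\end{aligned}
$$
so $\mathbb{E}_{\pi_\theta}[\nabla_\theta \log \pi_\theta(s,a) Q_w(s,a)]=\mathbb{E}_{\pi_\theta}[\nabla_\theta \log \pi_\theta(s,a) Q^{\pi_\theta}(s,a)]=\nabla_\theta J(\theta)$.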
### Challenges with Policy Gradient Methods
- Data inefficiency
  - On-policy method: for each new policy, we need to generate a completely new trajectory
  - The data is thrown out after just one gradient update
  - As complex neural networks need many updates, this makes the training process very slow
- Instability: the update step size is very important
  - If the step size is too large:
    - Large step -> bad policy
    - The next batch is generated from the current bad policy -> we collect bad samples
    - Bad samples -> worse policy (in contrast to supervised learning, where correct labels and fresh data in the following batches may correct it)
  - If the step size is too small: the learning process is slow

View File

@@ -14,4 +14,5 @@ export default {
CSE510_L9: "CSE510 Deep Reinforcement Learning (Lecture 9)",
CSE510_L10: "CSE510 Deep Reinforcement Learning (Lecture 10)",
CSE510_L11: "CSE510 Deep Reinforcement Learning (Lecture 11)",
CSE510_L12: "CSE510 Deep Reinforcement Learning (Lecture 12)"
}

View File

@@ -0,0 +1,166 @@
# CSE5313 Coding and information theory for data science (Recitation 10)
## Question 5
Prove that the minimum distance of the Reed-Muller code $\operatorname{RM}(r,m)$ is $2^{m-r}$.
Here $n=2^m$.
Recall that the definition of RM code is:
$$
\operatorname{RM}(r,m)=\left\{(f(\alpha_1),\ldots,f(\alpha_{2^m}))\mid \alpha_i\in \mathbb{F}_2^m,\deg f\leq r\right\}
$$
<details>
<summary>Example of RM code</summary>
Let $r=0$: this gives the repetition code.
$\dim \operatorname{RM}(r,m)=\sum_{i=0}^{r}\binom{m}{i}$.
Here $r=0$, so $\dim \operatorname{RM}(0,m)=1$.
So the minimum distance of $RM(0,m)$ is $2^{m-0}=n$.
---
Let $r=m$;
then $\dim \operatorname{RM}(m,m)=\sum_{i=0}^{m}\binom{m}{i}=2^m$. (binomial theorem)
So the generator matrix is $n\times n$, i.e. the code is all of $\mathbb{F}_2^n$.
So the minimum distance of $\operatorname{RM}(m,m)$ is $2^{m-m}=1$.
</details>
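Before the proof, a small brute-force sanity check of the distance formula (a sketch, not part of the recitation):
```python
from itertools import combinations, product

def rm_generator_matrix(r, m):
    """Rows: evaluations of the monomials X_S, |S| <= r, at all of F_2^m."""
    points = list(product([0, 1], repeat=m))
    rows = []
    for size in range(r + 1):
        for S in combinations(range(m), size):
            # product of bits = min of bits; empty product is 1
            rows.append([min([p[i] for i in S], default=1) for p in points])
    return rows

def min_distance(rows):
    """Brute-force minimum Hamming weight over all nonzero codewords."""
    k, n = len(rows), len(rows[0])
    best = n
    for coeffs in product([0, 1], repeat=k):
        if any(coeffs):
            word = [sum(c * row[j] for c, row in zip(coeffs, rows)) % 2
                    for j in range(n)]
            best = min(best, sum(word))
    return best

# RM(1, 3) is an [8, 4] code; expect minimum distance 2^(3-1) = 4.
assert min_distance(rm_generator_matrix(1, 3)) == 4
```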
Then we can do induction on $m$.
Assume the minimum distance of $\operatorname{RM}(r',m')$ is $2^{m'-r'}$ for all $0\leq r'\leq m'$ with $m'<m$.
Then we need to show that the minimum distance of $RM(r,m)$ is $2^{m-r}$.
<details>
<summary>Proof</summary>
Recall that a polynomial $f(x_1,x_2,\ldots,x_m)$ can be written as $f(x_1,x_2,\ldots,x_m)=\sum_{S\subseteq [m],|S|\leq r}f_S X_S$, where $f_S\in \mathbb{F}_2$ and the monomial $X_S=\prod_{i\in S}x_i$.
Every such polynomial can be decomposed as
$$
\begin{aligned}
f(x_1,x_2,\ldots,x_m)&=\sum_{S\subseteq [m],|S|\leq r}f_S X_S\\
&=g(x_1,x_2,\ldots,x_{m-1})+x_m h(x_1,x_2,\ldots,x_{m-1})\\
\end{aligned}
$$
Here $g(x_1,x_2,\ldots,x_{m-1})$ has degree at most $r$ and does not contain $x_m$,
and $h(x_1,x_2,\ldots,x_{m-1})$ has degree at most $r-1$ (so that $x_m h$ has degree at most $r$).
Note that a codeword of $\operatorname{RM}(r,m)$ is the truth table of some polynomial $f$ evaluated at all $2^m$ points $\alpha_i\in \mathbb{F}_2^m$.
Since the code is linear, the minimum distance of $\operatorname{RM}(r,m)$ is the minimum Hamming weight of a nonzero codeword, i.e. the minimum number of $\alpha_i$ with $f(\alpha_i)=1$ over nonzero $f$.
So we can define the weight set of $f$ to be all $\alpha_i$ such that $f(\alpha_i)=1$:
$$
\operatorname{wt}(f)=\{\alpha_i|f(\alpha_i)=1\}
$$
Note that $g(x_1,x_2,\ldots,x_{m-1})$ gives a codeword of $\operatorname{RM}(r,m-1)$ and $h(x_1,x_2,\ldots,x_{m-1})$ gives a codeword of $\operatorname{RM}(r-1,m-1)$.
On the half where $x_m=0$, $f(\alpha_i)=g(\alpha_i)$.
On the half where $x_m=1$, $f(\alpha_i)=g(\alpha_i)+h(\alpha_i)$.
So $\operatorname{wt}(f)$ is the disjoint union of (a copy of) $\operatorname{wt}(g)$ and $\operatorname{wt}(g+h)$.
Note that $\operatorname{wt}(g+h)$ is the set of $\alpha_i$ such that $g(\alpha_i)+h(\alpha_i)=1$, and addition is `XOR` in the binary field.
So $\operatorname{wt}(g+h)=(\operatorname{wt}(g)\setminus\operatorname{wt}(h))\cup (\operatorname{wt}(h)\setminus\operatorname{wt}(g))$, the symmetric difference.
So
$$
\begin{aligned}
|\operatorname{wt}(f)|&=|\operatorname{wt}(g)|+|\operatorname{wt}(g+h)|\\
&=|\operatorname{wt}(g)|+|\operatorname{wt}(g)\setminus\operatorname{wt}(h)|+|\operatorname{wt}(h)\setminus\operatorname{wt}(g)|\\
&=|\operatorname{wt}(h)|+2|\operatorname{wt}(g)\setminus\operatorname{wt}(h)|\\
\end{aligned}
$$
Note $h\in \operatorname{RM}(r-1,m-1)$. If $h\neq 0$, the induction hypothesis gives $|\operatorname{wt}(h)|\geq 2^{(m-1)-(r-1)}=2^{m-r}$, so $|\operatorname{wt}(f)|\geq 2^{m-r}$. If $h=0$, then $f=g$ on both halves, so $|\operatorname{wt}(f)|=2|\operatorname{wt}(g)|\geq 2\cdot 2^{(m-1)-r}=2^{m-r}$ by the induction hypothesis applied to $g\in \operatorname{RM}(r,m-1)$. The bound is attained, e.g. by the monomial $x_1\cdots x_r$, whose weight is exactly $2^{m-r}$.
</details>
## Theorem for Reed-Muller code
$$
\operatorname{RM}(r,m)^\perp=\operatorname{RM}(m-r-1,m)
$$
Recall that for a code $\mathcal{C}=[n,k,d]_q$,
the dual code of $\mathcal{C}$ is $\mathcal{C}^\perp=\{x\in \mathbb{F}^n_q\mid xc^T=0\text{ for all }c\in \mathcal{C}\}$.
<details>
<summary>Example</summary>
$\operatorname{RM}(0,m)^\perp=\operatorname{RM}(m-1,m)$.
and $\operatorname{RM}(0,m)$ is the repetition code.
which is the dual of the parity code $\operatorname{RM}(m-1,m)$.
</details>
### Lemma for sum of binary product
For $A\subseteq [m]=\{1,2,\ldots,m\}$, let $X^A=\prod_{i\in A}x_i$. We can define the inner product $\langle X^A,X^B\rangle=\sum_{x\in \{0,1\}^m}\prod_{i\in A}x_i\prod_{i\in B}x_i=\sum_{x\in \{0,1\}^m}\prod_{i\in A\cup B}x_i$.
So $\langle X^A,X^B\rangle=\begin{cases}
1 & \text{if }A\cup B=[m]\\
0 & \text{otherwise}
\end{cases}$
because $\prod_{i\in A\cup B}x_i=1$ if every coordinate in $A\cup B$ is 1.
So the number of such $x\in \{0,1\}^m$ is $2^{m-|A\cup B|}$.
This implies that $\langle X^A,X^B\rangle=1$ if and only if $m-|A\cup B|=0$.
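For example, with $m=3$:
$$
\langle X^{\{1\}},X^{\{2,3\}}\rangle=\sum_{x\in\{0,1\}^3}x_1x_2x_3=1,
\qquad
\langle X^{\{1\}},X^{\{2\}}\rangle=\sum_{x\in\{0,1\}^3}x_1x_2=2^{3-2}\bmod 2=0,
$$
since only $x=(1,1,1)$ contributes to the first sum, while two points contribute to the second.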
Recall that $\operatorname{RM}(r,m)$ is the evaluation of $f=\sum_{B\subseteq [m],|B|\leq r}\beta_B X^B$ at all $x\in \{0,1\}^m$.
$\operatorname{RM}(m-r-1,m)$ is the evaluation of $h=\sum_{A\subseteq [m],|A|\leq m-r-1}\alpha_A X^A$ at all $x\in \{0,1\}^m$.
By linearity of inner product, we have
$$
\begin{aligned}
\langle f,h\rangle&=\left\langle \sum_{B\subseteq [m],|B|\leq r}\beta_B X^B,\sum_{A\subseteq [m],|A|\leq m-r-1}\alpha_A X^A\right\rangle\\
&=\sum_{B\subseteq [m],|B|\leq r}\sum_{A\subseteq [m],|A|\leq m-r-1}\beta_B\alpha_A\langle X^B,X^A\rangle\\
\end{aligned}
$$
Because $|A\cup B|\leq |A|+|B|\leq (m-r-1)+r=m-1<m$,
every term $\langle X^B,X^A\rangle=0$ by the lemma.
So $\langle f,h\rangle=0$.
<details>
<summary>Proof for the theorem</summary>
Recall that the dual code $\operatorname{RM}(r,m)^\perp=\{x\in \mathbb{F}_2^n\mid xc^T=0\text{ for all }c\in \operatorname{RM}(r,m)\}$, where $n=2^m$.
So $\operatorname{RM}(m-r-1,m)\subseteq \operatorname{RM}(r,m)^\perp$.
So the last step is the dimension check.
Since $\dim \operatorname{RM}(r,m)=\sum_{i=0}^{r}\binom{m}{i}$ and the dimension of the dual code is $2^m-\dim \operatorname{RM}(r,m)=\sum_{i=0}^{m}\binom{m}{i}-\sum_{i=0}^{r}\binom{m}{i}=\sum_{i=r+1}^{m}\binom{m}{i}$.
Since $\binom{m}{i}=\binom{m}{m-i}$, we have $\sum_{i=r+1}^{m}\binom{m}{i}=\sum_{i=r+1}^{m}\binom{m}{m-i}=\sum_{i=0}^{m-r-1}\binom{m}{i}$.
This is exactly the dimension of $\operatorname{RM}(m-r-1,m)$.
</details>
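A quick numerical check of the theorem for small parameters, reusing the `rm_generator_matrix` sketch from above (assumed helper): with $m=3$ and $r=1$ we have $m-r-1=1$, so $\operatorname{RM}(1,3)$ is self-dual.
```python
# Every pair of generator rows must be orthogonal over F_2,
# and the two dimensions must add up to n = 2^m.
G = rm_generator_matrix(1, 3)        # RM(1,3) = RM(3-1-1,3)
assert all(sum(x * y for x, y in zip(u, v)) % 2 == 0 for u in G for v in G)
assert len(G) + len(G) == 2 ** 3     # dim RM(1,3) + dim of its dual = 8
```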

View File

@@ -13,4 +13,5 @@ export default {
CSE5313_L8: "CSE5313 Coding and information theory for data science (Lecture 8)",
CSE5313_L9: "CSE5313 Coding and information theory for data science (Lecture 9)",
CSE5313_L10: "CSE5313 Coding and information theory for data science (Recitation 10)",
CSE5313_L11: "CSE5313 Coding and information theory for data science (Recitation 11)",
}

View File

@@ -1,4 +1,4 @@
# Math 4111 Exam 3 review
## Relations between series and topology (compactness, closure, etc.)

View File

@@ -12,10 +12,10 @@ Topics include:
The course is taught by [Alan Chang](https://math.wustl.edu/pQEDle/alan-chang).
Perhaps it was easy in my semester; it is the first course in which I got three perfect scores on exams. (Unfortunately, I did not get the extra credit for the third midterm exam.)
<!--
## Midterms stats
Our semester was much easier than the previous ones, which had median scores of 25.

View File

@@ -0,0 +1,87 @@
# Math4201 Topology I (Lecture 17)
## Quotient topology
How can we define a topology on the space obtained by identifying points of a topological space?
### Quotient map
Let $(X,\mathcal{T})$ be a topological space. $X^*$ is a set and $q:X\to X^*$ is a surjective map.
The quotient topology on $X^*$ is defined as follows:
$$
\mathcal{T}^* = \{U\subseteq X^*\mid q^{-1}(U)\in \mathcal{T}\}
$$
$U\subseteq X^*$ is open if and only if $q^{-1}(U)$ is open in $X$.
In particular, $q$ is a continuous map.
#### Definition of quotient map
$q:X\to X^*$ defined above is called a **quotient map**.
#### Definition of quotient space
$(X^*,\mathcal{T}^*)$ is called the **quotient space** of $X$ by $q$.
### Typical way of constructing a surjective map
#### Equivalence relation
$\sim$ is a subset of $X\times X$ satisfying:
- reflexive: $\forall x\in X, x\sim x$
- symmetric: $\forall x,y\in X, x\sim y\implies y\sim x$
- transitive: $\forall x,y,z\in X, x\sim y\text{ and } y\sim z\implies x\sim z$
#### Equivalence classes
Given an equivalence relation $\sim$ on $X$:
For $x\in X$, the equivalence class of $x$ is denoted $[x]\coloneqq \{y\in X\mid y\sim x\}$.
$X^*$ is the set of all equivalence classes on $X$.
$q:X\to X^*$ defined by $q(x)=[x]$ is a surjective map.
<details>
<summary>Example of surjective maps and their quotient spaces</summary>
Let $X=\mathbb{R}^2$ and $(s,t)\sim (s',t')$ if and only if $s-s'$ and $t-t'$ are both integers.
This space as a topological space is homeomorphic to the torus.
---
Let $X=\{(s,t)\in \mathbb{R}^2\mid s^2+t^2\leq 1\}$ with the subspace topology from $\mathbb{R}^2$, and let $(s,t)\sim (s',t')$ if and only if $(s,t)=(s',t')$ or $s^2+t^2=s'^2+t'^2=1$ (the whole boundary circle is identified to one point).
This quotient space is homeomorphic to the sphere $S^2$.
</details>
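A finite toy example of the definition (a sketch with made-up names): take $X=\{0,1,2\}$ with topology $\{\emptyset,\{0\},\{0,1\},X\}$ and glue $0\sim 1$; we can enumerate the quotient topology directly.
```python
from itertools import chain, combinations

# X = {0, 1, 2}; q glues 0 ~ 1 into class "a" and sends 2 to class "b".
X_topology = {frozenset(), frozenset({0}), frozenset({0, 1}),
              frozenset({0, 1, 2})}
q = {0: "a", 1: "a", 2: "b"}
X_star = sorted(set(q.values()))

def preimage(U):
    return frozenset(x for x in q if q[x] in U)

def powerset(s):
    s = list(s)
    return chain.from_iterable(combinations(s, k) for k in range(len(s) + 1))

# U is open in X* iff q^{-1}(U) is open in X.
quotient_topology = [set(U) for U in powerset(X_star)
                     if preimage(U) in X_topology]
print(quotient_topology)   # [set(), {'a'}, {'a', 'b'}]
```
Note that $\{b\}$ is not open in the quotient, because $q^{-1}(\{b\})=\{2\}$ is not open in $X$.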
We will show that the quotient topology is a topology on $X^*$.
<details>
<summary>Proof</summary>
We need to show that the quotient topology is a topology on $X^*$.
1. $\emptyset, X^*$ are open in $X^*$.
$\emptyset$ and $X^*$ are open in $X^*$ because $q^{-1}(\emptyset)=\emptyset$ and $q^{-1}(X^*)=X$ are open in $X$.
2. $\mathcal{T}^*$ is closed under arbitrary unions.
$$
q^{-1}(\bigcup_{\alpha \in I} U_\alpha)=\bigcup_{\alpha \in I} q^{-1}(U_\alpha)
$$
3. $\mathcal{T}^*$ is closed under finite intersections.
$$
q^{-1}(\bigcap_{i=1}^{n} U_i)=\bigcap_{i=1}^{n} q^{-1}(U_i)
$$
</details>

View File

@@ -0,0 +1,115 @@
# Math 4201 Topology I (Lecture 18)
## Quotient topology
Let $(X,\mathcal{T})$ be a topological space and $X^*$ be a set, $q:X\to X^*$ is a surjective map. The quotient topology on $X^*$:
$U\subseteq X^*$ is open $\iff q^{-1}(U)$ is open in $X$.
Equivalently,
$Z\subseteq X^*$ is closed $\iff q^{-1}(Z)$ is closed in $X$.
### Open maps
Let $(X,\mathcal{T})$ and $(Y,\mathcal{T}')$ be two topological spaces
A map $f:X\to Y$ is a quotient map if and only if $f$ is surjective and
$U\subseteq Y$ is open $\iff f^{-1}(U)$ is open
or equivalently
$Z\subseteq Y$ is closed $\iff f^{-1}(Z)$ is closed.
#### Definition of open map
Let $f:X\to Y$ be **continuous**. We say $f$ is **open** if for any open $V\subseteq X$, $f(V)$ is open in $Y$.
Let $f:X\to Y$ be **continuous**. We say $f$ is **closed** if for any closed $V\subseteq X$, $f(V)$ is closed in $Y$.
$$
f(f^{-1}(U))=U\cap f(X)
$$
In particular, $f(f^{-1}(U))=U$ when $f$ is surjective.
<details>
<summary>Examples of open maps</summary>
Let $X,Y$ be topological spaces. Define the projection map $\pi_X:X\times Y\to X$, $\pi_X(x,y)=x$.
This is a surjective continuous map (provided $Y\neq \emptyset$).
This map is open. If $U\subseteq X$ is open and $V\subseteq Y$ is open, then $U\times V$ is open in $X\times Y$ and such open sets form a basis.
$\pi_X(U\times V)=\begin{cases}
U&\text{ if }V\neq \emptyset\\
\emptyset &\text{ if }V=\emptyset
\end{cases}$
In particular, the image of any such basic open set is open. Any open $W\subseteq X\times Y$ is a union of such open sets,
$W=\bigcup_{\alpha\in I}U_\alpha\times V_\alpha$, so
$\pi_X(W)=\pi_X(\bigcup_{\alpha\in I}U_\alpha\times V_\alpha)=\bigcup_{\alpha\in I}\pi_X(U_\alpha\times V_\alpha)=\bigcup_{\alpha\in I}U_\alpha$
is open in $X$.
However, $\pi_X$ is not necessarily a closed map.
Let $X=Y=\mathbb{R}$ and $X\times Y=\mathbb{R}^2$
$Z=\{(x,y)\in\mathbb{R}^2\mid x\neq 0,\ y=\frac{1}{x}\}$ is a closed set in $\mathbb{R}^2$, but
$\pi_X(Z)=\mathbb{R}\setminus \{0\}$ is not closed.
---
Let $X=[0,1]\cup [2,3]$, $Y=[0,2]$ with subspace topology on $\mathbb{R}$
Let $f:X\to Y$ be defined as:
$$
f(x)=\begin{cases}
x& \text{ if } x\in [0,1]\\
x-1& \text{ if }x\in [2,3]
\end{cases}
$$
$f$ is continuous and surjective. $f$ is closed: any closed $Z\subseteq X$ splits as $Z=Z_1\cup Z_2$ with $Z_1\subseteq [0,1]$ and $Z_2\subseteq [2,3]$ closed, and $f(Z)=f(Z_1)\cup f(Z_2)$ is closed in $Y$ (each $Z_i$ is compact, so its image is compact, hence closed).
But $f$ is not open. Take $U=[0,1]\subseteq X$, which is open in $X$; $f(U)=[0,1]\subseteq [0,2]$ is not open because of the point $1$.
> In general, any closed surjective continuous map is a quotient map. In particular, this is an example of a closed surjective quotient map which is not open.
</details>
Let $f$ be a surjective open map. Then $f$ is a quotient map:
$U\subseteq Y$ is open and $f$ is continuous, $\implies f^{-1}(U)\subseteq X$ is open
$f^{-1}(U)\subseteq X$ is open and $f$ is surjective and open, $\implies f(f^{-1}(U))=U$ is open.
#### Proposition of continuous and open maps
If $f$ is a continuous bijection, then $f$ is open if and only if $f^{-1}$ is continuous.
<details>
<summary>Proof</summary>
To show $f^{-1}$ is continuous, we have to show that for any open $U\subseteq X$, $(f^{-1})^{-1}(U)=f(U)\subseteq Y$ is open.
This is the same thing as saying that $f$ is open.
</details>
Let $f$ be a quotient map $f: X \to Y$, and $g$ be a continuous map $g:X\to Z$.
We want to find $\hat{g}$ such that $g=\hat{g}\circ f$.
If there exist $x_1,x_2\in X$ such that $f(x_1)=f(x_2)$ but $g(x_1)\neq g(x_2)$, then we cannot find such a $\hat{g}$.
#### Proposition
Let $f$ and $g$ be as above, and suppose moreover that for any $y\in Y$, all the points in $f^{-1}(y)$ are mapped to a single point by $g$. Then there is a unique continuous map $\hat{g}:Y\to Z$ such that $g=\hat{g}\circ f$.
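For example (an illustration, not from the lecture): let $f:[0,1]\to S^1$, $f(x)=(\cos 2\pi x,\sin 2\pi x)$, a quotient map that glues $0$ to $1$. Any continuous $g:[0,1]\to Z$ with $g(0)=g(1)$ is constant on every fiber of $f$, so it factors uniquely as $g=\hat{g}\circ f$ for a continuous $\hat{g}:S^1\to Z$.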
Continue next week.

View File

@@ -19,4 +19,5 @@ export default {
Math4201_L14: "Topology I (Lecture 14)",
Math4201_L15: "Topology I (Lecture 15)",
Math4201_L16: "Topology I (Lecture 16)",
Math4201_L17: "Topology I (Lecture 17)",
}