update notations

This commit is contained in:
Trance-0
2025-11-04 12:43:23 -06:00
parent d24c0bdd9e
commit 614479e4d0
27 changed files with 333 additions and 100 deletions

View File

@@ -198,20 +198,20 @@ $$
Take the softmax policy as example:
Weight actions using the linear combination of features $\phi(s,a)^\top\theta$:
Probability of action is proportional to the exponentiated weights:
$$
\pi_\theta(s,a) \propto \exp(\phi(s,a)^\top\theta)
$$
The score function is
$$
\begin{aligned}
\nabla_\theta \ln\left[\frac{\exp(\phi(s,a)^\top\theta)}{\sum_{a'\in A}\exp(\phi(s,a')^\top\theta)}\right] &= \nabla_\theta\left(\phi(s,a)^\top\theta - \ln \sum_{a'\in A}\exp(\phi(s,a')^\top\theta)\right) \\
&= \phi(s,a) - \frac{\sum_{a'\in A}\phi(s,a')\exp(\phi(s,a')^\top\theta)}{\sum_{a'\in A}\exp(\phi(s,a')^\top\theta)} \\
&= \phi(s,a) - \sum_{a'\in A} \pi_\theta(s,a') \phi(s,a') \\
&= \phi(s,a) - \mathbb{E}_{a'\sim \pi_\theta(s,\cdot)}[\phi(s,a')]
\end{aligned}
@@ -221,7 +221,7 @@ $$
In continuous action spaces, a Gaussian policy is natural
Mean is a linear combination of state features $\mu(s) = \phi(s)^\top\theta$
Variance may be fixed at $\sigma^2$, or can also be parametrized
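The softmax score function derived above, $\nabla_\theta \ln\pi_\theta(s,a) = \phi(s,a) - \mathbb{E}_{a'\sim\pi_\theta}[\phi(s,a')]$, can be checked numerically. A minimal sketch (the feature matrix, sizes, and seed are toy assumptions):

```python
import numpy as np

def softmax_policy(phi, theta):
    """Action probabilities proportional to exp(phi(s,a)^T theta).
    phi: (num_actions, dim) feature matrix for a fixed state s."""
    logits = phi @ theta
    w = np.exp(logits - logits.max())  # subtract max for numerical stability
    return w / w.sum()

def score(phi, theta, a):
    """Score function: phi(s,a) - E_{a'~pi}[phi(s,a')]."""
    pi = softmax_policy(phi, theta)
    return phi[a] - pi @ phi

# Finite-difference check of grad_theta log pi(s,a) on a toy problem.
rng = np.random.default_rng(0)
phi = rng.normal(size=(4, 3))   # 4 actions, 3 features (hypothetical)
theta = rng.normal(size=3)
a, eps = 2, 1e-6
num = np.array([
    (np.log(softmax_policy(phi, theta + eps * e)[a])
     - np.log(softmax_policy(phi, theta - eps * e)[a])) / (2 * eps)
    for e in np.eye(3)
])
assert np.allclose(num, score(phi, theta, a), atol=1e-5)
```

The finite-difference gradient of $\ln\pi_\theta(s,a)$ agrees with the closed-form score, which is what makes likelihood-ratio policy gradients computable without differentiating through the environment.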

View File

@@ -53,7 +53,7 @@ $$
Action-Value Actor-Critic
- Simple actor-critic algorithm based on action-value critic
- Using linear value function approximation $Q_w(s,a)=\phi(s,a)^\top w$
Critic: updates $w$ by linear $TD(0)$
Actor: updates $\theta$ by policy gradient

View File

@@ -193,7 +193,7 @@ $$
Make linear approximation to $L_{\pi_{\theta_{old}}}$ and quadratic approximation to KL term.
Maximize $g\cdot(\theta-\theta_{old})-\frac{\beta}{2}(\theta-\theta_{old})^\top F(\theta-\theta_{old})$
where $g=\frac{\partial}{\partial \theta}L_{\pi_{\theta_{old}}}(\pi_{\theta})\vert_{\theta=\theta_{old}}$ and $F=\frac{\partial^2}{\partial \theta^2}\overline{KL}_{\pi_{\theta_{old}}}(\pi_{\theta})\vert_{\theta=\theta_{old}}$
@@ -201,7 +201,7 @@ where $g=\frac{\partial}{\partial \theta}L_{\pi_{\theta_{old}}}(\pi_{\theta})\ve
<summary>Taylor Expansion of KL Term</summary>
$$
D_{KL}(\pi_{\theta_{old}}|\pi_{\theta})\approx D_{KL}(\pi_{\theta_{old}}|\pi_{\theta_{old}})+d^\top \nabla_\theta D_{KL}(\pi_{\theta_{old}}|\pi_{\theta})\vert_{\theta=\theta_{old}}+\frac{1}{2}d^\top \nabla_\theta^2 D_{KL}(\pi_{\theta_{old}}|\pi_{\theta})\vert_{\theta=\theta_{old}}d
$$
$$
@@ -220,9 +220,9 @@ $$
\begin{aligned}
\nabla_\theta^2 D_{KL}(\pi_{\theta_{old}}|\pi_{\theta})\vert_{\theta=\theta_{old}}&=-\mathbb{E}_{x\sim \pi_{\theta_{old}}}\nabla_\theta^2 \log P_\theta(x)\vert_{\theta=\theta_{old}}\\
&=-\mathbb{E}_{x\sim \pi_{\theta_{old}}}\nabla_\theta \left(\frac{\nabla_\theta P_\theta(x)}{P_\theta(x)}\right)\vert_{\theta=\theta_{old}}\\
&=-\mathbb{E}_{x\sim \pi_{\theta_{old}}}\left(\frac{\nabla_\theta^2 P_\theta(x)P_\theta(x)-\nabla_\theta P_\theta(x)\nabla_\theta P_\theta(x)^\top}{P_\theta(x)^2}\right)\vert_{\theta=\theta_{old}}\\
&=-\mathbb{E}_{x\sim \pi_{\theta_{old}}}\left(\frac{\nabla_\theta^2 P_\theta(x)\vert_{\theta=\theta_{old}}}{P_{\theta_{old}}(x)}\right)+\mathbb{E}_{x\sim \pi_{\theta_{old}}}\left(\nabla_\theta \log P_\theta(x)\nabla_\theta \log P_\theta(x)^\top\right)\vert_{\theta=\theta_{old}}\\
&=\mathbb{E}_{x\sim \pi_{\theta_{old}}}\nabla_\theta\log P_\theta(x)\nabla_\theta\log P_\theta(x)^\top\vert_{\theta=\theta_{old}}\\
\end{aligned}
$$

View File

@@ -27,7 +27,7 @@ $\theta_{new}=\theta_{old}+d$
First order Taylor expansion for the loss and second order for the KL:
$$
\approx \arg\max_{d} J(\theta_{old})+\nabla_\theta J(\theta)\mid_{\theta=\theta_{old}}d-\frac{1}{2}\lambda(d^\top\nabla_\theta^2 D_{KL}\left[\pi_{\theta_{old}}||\pi_{\theta}\right]\mid_{\theta=\theta_{old}}d)+\lambda \delta
$$
If you are really interested, try to fill in the Solving the KL Constrained Problem section.
@@ -38,7 +38,7 @@ Setting the gradient to zero:
$$
\begin{aligned}
0&=\frac{\partial}{\partial d}\left(-\nabla_\theta J(\theta)\mid_{\theta=\theta_{old}}d+\frac{1}{2}\lambda d^\top F(\theta_{old})d\right)\\
&=-\nabla_\theta J(\theta)\mid_{\theta=\theta_{old}}+\lambda F(\theta_{old})d
\end{aligned}
$$
@@ -58,15 +58,15 @@ $$
$$
$$
D_{KL}(\pi_{\theta_{old}}||\pi_{\theta})\approx \frac{1}{2}(\theta-\theta_{old})^\top F(\theta_{old})(\theta-\theta_{old})
$$
$$
\frac{1}{2}(\alpha g_N)^\top F(\alpha g_N)=\delta
$$
$$
\alpha=\sqrt{\frac{2\delta}{g_N^\top F g_N}}
$$
However, due to the quadratic approximation, the KL constraint may be violated.
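The step-size derivation above can be sketched numerically: solve $g_N = F^{-1}g$ and scale it so the quadratic KL approximation equals the trust-region radius $\delta$. A minimal sketch (the Fisher estimate $F$, gradient $g$, and $\delta$ below are toy assumptions):

```python
import numpy as np

def natural_gradient_step(g, F, delta):
    """Natural gradient direction g_N = F^{-1} g, scaled so the quadratic
    KL approximation (1/2) d^T F d equals the trust-region radius delta."""
    g_N = np.linalg.solve(F, g)          # avoid forming F^{-1} explicitly
    alpha = np.sqrt(2.0 * delta / (g_N @ F @ g_N))
    return alpha * g_N

# Toy check: the returned step d satisfies (1/2) d^T F d == delta exactly
# under the quadratic model (the true KL may still differ).
F = np.array([[2.0, 0.3], [0.3, 1.0]])   # assumed SPD Fisher estimate
g = np.array([0.5, -1.2])
d = natural_gradient_step(g, F, delta=0.01)
assert np.isclose(0.5 * d @ F @ d, 0.01)
```

In practice TRPO pairs this step with a line search precisely because, as noted above, the true KL constraint may still be violated by the quadratic model.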

View File

@@ -16,7 +16,7 @@ So we can learn $f(s_t,a_t)$ from data, and _then_ plan through it.
Model-based reinforcement learning version **0.5**:
1. Run base policy $\pi_0$ (e.g. random policy) to collect $\mathcal{D} = \{(s_t, a_t, s_{t+1})\}_{t=0}^T$
2. Learn dynamics model $f(s_t,a_t)$ to minimize $\sum_{i}\|f(s_i,a_i)-s_{i+1}\|^2$
3. Plan through $f(s_t,a_t)$ to choose action $a_t$
@@ -52,10 +52,10 @@ Version 2.0: backpropagate directly into policy
Final version:
1. Run base policy $\pi_0$ (e.g. random policy) to collect $\mathcal{D} = \{(s_t, a_t, s_{t+1})\}_{t=0}^T$
2. Learn dynamics model $f(s_t,a_t)$ to minimize $\sum_{i}\|f(s_i,a_i)-s_{i+1}\|^2$
3. Backpropagate through $f(s_t,a_t)$ into the policy to optimize $\pi_\theta(s_t,a_t)$
4. Run the policy $\pi_\theta(s_t,a_t)$ to collect $\mathcal{D} = \{(s_t, a_t, s_{t+1})\}_{t=0}^T$
5. Goto 2
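The collect / fit / plan loop above can be sketched in a few lines. This is a minimal version-0.5 sketch with an assumed linear toy system (the matrices `A_true`, `B_true`, noise scale, and grid-search planner are all hypothetical choices, not part of the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])   # hypothetical true dynamics
B_true = np.array([[0.0], [0.5]])

# 1. Run a base policy (random) to collect D = {(s_t, a_t, s_{t+1})}.
S, A, S_next = [], [], []
s = np.zeros(2)
for _ in range(200):
    a = rng.normal(size=1)
    s_next = A_true @ s + B_true @ a + 0.01 * rng.normal(size=2)
    S.append(s); A.append(a); S_next.append(s_next)
    s = s_next
X = np.hstack([np.array(S), np.array(A)])      # rows are (s_t, a_t)

# 2. Learn a dynamics model f(s,a) minimizing sum_i ||f(s_i,a_i) - s_{i+1}||^2
#    (a linear model here, so the fit is ordinary least squares).
W, *_ = np.linalg.lstsq(X, np.array(S_next), rcond=None)

# 3. "Plan" through f: grid-search the action whose predicted next state
#    is closest to an assumed goal state.
goal = np.array([1.0, 0.0])
best_a = min(np.linspace(-2, 2, 41),
             key=lambda a: np.linalg.norm(np.hstack([s, a]) @ W - goal))
```

Steps 4-5 of the final version would rerun the data collection with the improved policy and refit `W`, which is what distinguishes it from version 0.5.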
## Model Learning with High-Dimensional Observations

View File

@@ -40,20 +40,20 @@ Let $G$ and $H$ be the generator and parity-check matrices of (any) linear code
#### Lemma 1
$$
H G^\top = 0
$$
<details>
<summary>Proof</summary>
By definition of the generator and parity-check matrices, for every row $e_i$ of $H$, $e_iG^\top=0$.
So $H G^\top = 0$.
</details>
#### Lemma 2
Any matrix $M\in \mathbb{F}_q^{(n-k)\times n}$ such that $\operatorname{rank}(M) = n - k$ and $M G^\top = 0$ is a parity-check matrix for $C$ (i.e. $C = \ker M$).
<details>
<summary>Proof</summary>
@@ -62,7 +62,7 @@ It is sufficient to show that the two statements
1. $\forall c\in C, c=uG, u\in \mathbb{F}^k$
$M c^\top = M(uG)^\top = M(G^\top u^\top) = 0$ since $M G^\top = 0$.
Thus $C \subseteq \ker M$.
@@ -84,15 +84,15 @@ We proceed by applying the lemma 2.
1. $\operatorname{rank}(H) = n - k$ since $H$ is a Vandermonde matrix times a diagonal matrix with no zero entries, so $H$ is invertible.
2. $H G^\top = 0$.
note that $\forall$ row $i$ of $H$, $0\leq i\leq n-k-1$, $\forall$ column $j$ of $G^\top$, $0\leq j\leq k-1$
So
$$
\begin{aligned}
H G^\top &= \begin{bmatrix}
1 & 1 & \cdots & 1\\
\alpha_1 & \alpha_2 & \cdots & \alpha_n\\
\alpha_1^2 & \alpha_2^2 & \cdots & \alpha_n^2\\

View File

@@ -101,7 +101,7 @@ $$
Let $\mathcal{C}=[n,k,d]_q$.
The dual code of $\mathcal{C}$ is $\mathcal{C}^\perp=\{x\in \mathbb{F}^n_q|xc^\top=0\text{ for all }c\in \mathcal{C}\}$.
<details>
<summary>Example</summary>
@@ -151,7 +151,7 @@ So $\langle f,h\rangle=0$.
<details>
<summary>Proof for the theorem</summary>
Recall that the dual code of $\operatorname{RM}(r,m)$ is $\operatorname{RM}(r,m)^\perp=\{x\in \mathbb{F}_2^{2^m}|xc^\top=0\text{ for all }c\in \operatorname{RM}(r,m)\}$.
So $\operatorname{RM}(m-r-1,m)\subseteq \operatorname{RM}(r,m)^\perp$.

View File

@@ -230,7 +230,7 @@ Step 1: Arrange the $B=\binom{k+1}{2}+k(d-k)$ symbols in a matrix $M$ follows:
$$
M=\begin{pmatrix}
S & T\\
T^\top & 0
\end{pmatrix}\in \mathbb{F}_q^{d\times d}
$$
@@ -267,15 +267,15 @@ Repair from (any) nodes $H = \{h_1, \ldots, h_d\}$.
Newcomer contacts each $h_j$: “My name is $i$, and I'm lost.”
Node $h_j$ sends $c_{h_j}M c_i^\top$ (inner product).
Newcomer assembles $C_H Mc_i^\top$.
$C_H$ is invertible by construction!
- Recover $Mc_i^\top$.
- Recover $c_i M$ ($M$ is symmetric)
#### Reconstruction on Product-Matrix MBR codes
@@ -292,9 +292,9 @@ DC assembles $C_D M$.
$\Psi_D$ invertible by construction.
- DC computes $\Psi_D^{-1}C_DM = (S+\Psi_D^{-1}\Delta_D^\top, T)$
- DC obtains $T$.
- Subtracts $\Psi_D^{-1}\Delta_D T^\top$ from $S+\Psi_D^{-1}\Delta_D T^\top$ to obtain $S$.
<details>
<summary>Fill an example here please.</summary>

View File

@@ -0,0 +1,232 @@
# CSE5313 Coding and information theory for data science (Lecture 19)
## Private information retrieval
### Problem setup
Premise:
- Database $X = \{x_1, \ldots, x_m\}$, each $x_i \in \mathbb{F}_q^k$ is a "file" (e.g., medical record).
- $X$ is coded $X \mapsto \{y_1, \ldots, y_n\}$, $y_j$ stored at server $j$.
- The user (physician) wants $x_i$.
- The user sends a query $q_j \sim Q_j$ to server $j$.
- Server $j$ responds with $a_j \sim A_j$.
Decodability:
- The user can retrieve the file: $H(X_i | A_1, \ldots, A_n) = 0$.
Privacy:
- $i$ is seen as $i \sim U = U_{m}$, reflecting server's lack of knowledge.
- $i$ must be kept private: $I(Q_j; U) = 0$ for all $j \in [n]$.
> In short, we want to retrieve $x_i$ from the servers without revealing $i$ to the servers.
### Private information retrieval from Replicated Databases
#### Simple case, one server
Say $n = 1, y_1 = X$.
- All data is stored in one server.
- Simple solution:
- $q_1 =$ "send everything".
- $a_1 = y_1 = X$.
Theorem: Information Theoretic PIR with $n = 1$ can only be achieved by downloading the entire database.
- Can we do better if $n > 1$?
#### Collusion parameter
Key question for $n > 1$: Can servers collude?
- I.e., does server $j$ see any $Q_\ell$, $\ell \neq j$?
- Key assumption:
- Privacy parameter $z$.
- At most $z$ servers can collude.
- $z = 1\implies$ No collusion.
- Requirement for $z = 1$: $I(Q_j; U) = 0$ for all $j \in [n]$.
- Requirement for a general $z$:
  - $I(Q_\mathcal{T}; U) = 0$ for all $\mathcal{T} \subseteq [n]$, $|\mathcal{T}| \leq z$, where $Q_\mathcal{T} = (Q_\ell)_{\ell \in \mathcal{T}}$.
- Motivation:
- Interception of communication links.
- Data breaches.
Other assumptions:
- Computational private information retrieval (even if all servers are compromised, recovering $i$ requires solving a computationally hard problem).
- Non-zero mutual information.
#### Private information retrieval from 2-replicated databases
First PIR protocol: Chor et al. FOCS 95.
- The data $X = \{x_1, \ldots, x_m\}$ is replicated on two servers.
- $z = 1$, i.e., no collusion.
- Protocol: User has $i \sim U_{m}$.
- User generates $r \sim U_{\mathbb{F}_q^m}$.
- $q_1 = r, q_2 = r + e_i$ ($e_i \in \mathbb{F}_q^m$ is the $i$-th unit vector, $q_2$ is equivalent to one-time pad encryption of $x_i$ with key $r$).
- $a_j = q_j X^\top = \sum_{\ell \in [m]} q_{j, \ell} x_\ell$
- Linear combination of the files according to the query vector $q_j$.
- Decoding?
- $a_2 - a_1 = (q_2 - q_1) X^\top = e_i X^\top = x_i$.
- Download?
- $a_j =$ size of file $\implies$ downloading **twice** the size of the file.
- Privacy?
- Since $z = 1$, need to show $I(U; Q_j) = 0$ for each $j$.
- $I(U; Q_1) = I(e_U; r) = 0$ since $U$ and $r$ are independent.
- $I(U; Q_2) = I(e_U; r + e_U) = 0$ since this is a one-time pad!
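The two-server Chor protocol above is short enough to run end-to-end over $\mathbb{F}_2$. A minimal sketch (the database sizes and seed are toy assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
m, k = 5, 8                              # toy sizes: 5 files of 8 bits each
X = rng.integers(0, 2, size=(m, k))      # database, replicated on both servers

def query(i, m, rng):
    """User side: q1 = r, q2 = r + e_i over F_2."""
    r = rng.integers(0, 2, size=m)
    e = np.zeros(m, dtype=int); e[i] = 1
    return r, (r + e) % 2

def answer(q, X):
    """Server side: linear combination q X (mod 2) of all the files."""
    return (q @ X) % 2

i = 3                                    # file the user wants, hidden from servers
q1, q2 = query(i, m, rng)
x_i = (answer(q2, X) - answer(q1, X)) % 2    # a2 - a1 = e_i X = x_i
assert np.array_equal(x_i, X[i])
```

Each server individually sees only a uniformly random vector, matching the privacy argument above; the price is downloading two file-sized answers.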
##### Parameters and notations in PIR
Parameters of the system:
- $n =$ # servers (as in storage).
- $m =$ # files.
- $k =$ size of each file (as in storage).
- $z =$ max. collusion (as in secret sharing).
- $t =$ # of answers required to obtain $x_i$ (as in secret sharing).
- $n - t$ servers are “stragglers”, i.e., might not respond.
Figures of merit:
- PIR-rate = $\#$ desired symbols / $\#$ downloaded symbols
- PIR-capacity = largest possible rate.
Notational conventions:
- The dataset $X = \{x_j\}_{j \in [m]} = \{x_{j, \ell}\}_{(j, \ell) \in [m] \times [k]}$ is seen as a vector in $\mathbb{F}_q^{mk}$.
- Index $\mathbb{F}_q^{mk}$ using $[m] \times [k]$, i.e., $x_{j, \ell}$ is the $\ell$-th symbol of the $j$-th file.
#### Private information retrieval from 4-replicated databases
Consider $n = 4$ replicated servers, file size $k = 2$, collusion $z = 1$.
Protocol: User has $i \sim U_{m}$.
- Fix distinct nonzero $\alpha_1, \ldots, \alpha_4 \in \mathbb{F}_q$.
- Choose $r \sim U_{\mathbb{F}_q^{2m}}$.
- User sends $q_j = e_{i, 1} + \alpha_j e_{i, 2} + \alpha_j^2 r$ to each server $j$.
- Server $j$ responds with
$$
a_j = q_j X^\top = e_{i, 1} X^\top + \alpha_j e_{i, 2} X^\top + \alpha_j^2 r X^\top
$$
- This is an evaluation at $\alpha_j$ of the polynomial $f_i(w) = x_{i, 1} + x_{i, 2} \cdot w + r \cdot w^2$.
- Where $r$ is some random combination of the entries of $X$.
- Decoding?
- Any 3 responses suffice to interpolate $f_i$ and obtain $x_i = (x_{i, 1}, x_{i, 2})$.
- $\implies t = 3$, (one straggler is allowed)
- Privacy?
- Does $q_j = e_{i, 1} + \alpha_j e_{i, 2} + \alpha_j^2 r$ look familiar?
- This is a share in [ramp scheme](CSE5313_L18.md#scheme-2-ramp-secret-sharing-scheme-mceliece-sarwate-scheme) with vector messages $m_1 = e_{i, 1}, m_2 = e_{i, 2}, m_i \in \mathbb{F}_q^{2m}$.
- This is equivalent to $2m$ "parallel" ramp scheme over $\mathbb{F}_q$.
- Each one reveals nothing to any $z = 1$ shareholders $\implies$ Private!
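The 4-server protocol above can be exercised over a small prime field. A minimal sketch (the field $\mathbb{F}_{11}$, evaluation points, and file sizes are toy assumptions; interpolation is done by inverting a $3\times 3$ Vandermonde system mod $p$ via its adjugate):

```python
import numpy as np

p = 11                                    # small prime field F_11 (toy choice)
rng = np.random.default_rng(1)
m, k = 4, 2
X = rng.integers(0, p, size=m * k)        # files flattened: entry (j,l) at j*k+l

def e(idx, n=m * k):
    v = np.zeros(n, dtype=int); v[idx] = 1
    return v

i, alphas = 2, np.array([1, 2, 3, 4])     # target file, distinct nonzero points
r = rng.integers(0, p, size=m * k)
queries = [(e(i*k) + a * e(i*k + 1) + a * a * r) % p for a in alphas]
answers = [int(q @ X) % p for q in queries]   # server j returns f_i(alpha_j)

# Interpolate the degree-2 polynomial from any t = 3 answers (one straggler).
V = np.array([[1, a, a * a] for a in alphas[:3]])
adj = np.round(np.linalg.inv(V) * np.linalg.det(V)).astype(int)   # adjugate
det_inv = pow(int(round(np.linalg.det(V))) % p, -1, p)            # det^{-1} mod p
coeffs = (adj @ np.array(answers[:3])) * det_inv % p
assert coeffs[0] == X[i*k] and coeffs[1] == X[i*k + 1]
```

The first two interpolated coefficients are exactly the two symbols of the desired file, while the $w^2$ coefficient is the random mask $rX^\top$ that hides $i$.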
### Private information retrieval from general replicated databases
$n$ servers, $m$ files, file size $k$, $X \in \mathbb{F}_q^{mk}$.
The user decodes $x_i$ from any $t$ responses.
Any $\leq z$ servers might collude to infer $i$ ($z < t$).
Protocol: User has $i \sim U_{m}$.
- User chooses $r_1, \ldots, r_z \sim U_{\mathbb{F}_q^{mk}}$.
- User sends $q_j = \sum_{\ell=1}^k e_{i, \ell} \alpha_j^{\ell-1} + \sum_{\ell=1}^z r_\ell \alpha_j^{k+\ell-1}$ to each server $j$.
- Server $j$ responds with $a_j = q_j X^\top = f_i(\alpha_j)$.
- $f_i(w) = \sum_{\ell=1}^k e_{i, \ell} X^\top w^{\ell-1} + \sum_{\ell=1}^z r_\ell X^\top w^{k+\ell-1}$ (random combinations of $X$).
- Caveat: must have $t = k + z$.
- $\implies \deg f_i = k + z - 1 = t - 1$.
- Decoding?
- Interpolation from any $t$ evaluations of $f_i$.
- Privacy?
- Against any $z = t - k$ colluding servers, immediate from the proof of the ramp scheme.
PIR-rate?
- Each $a_j$ is a single field element.
- Download $t = k + z$ elements in $\mathbb{F}_q$ in order to obtain $x_i \in \mathbb{F}_q^k$.
- $\implies$ PIR-rate = $\frac{k}{k+z} = \frac{k}{t}$.
#### Theorem: PIR-capacity for general replicated databases
The PIR-capacity for $n$ replicated databases with $z$ colluding servers, $n - t$ unresponsive servers, and $m$ files is $C = \frac{1-\frac{z}{t}}{1-(\frac{z}{t})^m}$.
- When $m \to \infty$, $C \to 1 - \frac{z}{t} = \frac{t-z}{t} = \frac{k}{t}$.
- The above scheme achieves PIR-capacity as $m \to \infty$
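The capacity expression and its large-$m$ limit are easy to check with exact arithmetic. A small sketch (parameter values chosen for illustration):

```python
from fractions import Fraction

def pir_capacity(z, t, m):
    """C = (1 - z/t) / (1 - (z/t)^m) for t responsive servers,
    z colluding servers, and m files."""
    rho = Fraction(z, t)
    return (1 - rho) / (1 - rho ** m)

# Example: z = 1, t = 3, m = 2 gives C = (2/3) / (8/9) = 3/4.
assert pir_capacity(1, 3, 2) == Fraction(3, 4)

# As m grows the capacity approaches the scheme's rate k/t = (t - z)/t = 2/3.
assert abs(float(pir_capacity(1, 3, 60)) - 2 / 3) < 1e-12
```

This makes the statement above concrete: for finitely many files the capacity strictly exceeds $k/t$, and the gap vanishes as $m \to \infty$.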
### Private information retrieval from coded databases
#### Problem setup:
Example:
- $n = 3$ servers, $m$ files $x_j$, $x_j = x_{j, 1}, x_{j, 2}$, $k = 2$, and $q = 2$.
- Code each file with a parity code: $x_{j, 1}, x_{j, 2} \mapsto x_{j, 1}, x_{j, 2}, x_{j, 1} + x_{j, 2}$.
- Server $j \in [3]$ stores the $j$-th symbol of every coded file.
Queries, answers, decoding, and privacy must be tailored for the code at hand.
With respect to a code $C$ and parameters $n, k, t, z$, such scheme is called coded-PIR.
- The content for server $j$ is denoted by $c_j = c_{j, 1}, \ldots, c_{j, m}$.
- $C$ is usually an MDS code.
#### Private information retrieval from parity-check codes
Example:
Say $z = 1$ (no collusion).
- Protocol: User has $i \sim U_{m}$.
- User chooses $r_1, r_2 \sim U_{\mathbb{F}_2^m}$.
- Two queries to each server:
- $q_{1, 1} = r_1 + e_i$, $q_{1, 2} = r_2$.
- $q_{2, 1} = r_1$, $q_{2, 2} = r_2 + e_i$.
- $q_{3, 1} = r_1$, $q_{3, 2} = r_2$.
- Server $j$ responds with $q_{j, 1} c_j^\top$ and $q_{j, 2} c_j^\top$.
- Decoding?
- $q_{1, 1} c_1^\top + q_{2, 1} c_2^\top + q_{3, 1} c_3^\top = r_1 (c_1 + c_2 + c_3)^\top + e_i c_1^\top = r_1 \cdot 0^\top + x_{i, 1} = x_{i, 1}$.
- $q_{1, 2} c_1^\top + q_{2, 2} c_2^\top + q_{3, 2} c_3^\top = r_2 (c_1 + c_2 + c_3)^\top + e_i c_2^\top = x_{i, 2}$.
- Privacy?
- Every server sees two uniformly random vectors in $\mathbb{F}_2^m$.
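The three-server parity-code protocol above can be run directly over $\mathbb{F}_2$. A minimal sketch (file count and seed are toy assumptions; `c[j]` plays the role of server $j$'s column of coded symbols):

```python
import numpy as np

rng = np.random.default_rng(2)
m = 5                                        # toy number of files
x1 = rng.integers(0, 2, size=m)              # first symbol of every file
x2 = rng.integers(0, 2, size=m)              # second symbol of every file
c = [x1, x2, (x1 + x2) % 2]                  # server j stores the j-th symbols

def e(i):
    v = np.zeros(m, dtype=int); v[i] = 1
    return v

i = 1                                        # desired file index
r1 = rng.integers(0, 2, size=m)
r2 = rng.integers(0, 2, size=m)
queries = [((r1 + e(i)) % 2, r2), (r1, (r2 + e(i)) % 2), (r1, r2)]
answers = [((q1 @ cj) % 2, (q2 @ cj) % 2) for (q1, q2), cj in zip(queries, c)]

# c1 + c2 + c3 = 0, so summing the first answers leaves e_i c_1 = x_{i,1};
# summing the second answers leaves e_i c_2 = x_{i,2}.
xi1 = sum(a[0] for a in answers) % 2
xi2 = sum(a[1] for a in answers) % 2
assert (int(xi1), int(xi2)) == (int(x1[i]), int(x2[i]))
```

Each server sees only the pair $(r_1, r_2)$ up to a one-time-pad shift, which is the privacy claim above.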
<details>
<summary>Proof from coding-theoretic interpretation</summary>
Let $G = [g_1^\top, g_2^\top, g_3^\top]$ be the generator matrix.
- For every file $x_j = (x_{j, 1}, x_{j, 2})$ we encode $x_j G = (x_j g_1^\top, x_j g_2^\top, x_j g_3^\top) = (c_{j, 1}, c_{j, 2}, c_{j, 3})$.
- Server $j$ stores $X g_j^\top = (x_1^\top, \ldots, x_m^\top)^\top g_j^\top = (c_{j, 1}, \ldots, c_{j, m})^\top$.
- By multiplying by $r_1$, the servers together store a codeword in $C$:
- $r_1 X g_1^\top, r_1 X g_2^\top, r_1 X g_3^\top = r_1 X G$.
- By replacing one of the $r_1$s by $r_1 + e_i$, we introduce an error in that entry:
- $\left((r_1 + e_i) X g_1^\top, r_1 X g_2^\top, r_1 X g_3^\top\right) = r_1 X G + (e_i X g_1^\top, 0,0)$.
- Download this “erroneous” word from the servers and multiply it by the parity-check matrix $H = [h_1^\top, h_2^\top, h_3^\top]$:
$$
\begin{aligned}
\left((r_1 + e_i) X g_1^\top, r_1 X g_2^\top, r_1 X g_3^\top\right) H^\top &= \left(r_1 X G + (e_i X g_1^\top, 0,0)\right) H^\top \\
&= r_1 X G H^\top + (e_i X g_1^\top, 0,0) H^\top \\
&= 0 + (x_{i, 1}, 0, 0) H^\top \\
&= x_{i, 1}.
\end{aligned}
$$
> In homework we will show that this works with any MDS code ($z=1$).
- Say we obtained $x_i g_1^\top, \ldots, x_i g_k^\top$ (one at a time, how?).
- $(x_i g_1^\top, \ldots, x_i g_k^\top) = x_i B$, where $B$ is a $k \times k$ submatrix of $G$.
- $B$ is a $k \times k$ submatrix of an MDS generator matrix $\implies$ invertible! $\implies$ Obtain $x_{i}$.
</details>
> [!TIP]
>
> error + known location $\implies$ erasure. $d = 2 \implies$ 1 erasure is correctable.

View File

@@ -92,10 +92,10 @@ Two equivalent ways to constructing a linear code:
- A **parity check** matrix $H\in \mathbb{F}^{(n-k)\times n}$ with $(n-k)$ rows and $n$ columns.
$$
\mathcal{C}=\{c\in \mathbb{F}^n:Hc^\top=0\}
$$
- The right kernel of $H$ is $\mathcal{C}$.
- Multiplying $c^\top$ by $H$ "checks" if $c\in \mathcal{C}$.
### Encoding of linear codes
@@ -144,7 +144,7 @@ Decoding: $(y+e)\to x$, $y=xG$.
Use **syndrome** to identify which coset $\mathcal{C}_i$ that the noisy-code to $\mathcal{C}_i+e$ belongs to.
$$
H(y+e)^\top=Hy^\top+He^\top=He^\top
$$
### Syndrome decoding
@@ -215,7 +215,7 @@ Fourth row is $\mathcal{C}+(00100)$.
Any two elements in a row are of the form $y_1'=y_1+e$ and $y_2'=y_2+e$ for some $e\in \mathbb{F}^n$.
Same syndrome, since $Hy_1'^\top=H(y_1+e)^\top=He^\top=H(y_2+e)^\top=Hy_2'^\top$.
Entries in different rows have different syndrome.
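Syndrome decoding is easy to demonstrate concretely. A sketch using the $[7,4]$ Hamming code (a choice made here for illustration, not the code used above), where the syndrome of a single-bit error directly names the error position:

```python
import numpy as np

# Parity-check matrix of the [7,4] Hamming code: column j (0-indexed) is the
# 3-bit binary expansion of j+1, so the syndrome of a single-bit error at
# position j equals the binary expansion of j+1.
H = np.array([[int(b) for b in f"{j:03b}"] for j in range(1, 8)]).T

def syndrome(y):
    return H @ y % 2

def correct_single_error(y):
    """Flip the bit whose H-column matches the syndrome (coset leader of weight 1)."""
    s = syndrome(y)
    if s.any():
        pos = int("".join(map(str, s)), 2) - 1
        y = y.copy(); y[pos] ^= 1
    return y

y = np.zeros(7, dtype=int)               # the all-zero codeword, for simplicity
err = np.zeros(7, dtype=int); err[4] = 1 # single-bit error at position 4
assert np.array_equal(correct_single_error((y + err) % 2), y)
```

The syndrome depends only on the error $e$, not on the codeword, which is exactly the identity $H(y+e)^\top = He^\top$ used above.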

View File

@@ -7,7 +7,7 @@ Let $\mathcal{C}= [n,k,d]_{\mathbb{F}}$ be a linear code.
There are two equivalent ways to describe a linear code:
1. A generator matrix $G\in \mathbb{F}^{k\times n}_q$ with $k$ rows and $n$ columns, entry taken from $\mathbb{F}_q$. $\mathcal{C}=\{xG|x\in \mathbb{F}^k\}$
2. A parity check matrix $H\in \mathbb{F}^{(n-k)\times n}_q$ with $(n-k)$ rows and $n$ columns, entry taken from $\mathbb{F}_q$. $\mathcal{C}=\{c\in \mathbb{F}^n:Hc^\top=0\}$
### Dual code
@@ -21,7 +21,7 @@ $$
Also, the alternative definition is:
1. $C^{\perp}=\{x\in \mathbb{F}^n:Gx^\top=0\}$ (only need to check basis of $C$)
2. $C^{\perp}=\{xH|x\in \mathbb{F}^{n-k}\}$
By the rank-nullity theorem, $\dim(C^{\perp})=n-\dim(C)=n-k$.
@@ -87,7 +87,7 @@ Assume minimum distance is $d$. Show that every $d-1$ columns of $H$ are indepen
- Fact: In linear codes minimum distance is the minimum weight ($d_H(x,y)=w_H(x-y)$).
Indeed, if there exist $d-1$ linearly dependent columns of $H$, then $Hc^\top=0$ for some $c\in \mathcal{C}$ with $w_H(c)<d$.
The reverse direction is similar.
@@ -130,7 +130,7 @@ $k=2^m-m-1$.
Define the code by encoding function:
$E: \mathbb{F}_2^m\to \mathbb{F}_2^{2^m}$, $E(x)=(xy_1^\top,\cdots,xy_{2^m}^\top)$ ($y_i\in \mathbb{F}_2^m$)
Space of codewords is image of $E$.

View File

@@ -258,7 +258,7 @@ Algorithm:
- Begin with $(n-k)\times (n-k)$ identity matrix.
- Assume we have chosen columns $h_1,h_2,\ldots,h_{\ell-1}$ (each $h_i$ is in $\mathbb{F}^{n-k}_q$)
- Then the next column $h_{\ell}$ must not be in the span of any $d-2$ of the previous columns.
- $h_{\ell}$ cannot be written as $[h_1,h_2,\ldots,h_{\ell-1}]x^\top$ for $x$ of Hamming weight at most $d-2$.
- So the ineligible candidates for $h_{\ell}$ is:
- $B_{\ell-1}(0,d-2)=\{x\in \mathbb{F}^{\ell-1}_q: d_H(0,x)\leq d-2\}$.
- $|B_{\ell-1}(0,d-2)|=\sum_{i=0}^{d-2}\binom{\ell-1}{i}(q-1)^i$, denoted by $V_q(\ell-1, d-2)$.
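The ball-size count $V_q(\ell-1, d-2)$ driving the greedy construction is a one-liner. A small sketch (the example values are illustrative):

```python
from math import comb

def V(q, n, r):
    """V_q(n, r): number of vectors in F_q^n within Hamming distance r of 0,
    i.e. sum over weights i <= r of C(n, i) * (q-1)^i."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))

# Radius-2 ball in F_2^6: 1 (weight 0) + 6 (weight 1) + 15 (weight 2) = 22.
assert V(2, 6, 2) == 1 + 6 + 15

# The greedy column choice can continue as long as the ineligible set is
# smaller than the whole column space, i.e. V_q(l-1, d-2) < q^(n-k).
assert V(2, 6, 2) < 2 ** 7
```

Comparing $V_q(\ell-1,d-2)$ against $q^{n-k}$ is exactly the counting step that yields the Gilbert-Varshamov-style existence bound.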

View File

@@ -148,15 +148,15 @@ The generator matrix for Reed-Solomon code is a Vandermonde matrix $V(a_1,a_2,\l
Fact: $V(a_1,a_2,\ldots,a_n)$ is invertible if and only if $a_1,a_2,\ldots,a_n$ are distinct. (that's how we choose $a_1,a_2,\ldots,a_n$)
The parity check matrix for Reed-Solomon code is also a Vandermonde matrix $V(a_1,a_2,\ldots,a_n)^\top$ with scalar multiples of the columns.
Some technical lemmas:
Let $G$ and $H$ be the generator and parity-check matrices of (any) linear code
$C = [n, k, d]_{\mathbb{F}_q}$. Then:
I. $H G^\top = 0$.
II. Any matrix $M \in \mathbb{F}_q^{(n-k) \times n}$ such that $\operatorname{rank}(M) = n - k$ and $M G^\top = 0$ is a parity-check matrix for $C$ (i.e. $C = \ker M$).
## Reed-Muller code

View File

@@ -22,4 +22,5 @@ export default {
CSE5313_L16: "CSE5313 Coding and information theory for data science (Exam Review)",
CSE5313_L17: "CSE5313 Coding and information theory for data science (Lecture 17)",
CSE5313_L18: "CSE5313 Coding and information theory for data science (Lecture 18)",
  CSE5313_L19: "CSE5313 Coding and information theory for data science (Lecture 19)",
}

View File

@@ -469,7 +469,7 @@ $$
Then we use $\mathcal{L}_{ds}$ to enforce the smoothness of the disparity map.
$$
\mathcal{L}_{ds}=\sum_{p\in I^l}\left|\partial_x d^l_p\right|e^{-\left|\partial_x I^l_p\right|}+\left|\partial_y d^l_p\right|e^{-\left|\partial_y I^l_p\right|}=\sum_{p_t}|\nabla D(p_t)|\cdot \left(e^{-|\nabla I(p_t)|}\right)^\top\tag{2}
$$
Replacing $\hat{I}^{rig}_s$ with $\hat{I}^{full}_s$, in (1) and (2), we get the $\mathcal{L}_{fw}$ and $\mathcal{L}_{fs}$ for the non-rigid motion localizer.

View File

@@ -64,7 +64,7 @@ $d = \begin{bmatrix}
u \\ v
\end{bmatrix}$
The solution is $d=(A^\top A)^{-1} A^\top b$
Lucas-Kanade flow:
@@ -170,7 +170,7 @@ E = \sum_{i=1}^n (a(x_i-\bar{x})+b(y_i-\bar{y}))^2 = \left\|\begin{bmatrix}x_1-\
$$
We want to find $N$ that minimizes $\|UN\|^2$ subject to $\|N\|^2= 1$
Solution is given by the eigenvector of $U^\top U$ associated with the smallest eigenvalue
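The smallest-eigenvector solution can be sketched for total-least-squares line fitting (the sample points below are an illustrative toy case):

```python
import numpy as np

def fit_line_tls(x, y):
    """Total-least-squares line a(x - xbar) + b(y - ybar) = 0: the unit normal
    N = (a, b) minimizing ||U N||^2 with ||N|| = 1 is the eigenvector of U^T U
    for the smallest eigenvalue, where U stacks the centered points."""
    U = np.column_stack([x - x.mean(), y - y.mean()])
    w, V = np.linalg.eigh(U.T @ U)    # eigh returns ascending eigenvalues
    return V[:, 0]                    # eigenvector for the smallest eigenvalue

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0                     # points exactly on a line of slope 2
a, b = fit_line_tls(x, y)
# The normal must be orthogonal to the line direction (1, 2).
assert abs(a + 2.0 * b) < 1e-9
```

Because the points are exactly collinear, the smallest eigenvalue is zero and the recovered normal is exact; with noisy points the same eigenvector minimizes the perpendicular residuals.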
Drawbacks:

View File

@@ -178,7 +178,7 @@ $$
\begin{pmatrix}a\\b\\c\end{pmatrix} \times \begin{pmatrix}a'\\b'\\c'\end{pmatrix} = \begin{pmatrix}bc'-b'c\\ca'-c'a\\ab'-a'b\end{pmatrix}
$$
Let $h_1^\top, h_2^\top, h_3^\top$ be the rows of $H$. Then
$$
x_i' × Hx_i=\begin{pmatrix}
@@ -186,15 +186,15 @@ x_i' × Hx_i=\begin{pmatrix}
y_i' \\
1
\end{pmatrix} \times \begin{pmatrix}
h_1^\top x_i \\
h_2^\top x_i \\
h_3^\top x_i
\end{pmatrix}
=
\begin{pmatrix}
y_i' h_3^\top x_i - h_2^\top x_i \\
h_1^\top x_i - x_i' h_3^\top x_i \\
x_i' h_2^\top x_i - y_i' h_1^\top x_i
\end{pmatrix}
$$
@@ -206,15 +206,15 @@ x_i' × Hx_i=\begin{pmatrix}
y_i' \\
1
\end{pmatrix} \times \begin{pmatrix}
h_1^\top x_i \\
h_2^\top x_i \\
h_3^\top x_i
\end{pmatrix}
=
\begin{pmatrix}
y_i' h_3^\top x_i - h_2^\top x_i \\
h_1^\top x_i - x_i' h_3^\top x_i \\
x_i' h_2^\top x_i - y_i' h_1^\top x_i
\end{pmatrix}
$$
@@ -222,9 +222,9 @@ Rearranging the terms:
$$
\begin{bmatrix}
0^\top &-x_i^\top &y_i' x_i^\top \\
x_i^\top &0^\top &-x_i' x_i^\top \\
-y_i' x_i^\top &x_i' x_i^\top &0^\top
\end{bmatrix}
\begin{bmatrix}
h_1 \\

View File

@@ -17,16 +17,16 @@ If we set the config for the first camera as the world origin and $[I|0]\begin{p
Notice that $x'\cdot [t\times (Ry)]=0$
$$
x'^\top E x = 0
$$
The matrix $E$ defining this constraint is called the **Essential Matrix**.
$E x$ is the epipolar line associated with $x$ ($l'=Ex$)
$E^\top x'$ is the epipolar line associated with $x'$ ($l=E^\top x'$)
$E e=0$ and $E^\top e'=0$ ($x$ and $x'$ don't matter)
$E$ is singular (rank 2) and has five degrees of freedom.
@@ -35,13 +35,13 @@ $E$ is singular (rank 2) and have five degrees of freedom.
If the calibration matrices $K$ and $K'$ are unknown, we can write the epipolar constraint in terms of unknown normalized coordinates:
$$
x'^\top_{norm} E x_{norm} = 0
$$
where $x_{norm}=K^{-1} x$, $x'_{norm}=K'^{-1} x'$
$$
x'^\top_{norm} E x_{norm} = 0\implies x'^\top Fx=0
$$
where $F=K'^{-\top}EK^{-1}$ is the **Fundamental Matrix**.
@@ -60,17 +60,17 @@ Properties of $F$:
$F x$ is the epipolar line associated with $x$ ($l'=F x$)
$F^\top x'$ is the epipolar line associated with $x'$ ($l=F^\top x'$)
$F e=0$ and $F^\top e'=0$
$F$ is singular (rank two) and has seven degrees of freedom
#### Estimating the fundamental matrix
Given: correspondences $x=(x,y,1)^\top$ and $x'=(x',y',1)^\top$
Constraint: $x'^\top F x=0$
$$
(x',y',1)\begin{bmatrix}
@@ -95,7 +95,7 @@ F=U\begin{bmatrix}
\sigma_1 & 0 \\
0 & \sigma_2 \\
0 & 0
\end{bmatrix}V^\top
$$
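The rank-2 projection above (zeroing the smallest singular value of the estimated $F$) is a few lines of linear algebra. A minimal sketch (the random matrix stands in for a noisy eight-point estimate):

```python
import numpy as np

def enforce_rank2(F):
    """Project an estimated fundamental matrix onto the nearest rank-2
    matrix (in Frobenius norm) by zeroing its smallest singular value."""
    U, s, Vt = np.linalg.svd(F)
    s[2] = 0.0
    return U @ np.diag(s) @ Vt

F = np.random.default_rng(3).normal(size=(3, 3))  # stand-in noisy estimate
F2 = enforce_rank2(F)
assert np.linalg.matrix_rank(F2) == 2
# The two largest singular values are preserved.
assert np.allclose(np.linalg.svd(F2, compute_uv=False)[:2],
                   np.linalg.svd(F, compute_uv=False)[:2])
```

This is the standard post-processing step: without it the estimated $F$ is generically full rank and the epipolar lines do not all pass through a common epipole.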
## Structure from Motion
@@ -126,7 +126,7 @@ a_{21} & a_{22} & a_{23} & t_2 \\
0 & 0 & 0 & 1
\end{bmatrix}=\begin{bmatrix}
A & t \\
0^\top & 1
\end{bmatrix}
$$
@@ -160,10 +160,10 @@ The reconstruction is defined up to an arbitrary affine transformation $Q$ (12 d
$$
\begin{bmatrix}
A & t \\
0^\top & 1
\end{bmatrix}\rightarrow\begin{bmatrix}
A & t \\
0^\top & 1
\end{bmatrix}Q^{-1}, \quad \begin{pmatrix}X_j\\1\end{pmatrix}\rightarrow Q\begin{pmatrix}X_j\\1\end{pmatrix}
$$

View File

@@ -74,7 +74,7 @@ x\\y
\end{pmatrix}
$$
To undo the rotation, we need to rotate the image by $-\theta$. This is equivalent to applying $R^\top$ to the image.
#### Affine transformation

View File

@@ -96,7 +96,7 @@ Example: Linear classification models
Find a linear function that separates the data.
$$
f(x) = w^\top x + b
$$
[Linear classification models](http://cs231n.github.io/linear-classify/)
@@ -144,13 +144,13 @@ This is a convex function, so we can find the global minimum.
The gradient is:
$$
\nabla_w||Xw-Y||^2 = 2X^\top(Xw-Y)
$$
Set the gradient to 0, we get:
$$
w = (X^\top X)^{-1} X^\top Y
$$
From the maximum likelihood perspective, we can also derive the same result.
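The closed-form solution above is easy to verify on synthetic data. A minimal sketch (the data matrix, true weights, and noiseless targets are toy assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
Y = X @ w_true                          # noiseless toy targets

# Normal-equations solution w = (X^T X)^{-1} X^T Y.
w = np.linalg.inv(X.T @ X) @ X.T @ Y
assert np.allclose(w, w_true)

# Numerically, solving the least-squares problem directly is preferred
# over forming the inverse explicitly; both agree here.
w_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
assert np.allclose(w_lstsq, w_true)
```

With noiseless targets both routes recover the true weights exactly; with noisy targets they return the same minimizer of $\|Xw-Y\|^2$, the one where the gradient $2X^\top(Xw-Y)$ vanishes.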

View File

@@ -59,7 +59,7 @@ Suppose $k=1$, $e=l(f_1(x,w_1),y)$
Example: $e=(f_1(x,w_1)-y)^2$
So $h_1=f_1(x,w_1)=w^\top_1x$, $e=l(h_1,y)=(y-h_1)^2$
$$
\frac{\partial e}{\partial w_1}=\frac{\partial e}{\partial h_1}\frac{\partial h_1}{\partial w_1}

View File

@@ -20,7 +20,7 @@ Suppose $k=1$, $e=l(f_1(x,w_1),y)$
Example: $e=(f_1(x,w_1)-y)^2$
So $h_1=f_1(x,w_1)=w^\top_1x$, $e=l(h_1,y)=(y-h_1)^2$
$$
\frac{\partial e}{\partial w_1}=\frac{\partial e}{\partial h_1}\frac{\partial h_1}{\partial w_1}

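The chain rule above can be verified with a finite-difference check for $e=(w_1^\top x-y)^2$ (all values are arbitrary):

```python
import numpy as np

x = np.array([1.0, 2.0, -0.5])
w1 = np.array([0.3, -0.1, 0.7])
y = 1.5

h1 = w1 @ x                   # h1 = w1^T x
# Chain rule: de/dw1 = (de/dh1)(dh1/dw1) = 2(h1 - y) * x
grad = 2.0 * (h1 - y) * x

# Central finite-difference approximation of the same gradient.
eps = 1e-6
fd = np.zeros(3)
for i in range(3):
    d = np.zeros(3)
    d[i] = eps
    fd[i] = (((w1 + d) @ x - y) ** 2 - ((w1 - d) @ x - y) ** 2) / (2 * eps)

print(np.allclose(grad, fd, atol=1e-5))  # True
```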
View File

@@ -262,10 +262,10 @@ Basic definitions
The special orthogonal group $SO(n)$ is the set of all **distance- and orientation-preserving** linear transformations on $\mathbb{R}^n$.
It is the group of all $n\times n$ orthogonal matrices ($A^T A=I_n$) on $\mathbb{R}^n$ with determinant $1$.
It is the group of all $n\times n$ orthogonal matrices ($A^\top A=I_n$) on $\mathbb{R}^n$ with determinant $1$.
$$
SO(n)=\{A\in \mathbb{R}^{n\times n}: A^T A=I_n, \det(A)=1\}
SO(n)=\{A\in \mathbb{R}^{n\times n}: A^\top A=I_n, \det(A)=1\}
$$
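Membership in $SO(n)$ can be tested directly from the definition; a sketch that generates a random rotation via QR factorization (a common construction, with one column flipped if needed so that $\det=+1$):

```python
import numpy as np

rng = np.random.default_rng(2)

# Random orthogonal matrix from the QR factorization of a Gaussian matrix.
A, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(A) < 0:       # land in SO(3), not merely O(3)
    A[:, 0] *= -1

print(np.allclose(A.T @ A, np.eye(3)))    # True: A^T A = I
print(np.isclose(np.linalg.det(A), 1.0))  # True: det(A) = 1
```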
<details>
@@ -276,7 +276,7 @@ In [The random Matrix Theory of the Classical Compact groups](https://case.edu/a
$O(n)$ (the group of all $n\times n$ **orthogonal matrices** over $\mathbb{R}$),
$$
O(n)=\{A\in \mathbb{R}^{n\times n}: AA^T=A^T A=I_n\}
O(n)=\{A\in \mathbb{R}^{n\times n}: AA^\top=A^\top A=I_n\}
$$
$U(n)$ (the group of all $n\times n$ **unitary matrices** over $\mathbb{C}$),
@@ -296,7 +296,7 @@ $$
$Sp(2n)$ (the group of all $2n\times 2n$ symplectic matrices over $\mathbb{C}$),
$$
Sp(2n)=\{U\in U(2n): U^T J U=UJU^T=J\}
Sp(2n)=\{U\in U(2n): U^\top J U=UJU^\top=J\}
$$
where $J=\begin{pmatrix}

View File

@@ -8,10 +8,10 @@ The page's lemma is a fundamental result in quantum information theory that prov
The special orthogonal group $SO(n)$ is the set of all **distance- and orientation-preserving** linear transformations on $\mathbb{R}^n$.
It is the group of all $n\times n$ orthogonal matrices ($A^T A=I_n$) on $\mathbb{R}^n$ with determinant $1$.
It is the group of all $n\times n$ orthogonal matrices ($A^\top A=I_n$) on $\mathbb{R}^n$ with determinant $1$.
$$
SO(n)=\{A\in \mathbb{R}^{n\times n}: A^T A=I_n, \det(A)=1\}
SO(n)=\{A\in \mathbb{R}^{n\times n}: A^\top A=I_n, \det(A)=1\}
$$
<details>
@@ -22,7 +22,7 @@ In [The random Matrix Theory of the Classical Compact groups](https://case.edu/a
$O(n)$ (the group of all $n\times n$ **orthogonal matrices** over $\mathbb{R}$),
$$
O(n)=\{A\in \mathbb{R}^{n\times n}: AA^T=A^T A=I_n\}
O(n)=\{A\in \mathbb{R}^{n\times n}: AA^\top=A^\top A=I_n\}
$$
$U(n)$ (the group of all $n\times n$ **unitary matrices** over $\mathbb{C}$),
@@ -42,7 +42,7 @@ $$
$Sp(2n)$ (the group of all $2n\times 2n$ symplectic matrices over $\mathbb{C}$),
$$
Sp(2n)=\{U\in U(2n): U^T J U=UJU^T=J\}
Sp(2n)=\{U\in U(2n): U^\top J U=UJU^\top=J\}
$$
where $J=\begin{pmatrix}

View File

@@ -74,7 +74,7 @@ $c\in \mathbb{C}$.
The matrix transpose is defined by
$$
u^T=(a_1,a_2,\cdots,a_n)^T=\begin{pmatrix}
u^\top=(a_1,a_2,\cdots,a_n)^\top=\begin{pmatrix}
a_1 \\
a_2 \\
\vdots \\
@@ -694,7 +694,7 @@ $$
The unitary group $U(n)$ is the group of all $n\times n$ unitary matrices.
That is, $A^* A=AA^*=I$ (equivalently $A^{-1}=A^*$), where $A^*$ is the complex conjugate transpose of $A$: $A^*=(\overline{A})^T$.
That is, $A^* A=AA^*=I$ (equivalently $A^{-1}=A^*$), where $A^*$ is the complex conjugate transpose of $A$: $A^*=(\overline{A})^\top$.
#### Cyclic group $\mathbb{Z}_n$

View File

@@ -25,7 +25,7 @@ Let $A$ be an $m \times n$ matrix, then
* The column rank of $A$ is the dimension of the span of the columns in $\mathbb{F}^{m,1}$.
* The row rank of $A$ is the dimension of the span of the rows in $\mathbb{F}^{1,n}$.
> Transpose: $A^t=A^T$ refers to swapping rows and columns
> Transpose: $A^t=A^\top$ refers to swapping rows and columns
#### Theorem 3.56 (Column-Row Factorization)
@@ -64,7 +64,7 @@ Proof:
Note that by **Theorem 3.56**, if $A$ is $m\times n$ and has column rank $c$, then $A=CR$ where $C$ is an $m\times c$ matrix and $R$ is a $c\times n$ matrix. But the rows of $CR$ are linear combinations of the rows of $R$, so the row rank of $A$ is $\leq c$. Hence row rank of $A\leq$ column rank of $A$.
Taking the transpose, the row rank of $A^T$ (the column rank of $A$) $\leq$ the column rank of $A^T$ (the row rank of $A$).
Taking the transpose, the row rank of $A^\top$ (the column rank of $A$) $\leq$ the column rank of $A^\top$ (the row rank of $A$).
So column rank is equal to row rank.
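The theorem is easy to sanity-check numerically on a rank-deficient example:

```python
import numpy as np

# A 3x4 matrix whose third row is the sum of the first two (rank 2).
A = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 3.0],
              [1.0, 3.0, 1.0, 4.0]])

col_rank = np.linalg.matrix_rank(A)    # column rank of A
row_rank = np.linalg.matrix_rank(A.T)  # column rank of A^T = row rank of A

print(col_rank, row_rank)  # 2 2
```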

View File

@@ -39,13 +39,13 @@ $T$ is surjective $\iff range\ T=W\iff null\ T'=0\iff T'$ injective
Let $V,W$ be a finite dimensional vector space, $T\in \mathscr{L}(V,W)$
Then $M(T')=(M(T))^T$, where the bases for $M(T')$ are the dual bases to those for $M(T)$.
Then $M(T')=(M(T))^\top$, where the bases for $M(T')$ are the dual bases to those for $M(T)$.
#### Theorem 3.133
$col\ rank\ A=row\ rank\ A$
Proof: $col\ rank\ A=col\ rank\ (M(T))=dim\ range\ T=dim\ range\ T'=col\ rank\ (M(T'))=col\ rank\ (M(T)^T)=row\ rank\ (M(T))$
Proof: $col\ rank\ A=col\ rank\ (M(T))=dim\ range\ T=dim\ range\ T'=col\ rank\ (M(T'))=col\ rank\ (M(T)^\top)=row\ rank\ (M(T))$
## Chapter V Eigenvalue and Eigenvectors