seems working on this small batch

This commit is contained in:
Zheyuan Wu
2025-10-25 00:01:23 -05:00
parent f8df23526b
commit f89b3cb70d
434 changed files with 377 additions and 70501 deletions


@@ -1,437 +0,0 @@
# Math 401, Fall 2025: Thesis notes, R1, Non-commutative probability theory
> Progress: 0/NaN=NaN% (denominator and numerator may change)
## Notations and definitions
This part will cover the necessary notations and definitions for the remaining parts of the recollection.
### Notations of Linear algebra
#### Definition of vector space
[link to vector space](../../Math429/Math429_L1#definition-1.20)
A vector space over $\mathbb{F}$ is a set $V$ along with two operations: addition $v+w\in V$ for $v,w\in V$, and scalar multiplication $\lambda \cdot v\in V$ for $\lambda\in \mathbb{F}$ and $v\in V$, satisfying the following properties:
* Commutativity: $\forall v, w\in V,v+w=w+v$
* Associativity: $\forall u,v,w\in V,(u+v)+w=u+(v+w)$
* Existence of additive identity: $\exists 0\in V$ such that $\forall v\in V, 0+v=v$
* Existence of additive inverse: $\forall v\in V, \exists w \in V$ such that $v+w=0$
* Existence of multiplicative identity: $\exists 1 \in \mathbb{F}$ such that $\forall v\in V,1\cdot v=v$
* Distributive properties: $\forall v, w\in V$ and $\forall a,b\in \mathbb{F}$, $a\cdot(v+w)=a\cdot v+ a\cdot w$ and $(a+b)\cdot v=a\cdot v+b\cdot v$
#### Definition of inner product
[link to inner product](../../Math429/Math429_L25#definition-6.2)
An inner product is a function $\langle\cdot,\cdot\rangle:V\times V\to \mathbb{F}$ satisfying the following properties:
* Positivity: $\langle v,v\rangle\geq 0$
* Definiteness: $\langle v,v\rangle=0\iff v=0$
* Additivity: $\langle u+v,w\rangle=\langle u,w\rangle+\langle v,w\rangle$
* Homogeneity: $\langle \lambda u, v\rangle=\lambda\langle u,v\rangle$
* Conjugate symmetry: $\langle u,v\rangle=\overline{\langle v,u\rangle}$
<details>
<summary>Examples of inner product</summary>
Let $V=\mathbb{R}^n$.
The dot product, defined by
$$
\langle u,v\rangle=u_1v_1+u_2v_2+\cdots+u_nv_n
$$
is an inner product.
---
Let $V=L^2(\mathbb{R}, \lambda)$, where $\lambda$ is the Lebesgue measure. $f,g:\mathbb{R}\to \mathbb{C}$ are complex-valued square integrable functions.
The Hermitian inner product, defined by
$$
\langle f,g\rangle=\int_\mathbb{R} \overline{f(x)}g(x) d\lambda(x)
$$
is an inner product.
---
Let $A,B$ be two linear transformations on $\mathbb{R}^n$.
The Hilbert-Schmidt inner product, defined by
$$
\langle A,B\rangle=\operatorname{Tr}(A^*B)=\sum_{i=1}^n \sum_{j=1}^n \overline{a_{ij}}b_{ij}
$$
is an inner product.
</details>
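A quick numerical sanity check of the Hilbert-Schmidt inner product above; a minimal NumPy sketch, where the random complex matrices are arbitrary illustrations (not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

# <A, B> = Tr(A* B), where A* is the conjugate transpose
hs_trace = np.trace(A.conj().T @ B)
# ... which equals the entrywise sum of conj(a_ij) * b_ij
hs_entrywise = np.sum(A.conj() * B)
assert np.allclose(hs_trace, hs_entrywise)

# positivity: <A, A> = sum of |a_ij|^2 is real and non-negative
assert np.isclose(np.trace(A.conj().T @ A).imag, 0.0)
assert np.trace(A.conj().T @ A).real >= 0
```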
#### Definition of inner product space
An inner product space is a vector space equipped with an inner product.
#### Definition of completeness
[link to completeness](../../Math4111/Math4111_L17#definition-312)
Note that every inner product space is a metric space.
Let $X$ be a metric space. We say $X$ is **complete** if every Cauchy sequence (that is, a sequence such that $\forall \epsilon>0, \exists N$ such that $\forall m,n\geq N, d(p_m,p_n)<\epsilon$) in $X$ converges.
#### Definition of Hilbert space
A Hilbert space is a complete inner product space.
#### Motivation of Tensor product
Recall that the traditional product space of two vector spaces $V$ and $W$, that is, $V\times W$, is the set of all ordered pairs $(v,w)$ where $v\in V$ and $w\in W$.
This space has dimension $\dim V+\dim W$.
We want to define a vector space with a notion of multiplication of two vectors from different vector spaces.
That is,
$$
(v_1+v_2)\otimes w=(v_1\otimes w)+(v_2\otimes w)\text{ and } v\otimes (w_1+w_2)=(v\otimes w_1)+(v\otimes w_2)
$$
and enables scalar multiplication by
$$
\lambda (v\otimes w)=(\lambda v)\otimes w=v\otimes (\lambda w)
$$
We also wish to build a way to associate the bases of $V$ and $W$ with a basis of $V\otimes W$. That makes the tensor product a vector space of dimension $\dim V\cdot \dim W$.
#### Definition of linear functional
> [!TIP]
>
> Note the difference between a linear functional and a linear map.
>
> A generalized linear map is a function $f:V\to W$ satisfying the condition
>
> 1. $f(u+v)=f(u)+f(v)$
> 2. $f(\lambda v)=\lambda f(v)$
A linear functional is a linear map from $V$ to $\mathbb{F}$.
#### Definition of bilinear functional
A bilinear functional is a function $\beta:V\times W\to \mathbb{F}$ such that $v\mapsto \beta(v,w)$ is a linear functional for all $w\in W$ and $w\mapsto \beta(v,w)$ is a linear functional for all $v\in V$.
The vector space of all bilinear functionals is denoted by $\mathcal{B}(V,W)$.
#### Definition of tensor product
Let $V,W$ be two vector spaces.
Let $V'$ and $W'$ be the dual spaces of $V$ and $W$, respectively, that is $V'=\{\psi:V\to \mathbb{F}\}$ and $W'=\{\phi:W\to \mathbb{F}\}$, $\psi, \phi$ are linear functionals.
The tensor product of vectors $v\in V$ and $w\in W$ is the bilinear functional on $V'\times W'$ defined, for all $(\psi,\phi)\in V'\times W'$, by the notation
$$
(v\otimes w)(\psi,\phi)\coloneqq\psi(v)\phi(w)
$$
The tensor product of two vector spaces $V$ and $W$ is the vector space $\mathcal{B}(V',W')$.
Notice that a basis of this vector space is built from bases of $V$ and $W$: if $\{e_i\}$ is a basis of $V$ and $\{f_j\}$ is a basis of $W$, then $\{e_i\otimes f_j\}$ is a basis of $\mathcal{B}(V',W')$.
That is, every element of $\mathcal{B}(V',W')$ can be written as a linear combination of these basis elements.
Since $\{e_i\}$ and $\{f_j\}$ are bases of $V$ and $W$, respectively, we can always find dual sets of linear functionals $\{\phi_i\}\subset V'$ and $\{\psi_j\}\subset W'$ such that $\phi_i(e_j)=\delta_{ij}$ and $\psi_j(f_i)=\delta_{ij}$.
Here $\delta_{ij}=\begin{cases}
1 & \text{if } i=j \\
0 & \text{otherwise}
\end{cases}$ is the Kronecker delta.
$$
V\otimes W=\left\{\sum_{i=1}^n \sum_{j=1}^m a_{ij}\, e_i\otimes f_j: a_{ij}\in \mathbb{F}\right\}
$$
Note that each element $\sum_{i=1}^n \sum_{j=1}^m a_{ij}\, e_i\otimes f_j$ is a bilinear functional that maps $V'\times W'$ to $\mathbb{F}$.
This enables basis free construction of vector spaces with proper multiplication and scalar multiplication.
This vector space is equipped with the unique inner product $\langle v\otimes w, u\otimes x\rangle_{V\otimes W}$ defined by
$$
\langle v\otimes w, u\otimes x\rangle=\langle v,u\rangle_V\langle w,x\rangle_W
$$
In practice, we ignore the subscript of the vector space and just write $\langle v\otimes w, u\otimes x\rangle=\langle v,u\rangle\langle w,x\rangle$.
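As a concrete sanity check, one can model $v\otimes w$ by the Kronecker product of coordinate vectors (one standard realization of the tensor product) and verify the inner product identity numerically. A minimal NumPy sketch with arbitrary test vectors:

```python
import numpy as np

rng = np.random.default_rng(1)
v, u = rng.standard_normal(3), rng.standard_normal(3)
w, x = rng.standard_normal(4), rng.standard_normal(4)

# model v ⊗ w as the Kronecker product, a vector of length 3*4
lhs = np.dot(np.kron(v, w), np.kron(u, x))   # <v⊗w, u⊗x>
rhs = np.dot(v, u) * np.dot(w, x)            # <v,u> <w,x>
assert np.isclose(lhs, rhs)
```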
> [!NOTE]
>
> All those definitions and proofs can be found in Linear Algebra Done Right by Sheldon Axler.
### Notations in measure theory
#### Definition of Sigma algebra
[link to measure theory](../../Math4121/Math4121_L25#definition-of-sigma-algebra)
A collection of sets $\mathcal{A}$ is called a sigma-algebra if it satisfies the following properties:
1. $\emptyset \in \mathcal{A}$
2. If $\{A_j\}_{j=1}^\infty \subset \mathcal{A}$, then $\bigcup_{j=1}^\infty A_j \in \mathcal{A}$
3. If $A \in \mathcal{A}$, then $A^c \in \mathcal{A}$
#### Definition of Measure
A measure is a function $v:\mathcal{A}\to \mathbb{R}$ satisfying the following properties:
1. $v(\emptyset)=0$
2. If $\{A_j\}_{j=1}^\infty \subset \mathcal{A}$ are pairwise disjoint, then $v(\bigcup_{j=1}^\infty A_j)=\sum_{j=1}^\infty v(A_j)$ (countable additivity)
3. If $A\in \mathcal{A}$, then $v(A)\geq 0$ (non-negativity)
<details>
<summary>Examples of measure</summary>
The [Borel measure on $\mathbb{R}$](../../Math4121/Math4121_L25#definition-of-borel-measure) is defined on the sigma-algebra generated by the closed, open, and half-open intervals, with $m(U)=\ell(U)$ for any open set $U$.
The [Lebesgue measure on $\mathbb{R}$](../../Math4121/Math4121_L27#definition-of-lebesgue-measure) is defined on the collection of all Lebesgue measurable sets, with inner measure $m_i(S)=\sup_{K\text{ closed},K\subseteq S}m(K)$, outer measure $m_e(S)=\inf_{U\text{ open},S\subseteq U}m(U)$, and $m(S)=m_e(S)=m_i(S)$ for any Lebesgue measurable set $S$.
</details>
#### Definition of Probability measure
Let $\mathscr{F}$ be a sigma-algebra on a set $\Omega$. A probability measure is a function $P:\mathscr{F}\to [0,1]$ satisfying the following properties:
1. $P(\Omega)=1$
2. $P$ is a measure on $\mathscr{F}$
#### Definition of Measurable space
A measure space is a triple $(X, \mathscr{B}, v)$, where $X$ is a set, $\mathscr{B}$ is a sigma-algebra on $X$, and $v$ is a measure on $\mathscr{B}$; the pair $(X,\mathscr{B})$ alone is called a measurable space.
In some literature, $\mathscr{B}$ is suppressed and we simply write $(X, v)$.
<details>
<summary>Examples of measurable space</summary>
Let $\Omega$ be an arbitrary set.
Let $\mathscr{B}(\mathbb{C})$ be the Borel sigma-algebra on $\mathbb{C}$ generated by rectangles in the complex plane with sides parallel to the real and imaginary axes, and let $\lambda$ be the Lebesgue measure associated with it.
Let $\mathscr{F}$ be the set of square integrable, that is,
$$
\int_\Omega |f(x)|^2 d\lambda(x)<\infty
$$
complex-valued functions on $\Omega$, that is, $f:\Omega\to \mathbb{C}$.
Then $(\Omega, \mathscr{B}(\mathbb{C}), \lambda)$ is a measure space, and the space of such functions is usually denoted $L^2(\Omega, \mathscr{B}(\mathbb{C}), \lambda)$.
If $\Omega=\mathbb{R}$, then we denote this space as $L^2(\mathbb{R}, \lambda)$.
</details>
#### Probability space
A probability space is a triple $(\Omega, \mathscr{F}, P)$, where $\Omega$ is a set, $\mathscr{F}$ is a sigma-algebra on $\Omega$, and $P$ is a probability measure on $\mathscr{F}$.
### Lipschitz function
#### $\eta$-Lipschitz function
Let $(X,\operatorname{dist}_X)$ and $(Y,\operatorname{dist}_Y)$ be two metric spaces. A function $f:X\to Y$ is said to be $\eta$-Lipschitz if there exists a constant $L\geq 0$ such that
$$
\operatorname{dist}_Y(f(x),f(y))\leq L\operatorname{dist}_X(x,y)
$$
for all $x,y\in X$, and $\eta=\|f\|_{\operatorname{Lip}}$ is the infimum of all such $L$.
That basically means that the function $f$ does not change the distance between any pair of points in $X$ by more than a factor of $\eta$.
### Operations on Hilbert space and Measurements
Basic definitions
#### $SO(n)$
The special orthogonal group $SO(n)$ is the set of all orientation-preserving, **distance preserving** linear transformations on $\mathbb{R}^n$.
It is the group of all $n\times n$ orthogonal matrices ($A^T A=I_n$) on $\mathbb{R}^n$ with determinant $1$.
$$
SO(n)=\{A\in \mathbb{R}^{n\times n}: A^T A=I_n, \det(A)=1\}
$$
<details>
<summary>Extensions</summary>
In [The random Matrix Theory of the Classical Compact groups](https://case.edu/artsci/math/esmeckes/Haar_book.pdf), the author gives a more general definition of the Haar measure on the compact group $SO(n)$,
$O(n)$ (the group of all $n\times n$ **orthogonal matrices** over $\mathbb{R}$),
$$
O(n)=\{A\in \mathbb{R}^{n\times n}: AA^T=A^T A=I_n\}
$$
$U(n)$ (the group of all $n\times n$ **unitary matrices** over $\mathbb{C}$),
$$
U(n)=\{A\in \mathbb{C}^{n\times n}: A^*A=AA^*=I_n\}
$$
Recall that $A^*$ is the complex conjugate transpose of $A$.
$SU(n)$ (the group of all $n\times n$ unitary matrices over $\mathbb{C}$ with determinant $1$),
$$
SU(n)=\{A\in \mathbb{C}^{n\times n}: A^*A=AA^*=I_n, \det(A)=1\}
$$
$Sp(2n)$ (the group of all $2n\times 2n$ symplectic matrices over $\mathbb{C}$),
$$
Sp(2n)=\{U\in U(2n): U^T J U=UJU^T=J\}
$$
where $J=\begin{pmatrix}
0 & I_n \\
-I_n & 0
\end{pmatrix}$ is the standard symplectic matrix.
</details>
### Haar measure
Let $(SO(n), \| \cdot \|, \mu)$ be a metric measure space where $\| \cdot \|$ is the [Hilbert-Schmidt norm](https://notenextra.trance-0.com/Math401/Math401_T2#definition-of-hilbert-schmidt-norm) and $\mu$ is the measure function.
The Haar measure on $SO(n)$ is the unique probability measure that is invariant under the action of $SO(n)$ on itself.
That is also called _translation-invariant_.
That is, for any measurable set $S\subseteq SO(n)$ and any $A\in SO(n)$, $\mu(A\cdot S)=\mu(S\cdot A)=\mu(S)$.
_The existence and uniqueness of the Haar measure is a theorem in compact Lie group theory. For this research topic, we will not prove it._
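In practice, one can sample from the Haar measure on $SO(n)$ with the standard QR trick on a Gaussian matrix; the sketch below is one common construction (not taken from the referenced text), shown only as an illustration:

```python
import numpy as np

def haar_so(n, rng=None):
    """Sample a Haar-distributed matrix on SO(n) via QR of a Gaussian matrix."""
    rng = rng or np.random.default_rng()
    Z = rng.standard_normal((n, n))
    Q, R = np.linalg.qr(Z)
    Q = Q * np.sign(np.diag(R))   # fix column signs so the law is invariant
    if np.linalg.det(Q) < 0:      # flip one column to land in SO(n)
        Q[:, 0] = -Q[:, 0]
    return Q

A = haar_so(4)
assert np.allclose(A.T @ A, np.eye(4))    # orthogonal
assert np.isclose(np.linalg.det(A), 1.0)  # determinant 1
```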
### Random sampling on $\mathbb{C}P^n$
Note that the space of pure states of a bipartite system $\mathscr{A}\otimes\mathscr{B}$ with $\dim \mathscr{A}=d_A$ and $\dim \mathscr{B}=d_B$ can be identified with the complex projective space $\mathbb{C}P^{d_A d_B-1}$.
## Non-commutative probability theory
### Pure state and mixed state
A pure state is a state that is represented by a unit vector in $\mathscr{H}^{\otimes N}$.
> As analogy, a pure state is the basis element of the vector space, a mixed state is a linear combination of basis elements.
A mixed state is a state that is represented by a density operator (linear combination of pure states) in $\mathscr{H}^{\otimes N}$.
### Partial trace and purification
#### Partial trace
Recall that the bipartite state of a quantum system is a linear operator on $\mathscr{H}=\mathscr{A}\otimes \mathscr{B}$, where $\mathscr{A}$ and $\mathscr{B}$ are finite-dimensional Hilbert spaces.
##### Definition of partial trace for arbitrary linear operators
Let $T$ be a linear operator on $\mathscr{H}=\mathscr{A}\otimes \mathscr{B}$, where $\mathscr{A}$ and $\mathscr{B}$ are finite-dimensional Hilbert spaces.
An operator $T$ on $\mathscr{H}=\mathscr{A}\otimes \mathscr{B}$ can be written as (by the definition of [tensor product of linear operators](https://notenextra.trance-0.com/Math401/Math401_T2#tensor-products-of-linear-operators))
$$
T=\sum_{i=1}^n a_i A_i\otimes B_i
$$
where $A_i$ is a linear operator on $\mathscr{A}$ and $B_i$ is a linear operator on $\mathscr{B}$.
The $\mathscr{B}$-partial trace of $T$ ($\operatorname{Tr}_{\mathscr{B}}(T):\mathcal{L}(\mathscr{A}\otimes \mathscr{B})\to \mathcal{L}(\mathscr{A})$) is the linear operator on $\mathscr{A}$ defined by
$$
\operatorname{Tr}_{\mathscr{B}}(T)=\sum_{i=1}^n a_i \operatorname{Tr}(B_i) A_i
$$
#### Definition of partial trace for density operators
Let $\rho$ be a density operator in $\mathscr{H}_1\otimes\mathscr{H}_2$, the partial trace of $\rho$ over $\mathscr{H}_2$ is the density operator in $\mathscr{H}_1$ (reduced density operator for the subsystem $\mathscr{H}_1$) given by:
$$
\rho_1\coloneqq\operatorname{Tr}_2(\rho)
$$
<details>
<summary>Examples</summary>
Let $|\psi\rangle=\frac{1}{\sqrt{2}}(|01\rangle+|10\rangle)$ be a pure state on $\mathscr{H}=\mathbb{C}^2\otimes \mathbb{C}^2$ and let $\rho=|\psi\rangle\langle\psi|$ be the corresponding density operator.
Expand the expression of $\rho$ in the basis of $\mathbb{C}^2\otimes\mathbb{C}^2$ using linear combinations of basis vectors:
$$
\rho=\frac{1}{2}(|01\rangle\langle 01|+|01\rangle\langle 10|+|10\rangle\langle 01|+|10\rangle\langle 10|)
$$
Note $\operatorname{Tr}_2(|ab\rangle\langle cd|)=|a\rangle\langle c|\cdot \langle b|d\rangle$.
Then, noting that $\langle 0|0\rangle=\langle 1|1\rangle=1$ and $\langle 0|1\rangle=\langle 1|0\rangle=0$, the reduced density operator of the first qubit is:
$$
\begin{aligned}
\rho_1&=\operatorname{Tr}_2(\rho)\\
&=\frac{1}{2}(\langle 1|1\rangle |0\rangle\langle 0|+\langle 0|1\rangle |0\rangle\langle 1|+\langle 1|0\rangle |1\rangle\langle 0|+\langle 0|0\rangle |1\rangle\langle 1|)\\
&=\frac{1}{2}(|0\rangle\langle 0|+|1\rangle\langle 1|)\\
&=\frac{1}{2}I
\end{aligned}
$$
is a mixed state.
</details>
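The same computation can be checked numerically. A minimal NumPy sketch, assuming the basis ordering $|00\rangle,|01\rangle,|10\rangle,|11\rangle$ and implementing the partial trace by a reshape-and-trace:

```python
import numpy as np

# |psi> = (|01> + |10>)/sqrt(2) on C^2 ⊗ C^2, basis order |00>,|01>,|10>,|11>
psi = np.zeros(4)
psi[1] = psi[2] = 1 / np.sqrt(2)
rho = np.outer(psi, psi.conj())

# partial trace over the second qubit: view rho with indices (a, b, c, d)
# for |ab><cd| and contract b with d
rho_1 = np.trace(rho.reshape(2, 2, 2, 2), axis1=1, axis2=3)

assert np.allclose(rho_1, np.eye(2) / 2)   # the maximally mixed state I/2
```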
### Purification
Let $\rho$ be any [state](https://notenextra.trance-0.com/Math401/Math401_T6#pure-states) (not necessarily pure) on the finite dimensional Hilbert space $\mathscr{H}$. Then there exists a unit vector $w\in \mathscr{H}\otimes \mathscr{H}$ (a pure state) such that $\rho=\operatorname{Tr}_2(|w\rangle\langle w|)$.
<details>
<summary>Proof</summary>
Let $(u_1,u_2,\cdots,u_n)$ be an orthonormal basis of $\mathscr{H}$ consisting of eigenvectors of $\rho$ for the eigenvalues $p_1,p_2,\cdots,p_n$. As $\rho$ is a state, $p_i\geq 0$ for all $i$ and $\sum_{i=1}^n p_i=1$.
We can write $\rho$ as
$$
\rho=\sum_{i=1}^n p_i |u_i\rangle\langle u_i|
$$
Let $w=\sum_{i=1}^n \sqrt{p_i} u_i\otimes u_i$, note that $w$ is a unit vector (pure state). Then
$$
\begin{aligned}
\operatorname{Tr}_2(|w\rangle\langle w|)&=\operatorname{Tr}_2(\sum_{i=1}^n \sum_{j=1}^n \sqrt{p_ip_j} |u_i\otimes u_i\rangle \langle u_j\otimes u_j|)\\
&=\sum_{i=1}^n \sum_{j=1}^n \sqrt{p_ip_j} \operatorname{Tr}_2(|u_i\otimes u_i\rangle \langle u_j\otimes u_j|)\\
&=\sum_{i=1}^n \sum_{j=1}^n \sqrt{p_ip_j} \langle u_j|u_i\rangle |u_i\rangle\langle u_j|\\
&=\sum_{i=1}^n \sum_{j=1}^n \sqrt{p_ip_j} \delta_{ij} |u_i\rangle\langle u_j|\\
&=\sum_{i=1}^n p_i |u_i\rangle\langle u_i|\\
&=\rho
\end{aligned}
$$
so $|w\rangle\langle w|$ is a purification of $\rho$.
</details>
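A small numerical illustration of the construction used in the proof; a sketch where the density matrix is an arbitrary random example:

```python
import numpy as np

rng = np.random.default_rng(2)
# a random density matrix rho on C^3
X = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
rho = X @ X.conj().T
rho /= np.trace(rho)

# spectral decomposition rho = sum_i p_i |u_i><u_i|
p, U = np.linalg.eigh(rho)
p = np.clip(p, 0, None)   # guard against tiny negative rounding errors

# purification w = sum_i sqrt(p_i) u_i ⊗ u_i  in C^3 ⊗ C^3
w = sum(np.sqrt(p[i]) * np.kron(U[:, i], U[:, i]) for i in range(3))
W = np.outer(w, w.conj())

# the partial trace over the second factor recovers rho
rho_back = np.trace(W.reshape(3, 3, 3, 3), axis1=1, axis2=3)
assert np.allclose(rho_back, rho)
```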
## Drawing the connection between the space $S^{2n+1}$, $CP^n$, and $\mathbb{R}$
A pure quantum state of size $N$ can be identified with a **Hopf circle** on the sphere $S^{2N-1}$.
Consider a random pure state $|\psi\rangle$ of a bipartite $N\times K$ system with $K\geq N\geq 3$.
The partial trace of such system produces a mixed state $\rho(\psi)=\operatorname{Tr}_K(|\psi\rangle\langle \psi|)$, with induced measure $\mu_K$. When $K=N$, the induced measure $\mu_K$ is the Hilbert-Schmidt measure.
Consider the function $f:S^{2NK-1}\to \mathbb{R}$ defined by $f(\psi)=S(\rho(\psi))$, where $S(\cdot)$ is the von Neumann entropy. The Lipschitz constant of $f$ is $\sim \ln N$.


@@ -1,517 +0,0 @@
# Math 401, Fall 2025: Thesis notes, R2, Levy's concentration theorem and Levy's family
> Progress: 2/5=40% (denominator and numerator may change)
## Levy's concentration theorem
> [!TIP]
>
> This version of Levy's concentration theorem can be found in [Geometry of Quantum states](https://www.cambridge.org/core/books/geometry-of-quantum-states/46B62FE3F9DA6E0B4EDDAE653F61ED8C) 15.84 and 15.85.
Our goal is to prove the generalized version of Levy's concentration theorem used in Hayden's work for $\eta$-Lipschitz functions.
Let $f:S^{n-1}\to \mathbb{R}$ be an $\eta$-Lipschitz function. Let $M_f$ denote the median of $f$ and $\langle f\rangle$ denote the mean of $f$. (Note this can be generalized to many other manifolds.)
Select a random point $x\in S^{n-1}$ with $n>2$ according to the uniform measure (Haar measure). Then the probability of observing a value of $f$ much different from the reference value is exponentially small.
$$
\operatorname{Pr}[|f(x)-M_f|>\epsilon]\leq \exp(-\frac{n\epsilon^2}{2\eta^2})
$$
$$
\operatorname{Pr}[|f(x)-\langle f\rangle|>\epsilon]\leq 2\exp(-\frac{(n-1)\epsilon^2}{2\eta^2})
$$
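A Monte Carlo illustration of the statement for the 1-Lipschitz function $f(x)=x_1$, whose median and mean on the sphere are both $0$; the sample sizes and $\epsilon$ are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(3)

def random_sphere_points(n, num):
    """num uniform samples on S^{n-1}, via normalized Gaussians."""
    x = rng.standard_normal((num, n))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

eps = 0.2
for n in (10, 100, 1000):
    f = random_sphere_points(n, 20000)[:, 0]   # f(x) = x_1, eta = 1
    empirical = np.mean(np.abs(f) > eps)
    bound = 2 * np.exp(-(n - 1) * eps**2 / 2)  # the mean-version bound above
    print(n, empirical, bound)
```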
### Levy's concentration theorem via sub-Gaussian concentration
> [!TIP]
>
> This version of Levy's concentration theorem can be found in [High-dimensional probability](https://www.math.uci.edu/~rvershyn/papers/HDP-book/HDP-2.pdf) 5.1.4.
#### Isoperimetric inequality on $\mathbb{R}^n$
Among all subsets $A\subset \mathbb{R}^n$ with a given volume, the Euclidean ball has the minimal area.
That is, for any $\epsilon>0$, Euclidean balls minimize the volume of the $\epsilon$-neighborhood of $A$.
Where the volume of the $\epsilon$-neighborhood of $A$ is defined as
$$
A_\epsilon(A)\coloneqq \{x\in \mathbb{R}^n: \exists y\in A, \|x-y\|_2\leq \epsilon\}=A+\epsilon B_2^n
$$
Here the $\|\cdot\|_2$ is the Euclidean norm. (The theorem holds for both geodesic metric on sphere and Euclidean metric on $\mathbb{R}^n$.)
#### Isoperimetric inequality on the sphere
Let $\sigma_n(A)$ denote the normalized area of $A$ on the $n$ dimensional sphere $S^n$. That is, $\sigma_n(A)\coloneqq\frac{\operatorname{Area}(A)}{\operatorname{Area}(S^n)}$.
Let $\epsilon>0$. Then for any subset $A\subset S^n$, given the area $\sigma_n(A)$, the spherical caps minimize the volume of the $\epsilon$-neighborhood of $A$.
> The above two inequalities are not proved in the book _High-dimensional probability_. But you can find them in Appendix C of Gromov's book _Metric Structures for Riemannian and Non-Riemannian Spaces_.
To continue the proof of the theorem, we use sub-Gaussian concentration *(Chapter 3 of _High-dimensional probability_ by Roman Vershynin)* on the sphere $\sqrt{n}S^n$.
This leads to some constant $c>0$ such that the following lemma holds:
#### The "Blow-up" lemma
Let $A$ be a subset of the sphere $\sqrt{n}S^n$, and let $\sigma$ denote the normalized area. If $\sigma(A)\geq \frac{1}{2}$, then for every $t\geq 0$,
$$
\sigma(A_t)\geq 1-2\exp(-ct^2)
$$
where $A_t=\{x\in \sqrt{n}S^n: \operatorname{dist}(x,A)\leq t\}$ and $c$ is some positive constant.
#### Proof of the Levy's concentration theorem
Proof:
Without loss of generality, we can assume that $\eta=1$. Let $M$ denote the median of $f(X)$.
So $\operatorname{Pr}[f(X)\leq M]\geq \frac{1}{2}$ and $\operatorname{Pr}[f(X)\geq M]\geq \frac{1}{2}$.
Consider the sub-level set $A\coloneqq \{x\in \sqrt{n}S^n: f(x)\leq M\}$.
Since $\operatorname{Pr}[X\in A]\geq \frac{1}{2}$, by the blow-up lemma, we have
$$
\operatorname{Pr}[X\in A_t]\geq 1-2\exp(-ct^2)
$$
And since
$$
\operatorname{Pr}[X\in A_t]\leq \operatorname{Pr}[f(X)\leq M+t]
$$
Combining the above two inequalities, we have
$$
\operatorname{Pr}[f(X)\leq M+t]\geq 1-2\exp(-ct^2)
$$
## Levy's concentration theorem via Levy family
> [!TIP]
>
> This version of Levy's concentration theorem can be found in:
> - [Metric Structures for Riemannian and Non-Riemannian Spaces by M. Gromov](https://www.amazon.com/Structures-Riemannian-Non-Riemannian-Progress-Mathematics/dp/0817638989/ref=tmm_hrd_swatch_0?_encoding=UTF8&dib_tag=se&dib=eyJ2IjoiMSJ9.Tp8dXvGbTj_D53OXtGj_qOdqgCgbP8GKwz4XaA1xA5PGjHj071QN20LucGBJIEps.9xhBE0WNB0cpMfODY5Qbc3gzuqHnRmq6WZI_NnIJTvc&qid=1750973893&sr=8-1)
> - [Metric Measure Geometry by Takashi Shioya](https://arxiv.org/pdf/1410.0428)
### Levy's concentration theorem (Gromov's version)
> Levy's lemma can also be found in _Metric Structures for Riemannian and Non-Riemannian Spaces_ by M. Gromov, $3\frac{1}{2}.19$, the Levy concentration theorem.
#### Theorem $3\frac{1}{2}.19$ Levy concentration theorem:
An arbitrary 1-Lipschitz function $f:S^n\to \mathbb{R}$ concentrates near a single value $a_0\in \mathbb{R}$ as strongly as the distance function does.
That is
$$
\mu\{x\in S^n: |f(x)-a_0|\geq\epsilon\} < \kappa_n(\epsilon)\leq 2\exp(-\frac{(n-1)\epsilon^2}{2})
$$
where
$$
\kappa_n(\epsilon)=\frac{\int_\epsilon^{\frac{\pi}{2}}\cos^{n-1}(t)dt}{\int_0^{\frac{\pi}{2}}\cos^{n-1}(t)dt}
$$
$a_0$ is the **Levy mean** of the function $f$; that is, the sub-level set and the super-level set of $f$ at $a_0$ each cover at least half of the sphere, characterized by the following inequalities:
$$
\mu(f^{-1}(-\infty,a_0])\geq \frac{1}{2} \text{ and } \mu(f^{-1}[a_0,\infty))\geq \frac{1}{2}
$$
A direct computation generates the bound, but M. Gromov does not give a detailed explanation here.
> Detailed proof by Takashi Shioya.
>
> The central idea is to draw the connection between the given three topological spaces, $S^{2n+1}$, $CP^n$ and $\mathbb{R}$.
First, we need to introduce the following distribution and lemmas/theorems:
**OBSERVATION**
Consider the orthogonal projection from $\mathbb{R}^{n+1}$, the space where $S^n$ is embedded, to $\mathbb{R}^k$; we denote the restriction of the projection as $\pi_{n,k}:S^n(\sqrt{n})\to \mathbb{R}^k$. Note that $\pi_{n,k}$ is a 1-Lipschitz function (a projection never increases the distance between two points).
We denote the normalized Riemannian volume measure on $S^n(\sqrt{n})$ as $\sigma^n(\cdot)$, and $\sigma^n(S^n(\sqrt{n}))=1$.
#### Definition of Gaussian measure on $\mathbb{R}^k$
We denote the Gaussian measure on $\mathbb{R}^k$ as $\gamma^k$.
$$
d\gamma^k(x)\coloneqq\frac{1}{\sqrt{2\pi}^k}\exp(-\frac{1}{2}\|x\|^2)dx
$$
$x\in \mathbb{R}^k$, $\|x\|^2=\sum_{i=1}^k x_i^2$ is the Euclidean norm, and $dx$ is the Lebesgue measure on $\mathbb{R}^k$.
Basically, the Gaussian measure $\gamma^k$ is the standard normal distribution on $\mathbb{R}^k$ (mean $0$ and standard deviation $1$ in each coordinate).
#### Maxwell-Boltzmann distribution law
> It is such a wonderful fact for me that the projection of the $n$-dimensional sphere with radius $\sqrt{n}$ to $\mathbb{R}^k$ becomes a Gaussian distribution as $n\to \infty$.
For any natural number $k$,
$$
\frac{d(\pi_{n,k})_*\sigma^n(x)}{dx}\to \frac{d\gamma^k(x)}{dx}
$$
where $(\pi_{n,k})_*\sigma^n$ is the push-forward measure of $\sigma^n$ by $\pi_{n,k}$.
In other words,
$$
(\pi_{n,k})_*\sigma^n\to \gamma^k\text{ weakly as }n\to \infty
$$
<details>
<summary>Proof</summary>
We denote the $n$ dimensional volume measure on $\mathbb{R}^k$ as $\operatorname{vol}_k$.
Observe that $\pi_{n,k}^{-1}(x),x\in \mathbb{R}^k$ is isometric to $S^{n-k}(\sqrt{n-\|x\|^2})$, that is, for any $x\in \mathbb{R}^k$, $\pi_{n,k}^{-1}(x)$ is a sphere with radius $\sqrt{n-\|x\|^2}$ (by the definition of $\pi_{n,k}$).
So,
$$
\begin{aligned}
\frac{d(\pi_{n,k})_*\sigma^n(x)}{dx}&=\frac{\operatorname{vol}_{n-k}(\pi_{n,k}^{-1}(x))}{\operatorname{vol}_k(S^n(\sqrt{n}))}\\
&=\frac{(n-\|x\|^2)^{\frac{n-k}{2}}}{\int_{\|x\|\leq \sqrt{n}}(n-\|x\|^2)^{\frac{n-k}{2}}dx}\\
\end{aligned}
$$
as $n\to \infty$.
Note that $\lim_{n\to \infty}{(1-\frac{a}{n})^n}=e^{-a}$ for any $a>0$, so
$(n-\|x\|^2)^{\frac{n-k}{2}}=\left(n(1-\frac{\|x\|^2}{n})\right)^{\frac{n-k}{2}}\to n^{\frac{n-k}{2}}\exp(-\frac{\|x\|^2}{2})$, where the factor $n^{\frac{n-k}{2}}$ cancels against the same factor in the denominator.
So
$$
\begin{aligned}
\lim_{n\to\infty}\frac{(n-\|x\|^2)^{\frac{n-k}{2}}}{\int_{\|x\|\leq \sqrt{n}}(n-\|x\|^2)^{\frac{n-k}{2}}dx}&=\frac{e^{-\frac{\|x\|^2}{2}}}{\int_{x\in \mathbb{R}^k}e^{-\frac{\|x\|^2}{2}}dx}\\
&=\frac{1}{(2\pi)^{\frac{k}{2}}}e^{-\frac{\|x\|^2}{2}}\\
&=\frac{d\gamma^k(x)}{dx}
\end{aligned}
$$
QED
</details>
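A Monte Carlo illustration of the Maxwell-Boltzmann distribution law above: project uniform samples on $S^n(\sqrt{n})$ onto the first coordinate ($k=1$) and compare with a standard Gaussian. The dimension and sample size are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(4)

def uniform_on_sphere(n, radius, num):
    """Uniform samples on the sphere of the given radius in R^{n+1}."""
    x = rng.standard_normal((num, n + 1))
    return radius * x / np.linalg.norm(x, axis=1, keepdims=True)

n = 200
proj = uniform_on_sphere(n, np.sqrt(n), 20000)[:, 0]   # pi_{n,1}

print("sample mean:", proj.mean(), "sample variance:", proj.var())
print("P(|X| <= 1.96):", np.mean(np.abs(proj) <= 1.96), "(standard normal: ~0.95)")
```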
#### Proof of the Levy's concentration theorem via the Maxwell-Boltzmann distribution law
We use the Maxwell-Boltzmann distribution law and Levy's isoperimetric inequality to prove Levy's concentration theorem.
The goal is the same as in Gromov's version: first we bound the probability of the sub-level set of $f$ by the $\kappa_n(\epsilon)$ function using Levy's isoperimetric inequality; then we claim that the $\kappa_n(\epsilon)$ function is bounded by the Gaussian tail.
Note, this section is not rigorous in the mathematical sense, and the author should add sections about Levy families and observable diameter to make the proof more rigorous and understandable.
<details>
<summary>Proof</summary>
Let $f:S^n\to \mathbb{R}$ be a 1-Lipschitz function.
Consider the two sets of points on the sphere $S^n$ with radius $\sqrt{n}$:
$$
\Omega_+=\{x\in S^n: f(x)\leq a_0-\epsilon\}, \Omega_-=\{x\in S^n: f(x)\geq a_0+\epsilon\}
$$
Note that $\Omega_+\cup \Omega_-=\{x\in S^n(\sqrt{n}): |f(x)-a_0|\geq \epsilon\}$.
By the Levy's isoperimetric inequality, we have
$$
\operatorname{vol}_{n-k}(\pi_{n,k}^{-1}(\epsilon))\leq \operatorname{vol}_{n-k}(\pi_{n,k}^{-1}(\Omega_+))+\operatorname{vol}_{n-k}(\pi_{n,k}^{-1}(\Omega_-))
$$
We define $\kappa_n(\epsilon)$ as the following:
$$
\kappa_n(\epsilon)=\frac{\operatorname{vol}_{n-k}(\pi_{n,k}^{-1}(\epsilon))}{\operatorname{vol}_k(S^n(\sqrt{n}))}=\frac{\int_\epsilon^{\frac{\pi}{2}}\cos^{n-1}(t)dt}{\int_0^{\frac{\pi}{2}}\cos^{n-1}(t)dt}
$$
By the Levy's isoperimetric inequality, and the Maxwell-Boltzmann distribution law, we have
$$
\mu\{x\in S^n: |f(x)-a_0|\geq\epsilon\} < \kappa_n(\epsilon)\leq 2\exp(-\frac{(n-1)\epsilon^2}{2})
$$
</details>
## Levy's Isoperimetric inequality
> This section is from the Appendix $C_+$ of Gromov's book _Metric Structures for Riemannian and Non-Riemannian Spaces_.
Not very digestible for undergraduates.
## Differential Geometry
> This section is designed for stupids like me who skipped too much essential material in the book.
> This part might be extended to a separate note, let's check how far we can go from this part.
>
> References:
>
> - [Introduction to Smooth Manifolds by John M. Lee]
>
> - [Riemannian Geometry by John M. Lee](https://www.amazon.com/Introduction-Riemannian-Manifolds-Graduate-Mathematics/dp/3319917544?dib=eyJ2IjoiMSJ9.88u0uIXulwPpi3IjFn9EdOviJvyuse9V5K5wZxQEd6Rto5sCIowzEJSstE0JtQDW.QeajvjQEbsDmnEMfPzaKrfVR9F5BtWE8wFscYjCAR24&dib_tag=se&keywords=riemannian+manifold+by+john+m+lee&qid=1753238983&sr=8-1)
### Manifold
> Unexpectedly, a good definition of the manifold is given in Topology I.
>
> Check section 36. This topic extends to a wonderful chapter 8 in the book where you can hardly understand chapter 2.
#### Definition of m-manifold
An $m$-manifold is a [Hausdorff space](../../Math4201/Math4201_L9#hausdorff-space) $X$ with a **countable basis** (second countable) such that each point $x$ of $X$ has a neighborhood [homeomorphic](../../Math4201/Math4201_L10#definition-of-homeomorphism) to an open subset of $\mathbb{R}^m$.
<details>
<summary>Example of second countable space</summary>
Let $X=\mathbb{R}$ and $\mathcal{B}=\{(a,b)\mid a,b\in \mathbb{Q},a<b\}$ (the collection of all open intervals with rational endpoints).
Since the rational numbers are countable, $\mathcal{B}$ is countable.
So $\mathbb{R}$ is second countable.
Likewise, $\mathbb{R}^n$ is also second countable.
</details>
<details>
<summary>Example of manifold</summary>
A 1-manifold is a curve and a 2-manifold is a surface.
</details>
#### Theorem of imbedded space
If $X$ is a compact $m$-manifold, then $X$ can be imbedded in $\mathbb{R}^n$ for some $n$.
This theorem might save you from having to imagine abstract structures detached from real coordinates. Good news: at least you stay in some real coordinate space.
### Smooth manifolds and Lie groups
> This section is waiting for the completion of book Introduction to Smooth Manifolds by John M. Lee.
#### Partial derivatives
Let $U\subseteq \mathbb{R}^n$ and $f:U\to \mathbb{R}^n$ be a map.
For any $a=(a_1,\cdots,a_n)\in U$ and $j\in \{1,\cdots,n\}$, the $j$-th partial derivative of $f$ at $a$ is defined as
$$
\begin{aligned}
\frac{\partial f}{\partial x_j}(a)&=\lim_{h\to 0}\frac{f(a_1,\cdots,a_j+h,\cdots,a_n)-f(a_1,\cdots,a_j,\cdots,a_n)}{h} \\
&=\lim_{h\to 0}\frac{f(a+he_j)-f(a)}{h}
\end{aligned}
$$
#### Continuously differentiable maps
Let $U\subseteq \mathbb{R}^n$ and $f:U\to \mathbb{R}^n$ be a map.
If for any $j\in \{1,\cdots,n\}$, the $j$-th partial derivative of $f$ is continuous at $a$, then $f$ is continuously differentiable at $a$.
If, for all $a\in U$ and every $j$, $\frac{\partial f}{\partial x_j}$ exists and is continuous at $a$, then $f$ is continuously differentiable on $U$, or a $C^1$ map. (Note that a $C^0$ map is just a continuous map.)
#### Smooth maps
A function $f:U\to \mathbb{R}^n$ is smooth if it is of class $C^k$ for every $k\geq 0$ on $U$. Such a function is called a diffeomorphism if it is also a **bijection** and its **inverse is also smooth**.
#### Charts
Let $M$ be a smooth manifold. A **chart** is a pair $(U,\varphi)$ where $U\subseteq M$ is an open subset and $\varphi:U\to \hat{U}\subseteq \mathbb{R}^n$ is a homeomorphism (a continuous bijection map and its inverse is also continuous).
If $p\in U$ and $\varphi(p)=0$, then we say that $p$ is the origin of the chart $(U,\varphi)$.
For $p\in U$, we note that the continuous function $\varphi(p)=(x_1(p),\cdots,x_n(p))$ gives a vector in $\mathbb{R}^n$. The $(x_1(p),\cdots,x_n(p))$ is called the **local coordinates** of $p$ in the chart $(U,\varphi)$.
#### Atlas
Let $M$ be a smooth manifold. An **atlas** is a collection of charts $\mathcal{A}=\{(U_\alpha,\phi_\alpha)\}_{\alpha\in I}$ such that $M=\bigcup_{\alpha\in I} U_\alpha$.
An atlas is said to be **smooth** if the transition maps $\phi_\alpha\circ \phi_\beta^{-1}:\phi_\beta(U_\alpha\cap U_\beta)\to \phi_\alpha(U_\alpha\cap U_\beta)$ are smooth for all $\alpha, \beta\in I$.
#### Smooth manifold
A smooth manifold is a pair $(M,\mathcal{A})$ where $M$ is a topological manifold and $\mathcal{A}$ is a smooth atlas.
#### Fundamental group
The **fundamental group** of a topological space $X$ at a point $p$ is the group of homotopy classes of loops at $p$ (continuous maps $f:I\to X$, $I=[0,1]\subseteq \mathbb{R}$, with $f(0)=f(1)=p$).
- Product defined as composition of paths.
- Identity element is the constant path from $p$ to $p$.
- Inverse is the reverse path.
#### smooth local coordinate representations
If $M$ is a smooth manifold, then any chart $(U,\varphi)$ contained in the given maximal smooth atlas is called a **smooth chart**, and the map $\varphi$ is called a **smooth coordinate map** because it gives local coordinates on $U$.
#### Lie group
A Lie group is a group (satisfying the group axioms: closure, associativity, identity, inverses) that is also a smooth manifold, with the multiplication $m:G\times G\to G$ and the inverse operation $i:G\to G$ both smooth.
In short, a Lie group is a group that is also a smooth manifold such that the map $G\times G\to G$ given by $(g,h)\mapsto gh^{-1}$ is smooth.
<details>
<summary>Example of Lie group</summary>
The general linear group $GL(n,\mathbb{R})$ is the group of all $n\times n$ invertible matrices over $\mathbb{R}$.
This is a Lie group since
1. Multiplication is a smooth map $GL(n,\mathbb{R})\times GL(n,\mathbb{R})\to GL(n,\mathbb{R})$ since it is a polynomial map.
2. Inversion is a smooth map $GL(n,\mathbb{R})\to GL(n,\mathbb{R})$ by Cramer's rule.
---
If $G$ is a Lie group, then any open subgroup (with subgroup topology and open set in $G$) $H$ of $G$ is also a Lie group.
</details>
#### Translation map on Lie group
If $G$ is a Lie group, then the translation map $L_g:G\to G$ given by $L_g(h)=gh$ and $R_g:G\to G$ given by $R_g(h)=hg$ are both smooth and are diffeomorphisms on $G$.
#### Derivation and tangent vectors
The directional derivative of a geometric tangent vector $v_a\in \mathbb{R}^n_a$ yields a map $D_v\vert_a:C^\infty(\mathbb{R}^n)\to \mathbb{R}$ given by the formula
$$
D_v\vert_a(f)=D_v f(a)=\frac{d}{dt}\bigg\vert_{t=0}f(a+tv_a)
$$
Note that this is linear over $\mathbb{R}$ and satisfies the product rule.
$$
D_v\vert_a(f\cdot g)=f(a)D_v\vert_a(g)+g(a)D_v\vert_a(f)
$$
We can generalize this representation to the following definition:
If $a$ is a point of $\mathbb{R}^n$, then a **derivation at $a$** is a linear map $w:C^\infty(\mathbb{R}^n)\to \mathbb{R}$ such that it is linear over $\mathbb{R}$ and satisfies the product rule.
$$
w(f\cdot g)=w(f)\cdot g(a)+f(a)\cdot w(g)
$$
Let $T_a\mathbb{R}^n$ denote the set of all derivations of $C^\infty(\mathbb{R}^n)$ at $a$. So $T_a\mathbb{R}^n$ is a vector space over $\mathbb{R}$.
$$
(w_1+w_2)(f)=w_1(f)+w_2(f),\quad (cw)(f)=c(w(f))
$$
Some key properties are given below and check the proof in the book for details.
1. If $f$ is a constant function, then $w(f)=0$.
2. If $f(a)=g(a)=0$, then $w(f\cdot g)=0$.
3. For each geometric tangent vector $v_a\in \mathbb{R}^n_a$, the map $D_v\vert_a:C^\infty(\mathbb{R}^n)\to \mathbb{R}$ is a derivation at $a$.
4. The map $v_a\mapsto D_v\vert_a$ is an isomorphism of vector spaces from $\mathbb{R}^n_a$ to $T_a\mathbb{R}^n$.
#### Tangent vector on Manifolds
Let $M$ be a smooth manifold. Let $p\in M$. A **tangent vector to $M$ at $p$** is a linear map $v:C^\infty(M)\to \mathbb{R}$ that is a derivation at $p$, i.e. it satisfies:
$$
v(f\cdot g)=f(p)\,vg+g(p)\,vf \text{ for all } f,g\in C^\infty(M)
$$
The set of all derivations of $C^\infty(M)$ at $p$ is denoted by $T_pM$ and is called the tangent space to $M$ at $p$. An element of $T_pM$ is called a tangent vector to $M$ at $p$.
#### Tangent bundle
We define the tangent bundle of $M$ as the disjoint union of all the tangent spaces:
$$
TM=\bigsqcup_{p\in M} T_pM
$$
We write the element in $TM$ as pair $(p,v)$ where $p\in M$ and $v\in T_pM$.
The tangent bundle comes with a natural projection map $\pi:TM\to M$ given by $\pi(p,v)=p$.
#### Section of map
If $\pi:M\to N$ is any continuous map, a **section of $\pi$** is a continuous right inverse of $\pi$. That is, $\sigma:N\to M$ is a section of $\pi$ if $\pi\circ \sigma=\operatorname{Id}_N$.
#### Vector field
A vector field on $M$ is a section of the map $\pi:TM\to M$.
More concretely, a vector field is a continuous map $X:M\to TM$, usually written $p\mapsto X_p$, with property that
$$
\pi\circ X=Id_M
$$
> That is a map from element on the manifold to the tangent space of the manifold.
### Riemannian manifolds and geometry
#### Riemannian metric
A Riemannian metric is a smooth assignment of an inner product to each tangent space $T_pM$ of the manifold.
More formally, let $M$ be a smooth manifold. A **Riemannian metric** on $M$ is a smooth covariant 2-tensor field $g\in \mathcal{T}^2(M)$ whose value $g_p$ at each $p\in M$ is an inner product on $T_p M$.
Thus $g$ is a symmetric 2-tensor field that is positive definite in the sense that $g_p(v,v)\geq 0$ for each $p\in M$ and each $v\in T_p M$, with equality if and only if $v=0$.
Riemannian metrics exist in great abundance.
Good news for smooth manifolds: every smooth manifold admits a Riemannian metric.
<details>
<summary> Example of Riemannian metrics</summary>
An example of a Riemannian metric is the Euclidean metric on $\mathbb{R}^n$, which induces the distance $d(p,q)=\|p-q\|_2$.
More formally, the Euclidean metric $\overline{g}$ on $\mathbb{R}^n$ is given at each $x\in \mathbb{R}^n$, for $v,w\in T_x \mathbb{R}^n$ written in the standard coordinates $(x^1,\ldots,x^n)$ as $v=\sum_{i=1}^n v_i \partial_{x^i}$ and $w=\sum_{i=1}^n w_i \partial_{x^i}$, by $\overline{g}_x(v,w)=\sum_{i=1}^n v_i w_i$.
</details>
#### Riemannian manifolds
A Riemannian manifold is a smooth manifold equipped with a **Riemannian metric**, which is a smooth assignment of an inner product to each tangent space $T_pM$ of the manifold.
More formally, a **Riemannian manifold** is a pair $(M,g)$, where $M$ is a smooth manifold and $g$ is a specific choice of Riemannian metric on $M$.
An example of a Riemannian manifold is the complex projective space $\mathbb{C}P^n$.
### Notion of Connection
A connection is a way to define the directional derivative of a vector field along a curve on a Riemannian manifold.
For every $p\in M$, where $M$ denotes the manifold, suppose $M=\mathbb{R}^n$, and let $X=(f_1,\cdots,f_n)$ be a vector field on $M$. The directional derivative of $X$ at the point $p$ in the direction $V$ is defined as
$$
D_VX(p)=\lim_{h\to 0}\frac{X(p+hV)-X(p)}{h}
$$
### Notion of Curvatures
> [!NOTE]
>
> Geometrically, the curvature of the manifold is related to the radius of the sphere tangent to the manifold (the osculating sphere): the smaller the radius, the larger the curvature.
#### Nabla notation and Levi-Civita connection
#### Fundamental theorem of Riemannian geometry
Let $(M,g)$ be a Riemannian or pseudo-Riemannian manifold (with or without boundary). There exists a unique connection $\nabla$ on $TM$ that is compatible with $g$ and symmetric. It is called the **Levi-Civita** connection of $g$ (or also, when $g$ is positive definite, the Riemannian connection).
#### Ricci curvature


@@ -1,35 +0,0 @@
# Math 401, Fall 2025: Thesis notes, R3, Page's lemma
> Progress: 0/4=0% (denominator and numerator may change)
Page's lemma is a fundamental result in quantum information theory that gives a lower bound on the expected entanglement entropy of a random bipartite pure state.
## Statement
Choosing a random pure quantum state $\rho$ from the bipartite pure state space $\mathcal{H}_A\otimes\mathcal{H}_B$ with the uniform distribution, the expected entropy of the reduced state $\rho_A$ satisfies:
$$
\mathbb{E}[H(\rho_A)]\geq \log_2 d_A -\frac{1}{2\ln 2} \frac{d_A}{d_B}
$$
## Page's conjecture
A quantum system $AB$ with Hilbert space dimension $mn$ in a pure state $\rho_{AB}$ has entropy $0$, but (assuming $m\leq n$) the entropies of the reduced states of the two subsystems $A$ and $B$ are greater than $0$,
unless $A$ and $B$ are separable (i.e. the state is a product state).
In the original paper, the entropy of the average state taken under the unitary invariant Haar measure is:
$$
S_{m,n}=\sum_{k=n+1}^{mn}\frac{1}{k}-\frac{m-1}{2n}\simeq \ln m-\frac{m}{2n}
$$
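A Monte Carlo check of this formula; a sketch that samples a Haar-random pure state as a normalized complex Gaussian vector, with arbitrary dimensions $m=4$, $n=16$:

```python
import numpy as np

rng = np.random.default_rng(5)

def entanglement_entropy(m, n):
    """Von Neumann entropy (in nats) of rho_A for a Haar-random pure state on C^m ⊗ C^n."""
    psi = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
    psi /= np.linalg.norm(psi)
    p = np.linalg.svd(psi, compute_uv=False) ** 2   # eigenvalues of rho_A
    p = p[p > 1e-15]
    return -np.sum(p * np.log(p))

m, n = 4, 16
average = np.mean([entanglement_entropy(m, n) for _ in range(2000)])
page = sum(1.0 / k for k in range(n + 1, m * n + 1)) - (m - 1) / (2 * n)
print(average, page, np.log(m) - m / (2 * n))   # all three should be close
```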
## References to begin with
- [The random Matrix Theory of the Classical Compact groups](https://case.edu/artsci/math/esmeckes/Haar_book.pdf)
- [Page's conjecture](https://journals.aps.org/prl/pdf/10.1103/PhysRevLett.71.1291)
- [Page's conjecture simple proof](https://journals.aps.org/pre/pdf/10.1103/PhysRevE.52.5653)
- [Geometry of quantum states an introduction to quantum entanglement second edition](https://www.cambridge.org/core/books/geometry-of-quantum-states/46B62FE3F9DA6E0B4EDDAE653F61ED8C)


@@ -1,17 +0,0 @@
# Math 401, Fall 2025: Thesis notes, R4, Superdense coding and Quantum error correcting codes
> Progress: 0/NaN=NaN% (denominator and numerator may change)
This part may not be a part of "mathematical" research. But that's what I initially began with.
## Superdense coding
> [!TIP]
>
> A helpful resource is [The Functional Analysis of Quantum Information Theory](https://arxiv.org/pdf/1410.7188) Section 2.2
>
> Or another way in quantum computing [Quantum Computing and Quantum Information](https://www.cambridge.org/highereducation/books/quantum-computation-and-quantum-information/01E10196D0A682A6AEFFEA52D53BE9AE#overview) Section 2.3
## Quantum error correcting codes
This part is intentionally left blank and may be filled near the end of the semester, by assignments given in CSE5313.


@@ -1,20 +0,0 @@
# Math 401, Fall 2025: Thesis notes, S1, Complex projective space.
> [!CAUTION]
>
> In this section, without explicitly stated, all dimensions are in the complex field.
A complex projective space is the set of all complex lines through the origin in a complex vector space.
From this description, there is a natural definition of the complex projective space, given as follows:
$$
\mathbb{C}P^n=\frac{\mathbb{C}^{n+1}\setminus\{0\}}{\sim}
$$
By this ray-like nature, we can also describe the complex projective space as follows (as in the math of QT, lecture 5):
$$
\mathbb{C}P^n=\left\{z=(z_0,z_1,\cdots,z_n)\in\mathbb{C}^{n+1}:|z_0|^2+|z_1|^2+\cdots+|z_n|^2=1\right\}/\sim
$$
$$


@@ -1,31 +0,0 @@
# Math 401, Fall 2025: Thesis notes, S2, Majorana stellar representation of quantum states
## Majorana stellar representation of quantum states
> [!TIP]
>
> A helpful resource is [Geometry of Quantum states](https://www.cambridge.org/core/books/geometry-of-quantum-states/46B62FE3F9DA6E0B4EDDAE653F61ED8C) Section 4.4 and Chapter 7.
Vectors in $\mathbb{C}^{n+1}$ can be represented by polynomials of degree $n$.
$$
\vec{Z}=(Z_0,Z_1,\cdots,Z_n)\sim w(z)=Z_0+Z_1z+\cdots+Z_nz^n
$$
If $Z_0\neq 0$, then we can rescale the polynomial to make $Z_0=1$.
Therefore, points in $\mathbb{C}P^{n}$ are in one-to-one correspondence with degree-$n$ polynomials (up to scaling), each with $n$ complex roots counted with multiplicity.
$$
Z_0+Z_1z+\cdots+Z_nz^n=Z_n(z-z_1)(z-z_2)\cdots(z-z_n)
$$
If the leading coefficient $Z_n=0$, then count $\infty$ as a root.
Using the stereographic projection of each root, we get an unordered collection of points on $S^2$. Example: $\mathbb{C}P^1=S^2$, $\mathbb{C}P^2=(S^2\times S^2)/S_2$ where $S_2$ is the symmetric group.
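A minimal sketch of the root-finding and stereographic projection step, following the plain polynomial above (without the binomial weights used in some conventions; the projection formula used here is one common convention and is an assumption):

```python
import numpy as np

def stellar_points(Z):
    """Roots of w(z) = Z_0 + Z_1 z + ... + Z_n z^n, sent to S^2 by inverse
    stereographic projection from the north pole."""
    roots = np.roots(Z[::-1])   # np.roots expects the highest degree first
    pts = []
    for z in roots:
        x, y, r2 = z.real, z.imag, abs(z) ** 2
        pts.append((2 * x / (1 + r2), 2 * y / (1 + r2), (r2 - 1) / (1 + r2)))
    return np.array(pts)

Z = np.array([1.0, 0.0, -1.0])   # w(z) = 1 - z^2, roots z = +1 and z = -1
print(stellar_points(Z))         # two antipodal points on the equator of S^2
```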
> [!NOTE]
>
> TODO: Check more definition from different area of mathematics (algebraic geometry, complex analysis, etc.) of the Majorana stellar representation of quantum states.
>
> Read Chapter 5 and 6 of [Geometry of Quantum states](https://www.cambridge.org/core/books/geometry-of-quantum-states/46B62FE3F9DA6E0B4EDDAE653F61ED8C) for more details.


@@ -1,2 +0,0 @@
# Math 401, Fall 2025: Thesis notes, S3, Coherent states and POVMs


@@ -1,288 +0,0 @@
# Math 401, Fall 2025: Thesis notes, S4, Bargmann space
## Bargmann space (original)
Also known as Segal-Bargmann space or Bargmann-Fock space.
It is the space of [holomorphic functions](../../Math416/Math416_L3#definition-28-holomorphic-functions) that are square-integrable over the complex plane with respect to a Gaussian weight.
> The sections below use [Remarks on a Hilbert Space of Analytic Functions](https://www.jstor.org/stable/71180) as the reference.
A family of Hilbert spaces, $\mathfrak{F}_n(n=1,2,3,\cdots)$, is defined as follows:
The elements of $\mathfrak{F}_n$ are [entire](../../Math416/Math416_L13#definition-711) [analytic functions](../../Math416/Math416_L9#definition-analytic) on complex Euclidean space $\mathbb{C}^n$; that is, $f\in \mathfrak{F}_n$ is a function $f:\mathbb{C}^n\to \mathbb{C}$.
Let $f,g\in \mathfrak{F}_n$. The inner product is defined by
$$
\langle f,g\rangle=\int_{\mathbb{C}^n} \overline{f(z)}g(z) d\mu_n(z)
$$
Let $z_k=x_k+iy_k$ be the complex coordinates of $z\in \mathbb{C}^n$.
The measure $\mu_n$ is defined by
$$
d\mu_n(z)=\pi^{-n}\exp(-\sum_{i=1}^n |z_i|^2)\prod_{k=1}^n dx_k dy_k
$$
<details>
<summary>Example</summary>
For $n=2$,
$$
\mathfrak{F}_2=\text{ space of entire analytic functions on } \mathbb{C}^2\to \mathbb{C}
$$
$$
\langle f,g\rangle=\int_{\mathbb{C}^2} \overline{f(z)}g(z) d\mu(z),z=(z_1,z_2)
$$
$$
d\mu_2(z)=\frac{1}{\pi^2}\exp(-|z|^2)dx_1 dy_1 dx_2 dy_2
$$
</details>
so that $f$ belongs to $\mathfrak{F}_n$ if and only if $\langle f,f\rangle<\infty$.
This is an absolutely terrible early text; we will try to formulate it in a more modern way.
> The sections below are from the lecture notes [Holomorphic method in analysis and mathematical physics](https://arxiv.org/pdf/quant-ph/9912054)
## Complex function spaces
### Holomorphic spaces
Let $U$ be a non-empty open set in $\mathbb{C}^d$. Let $\mathcal{H}(U)$ be the space of holomorphic (or analytic) functions on $U$.
Let $f\in \mathcal{H}(U)$, note that by definition of holomorphic on several complex variables, $f$ is continuous and holomorphic in each variable with the other variables fixed.
Let $\alpha$ be a continuous, strictly positive function on $U$.
$$
\mathcal{H}L^2(U,\alpha)=\left\{F\in \mathcal{H}(U): \int_U |F(z)|^2 \alpha(z) d\mu(z)<\infty\right\},
$$
where $\mu$ is the Lebesgue measure on $\mathbb{C}^d=\mathbb{R}^{2d}$.
#### Theorem of holomorphic spaces
1. For all $z\in U$, there exists a constant $c_z$ such that
$$
|F(z)|^2\le c_z \|F\|^2_{L^2(U,\alpha)}
$$
for all $F\in \mathcal{H}L^2(U,\alpha)$.
2. $\mathcal{H}L^2(U,\alpha)$ is a closed subspace of $L^2(U,\alpha)$, and therefore a Hilbert space.
<details>
<summary>Proof</summary>
First we check part 1.
Let $z=(z_1,z_2,\cdots,z_d)\in U, z_k\in \mathbb{C}$. Let $P_s(z)$ be the "polydisk" of radius $s$ centered at $z$ defined as
$$
P_s(z)=\{v\in \mathbb{C}^d: |v_k-z_k|<s, k=1,2,\cdots,d\}
$$
If $z\in U$, we can choose $s$ small enough such that $\overline{P_s(z)}\subset U$, so that we can claim that $F(z)=(\pi s^2)^{-d}\int_{P_s(z)}F(v)d\mu(v)$ is well-defined.
If $d=1$, then by the Taylor series at $v=z$, since $F$ is analytic in $U$ we have
$$
F(v)=F(z)+\sum_{n=1}^{\infty}a_n(v-z)^n
$$
Since the series converges uniformly to $F$ on the compact set $\overline{P_s(z)}$, we can interchange the integral and the sum.
Using polar coordinates with origin at $z$, $(v-z)^n=r^n e^{in\theta}$ where $r=|v-z|, \theta=\arg(v-z)$.
For $n\geq 1$, the integral over $P_s(z)$ (open disk) is zero (by Cauchy's theorem).
So,
$$
\begin{aligned}
(\pi s^2)^{-1}\int_{P_s(z)}F(v)\,d\mu(v)&=(\pi s^2)^{-1}\int_{P_s(z)}\left(F(z)+\sum_{n=1}^{\infty}a_n(v-z)^n\right) d\mu(v)\\
&=(\pi s^2)^{-1}F(z)\,\pi s^2+(\pi s^2)^{-1}\sum_{n=1}^{\infty}a_n\int_{P_s(z)}r^n e^{in\theta} d\mu(v)\\
&=F(z)
\end{aligned}
$$
For $d>1$, we can use the same argument.
Let $\mathbb{I}_{P_s(z)}(v)=\begin{cases}1 & v\in P_s(z) \\0 & v\notin P_s(z)\end{cases}$ be the indicator function of $P_s(z)$. Then
$$
\begin{aligned}
F(z)&=(\pi s^2)^{-d}\int_{U}\mathbb{I}_{P_s(z)}(v)\frac{1}{\alpha(v)}F(v)\alpha(v) d\mu(v)\\
&=(\pi s^2)^{-d}\langle \mathbb{I}_{P_s(z)}\frac{1}{\alpha},F\rangle_{L^2(U,\alpha)}
\end{aligned}
$$
by the definition of the inner product.
So $|F(z)|^2\leq (\pi s^2)^{-2d}\|\mathbb{I}_{P_s(z)}\frac{1}{\alpha}\|^2_{L^2(U,\alpha)} \|F\|^2_{L^2(U,\alpha)}$ by the Cauchy-Schwarz inequality.
All the terms are bounded and finite.
For part 2, we first note that for every $z\in U$, we can find a neighborhood $V$ of $z$ and a constant $d_z$ such that
$$
|F(v)|^2\leq d_z \|F\|^2_{L^2(U,\alpha)}\text{ for all } v\in V
$$
Suppose we have a sequence $F_n\in \mathcal{H}L^2(U,\alpha)$ such that $F_n\to F$ in $L^2(U,\alpha)$ for some $F\in L^2(U,\alpha)$.
Then $F_n$ is a Cauchy sequence in $L^2(U,\alpha)$. So,
$$
\sup_{v\in V}|F_n(v)-F_m(v)|\leq \sqrt{d_z}\|F_n-F_m\|_{L^2(U,\alpha)}\to 0\text{ as }n,m\to \infty
$$
So the sequence $F_m$ converges locally uniformly to some limit function which must be $F$ ($\mathbb{C}^d$ is Hausdorff, unique limit point).
Locally uniform limit of holomorphic functions is holomorphic. (Use Morera's Theorem to show that the limit is still holomorphic in each variable.) So the limit function $F$ is actually in $\mathcal{H}L^2(U,\alpha)$, which shows that $\mathcal{H}L^2(U,\alpha)$ is closed.
</details>
> [!TIP]
>
> [1.] states that point-wise evaluation of $F$ on $U$ is continuous. That is, for each $z\in U$, the map $\varphi: \mathcal{H}L^2(U,\alpha)\to \mathbb{C}$ that takes $F\in \mathcal{H}L^2(U,\alpha)$ to $F(z)$ is a continuous linear functional on $\mathcal{H}L^2(U,\alpha)$. This is false for ordinary non-holomorphic functions, e.g. $L^2$ spaces.
#### Reproducing kernel
Let $\mathcal{H}L^2(U,\alpha)$ be a holomorphic space. The reproducing kernel of $\mathcal{H}L^2(U,\alpha)$ is a function $K:U\times U\to \mathbb{C}$, $K(z,w),z,w\in U$ with the following properties:
1. $K(z,w)$ is holomorphic in $z$ and anti-holomorphic in $w$.
$$
K(w,z)=\overline{K(z,w)}
$$
2. For each fixed $z\in U$, $K(z,w)$ is square integrable with respect to $\alpha(w)\,dw$. For all $F\in \mathcal{H}L^2(U,\alpha)$,
$$
F(z)=\int_U K(z,w)F(w) \alpha(w) dw
$$
3. If $F\in L^2(U,\alpha)$, let $PF$ denote the orthogonal projection of $F$ onto closed subspace $\mathcal{H}L^2(U,\alpha)$. Then
$$
PF(z)=\int_U K(z,w)F(w) \alpha(w) dw
$$
4. For all $z,u\in U$,
$$
\int_U K(z,w)K(w,u) \alpha(w) dw=K(z,u)
$$
5. For all $z\in U$,
$$
|F(z)|^2\leq K(z,z) \|F\|^2_{L^2(U,\alpha)}
$$
<details>
<summary>Proof</summary>
For part 1, by the [Riesz Theorem](../../Math429/Math429_L27#theorem-642-riesz-representation-theorem), the linear functional of evaluation at $z\in U$ on $\mathcal{H}L^2(U,\alpha)$ can be represented uniquely as the inner product with some $\phi_z\in \mathcal{H}L^2(U,\alpha)$.
$$
F(z)=\langle F,\phi_z\rangle_{L^2(U,\alpha)}=\int_U F(w)\overline{\phi_z(w)} \alpha(w) dw
$$
And assume part 2 is true, then we have
$K(z,w)=\overline{\phi_z(w)}$
So part 1 is true.
For part 2, we can use the same argument
$$
\phi_z(w)=\langle \phi_z,\phi_w\rangle_{L^2(U,\alpha)}=\overline{\langle \phi_w,\phi_z\rangle_{L^2(U,\alpha)}}=\overline{\phi_w(z)}
$$
... continue if needed.
</details>
#### Construction of reproducing kernel
Let $\{e_j\}$ be any orthonormal basis of $\mathcal{H}L^2(U,\alpha)$. Then for all $z,w\in U$,
$$
\sum_{j=1}^{\infty} |e_j(z)\overline{e_j(w)}|<\infty
$$
and
$$
K(z,w)=\sum_{j=1}^{\infty} e_j(z)\overline{e_j(w)}
$$
### Bargmann space
The Bargmann spaces are the holomorphic spaces
$$
\mathcal{H}L^2(\mathbb{C}^d,\mu_t)
$$
where
$$
\mu_t(z)=(\pi t)^{-d}\exp(-|z|^2/t)
$$
> For this research, we can tentatively set $t=1$ and $d=2$ for simplicity so that you can continue to read the next section.
#### Reproducing kernel for Bargmann space
For all $d\geq 1$, the reproducing kernel of the space $\mathcal{H}L^2(\mathbb{C}^d,\mu_t)$ is given by
$$
K(z,w)=\exp(z\cdot \overline{w}/t)
$$
where $z\cdot \overline{w}=\sum_{k=1}^d z_k\overline{w_k}$.
This gives the pointwise bounds
$$
|F(z)|^2\leq \exp(\|z\|^2/t) \|F\|^2_{L^2(\mathbb{C}^d,\mu_t)}
$$
For all $F\in \mathcal{H}L^2(\mathbb{C}^d,\mu_t)$, and $z\in \mathbb{C}^d$.
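A minimal Monte Carlo check of the reproducing property for $d=1$, $t=1$, using the holomorphic test function $F(w)=w^2$; the test function and sample size are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(6)

# dmu_1(w) = pi^{-1} exp(-|w|^2) dA(w), i.e. Re(w), Im(w) ~ N(0, 1/2)
N = 400_000
w = (rng.normal(scale=np.sqrt(0.5), size=N)
     + 1j * rng.normal(scale=np.sqrt(0.5), size=N))

F = lambda u: u ** 2                 # a holomorphic test function
z = 0.5 + 0.3j
K = np.exp(z * np.conj(w))           # reproducing kernel K(z, w) with t = 1
estimate = np.mean(K * F(w))         # Monte Carlo for the integral of K(z,w) F(w) dmu_1(w)
print(estimate, F(z))                # the two values should be close
```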
> Proofs are intentionally skipped, you can refer to the lecture notes for details.
#### Lie bracket of vector fields
Let $X,Y$ be two vector fields on a smooth manifold $M$. The Lie bracket of $X$ and $Y$ is an operator $[X,Y]:C^\infty(M)\to C^\infty(M)$ defined by
$$
[X,Y](f)=X(Y(f))-Y(X(f))
$$
This operator is a vector field.
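A quick symbolic check of this definition on $\mathbb{R}^2$ using SymPy; the two vector fields below are arbitrary examples:

```python
import sympy as sp

x, y = sp.symbols("x y")
f = sp.Function("f")(x, y)

# vector fields on R^2, viewed as first-order differential operators
X = lambda g: x * sp.diff(g, y)   # X = x d/dy
Y = lambda g: y * sp.diff(g, x)   # Y = y d/dx

bracket = sp.simplify(X(Y(f)) - Y(X(f)))
print(bracket)   # x*Derivative(f, x) - y*Derivative(f, y), i.e. [X, Y] = x d/dx - y d/dy
```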
## Complex Manifolds
> This section extends from our previous discussion of smooth manifolds in Math 401, R2.
>
> For this week [10/21/2025], our goal is to understand the Riemann-Roch theorem and its applications.
>
> References:
>
> - [Introduction to Complex Manifolds](https://bookstore.ams.org/gsm-244)
### Riemann-Roch Theorem (Theorem 9.64)
Suppose $M$ is a connected compact Riemann surface of genus $g$, and $L\to M$ is a holomorphic line bundle. Then
$$
\dim \mathcal{O}(M;L)=\deg L+1-g+\dim \mathcal{O}(M;K\otimes L^*)
$$


@@ -1,3 +0,0 @@
export default {
index: "Math 401, Fall 2025: Overview of thesis",
}


@@ -1,33 +0,0 @@
# Math 401, Fall 2025: Overview of thesis
This is a note based on the first discussion with Prof. Feres on 2025-09-02.
Due to time constraints, our goal for this semester is to extend the study of concentration of measure effects described by Hayden's paper to the Majorana **stellar representation of quantum states**.
That is, we want to build a connection between the systems described as follows:
## Bounding the entropy of the state via Levy's concentration theorem and Page's lemma
Recall the bipartite pure quantum states $\mathcal{P}(A\otimes B)$. Assume $\dim A=d_A$ and $\dim B=d_B$; then the state space is isomorphic to the complex projective space $\mathbb{C}P^{d_A d_B-1}$.
Then, by taking the partial trace over $B$, we obtain a mixed quantum state denoted by $S_A$ on the Hilbert space $A$.
Then we measure the von Neumann entropy of $S_A$ to get the entropy of the state.
From Hayden's work, using Levy's concentration theorem and Page's lemma, we find that the entropy of the state concentrates around a value close to that of a maximally entangled state.
---
This project is incomplete due to several critical missing parts that I don't have comprehensive knowledge to fill in.
One goal for this section of study is to fully investigate the missing parts and fill in the gaps. It is irrelevant to anyone except me for trivial reasons. But I don't want to speak about anything that I don't have a good understanding of.
To achieve this goal, I will set up a few side projects that continue to investigate the missing parts, and the notes will start with the letter `R`, for recollections.
To make these sections self-contained, some material will be borrowed from other notes.
## Bounding the entropy of the state via exploring Majorana stellar representation of quantum states
As Professor Feres mentioned, we can further explore the Majorana stellar representation of quantum states to bound the entropy of the state.
The new topics discovered will be noted with the letter `S`, for stellar representation.


@@ -1,110 +0,0 @@
# Math 401 Paper 1: Concentration of measure effects in quantum information (Patrick Hayden)
[Concentration of measure effects in quantum information](https://www.ams.org/books/psapm/068/2762144)
A more comprehensive version of this paper is in [Aspect of generic entanglement](https://arxiv.org/pdf/quant-ph/0407049).
## Quantum codes
### Preliminaries
#### Daniel Gottesman's mathematics of quantum error correction
##### Quantum channels
Encoding channel and decoding channel
That is basically two maps that encode and decode the qubits. You can think of them as a quantum channel.
#### Quantum capacity for a quantum channel
The quantum capacity of a quantum channel is governed by the HSW noisy coding theorem, which is the counterpart of Shannon's noisy coding theorem in the quantum information setting.
#### Lloyd-Shor-Devetak theorem
Note, the model of the noisy channel in the quantum setting is a map $\eta$ that maps a state $\rho$ to another state $\eta(\rho)$. This should be a CPTP map.
Let $A'\cong A$ and $|\psi\rangle\in A'\otimes A$.
Then $Q(\mathcal{N})\geq H(B)_\sigma-H(A'B)_\sigma$.
where $\sigma=(I_{A'}\otimes \mathcal{N})(|\psi\rangle\langle\psi|)$.
(above is the official statement in the paper of Patrick Hayden)
That means that, in the limit of many uses, the optimal rate at which $A$ can reliably send qubits to $B$ ($\frac{1}{n}\log d$) through $\eta$ is given by the regularization of the formula
$$
Q(\eta)=\max_{\phi_{AB}}[-H(B|A)_\sigma]
$$
where $H(B|A)_\sigma$ is the conditional entropy of $B$ given $A$ under the state $\sigma$.
$\phi_{AB}=(I_{A'}\otimes \eta)(\omega_{AB})$
(above formula is from the presentation of Patrick Hayden.)
For now we ignore this part, since we don't consider its application in the following sections. The detailed explanation will be added later (hopefully very soon).
---
### Surprise in high-dimensional quantum systems
#### Levy's lemma
Given an $\eta$-Lipschitz function $f:S^n\to \mathbb{R}$ with median $M$, the probability that $f(x)$, for a random $x\in_R S^n$, is further than $\epsilon$ from $M$ is bounded above by $\exp(-\frac{C(n-1)\epsilon^2}{\eta^2})$, for some constant $C>0$.
$$
\operatorname{Pr}[|f(x)-M|>\epsilon]\leq \exp(-\frac{C(n-1)\epsilon^2}{\eta^2})
$$
[Decomposing the statement in detail as side note 3](Math401_P1_3.md)
### Random states and random subspaces
Choose a random pure state $\sigma=|\psi\rangle\langle\psi|$ from $A'\otimes A$.
The expected value of the entropy of entanglement is known and satisfies a concentration inequality.
$$
\mathbb{E}[H(\psi_A)] \geq \log_2(d_A)-\frac{1}{2\ln(2)}\frac{d_A}{d_B}
$$
[Decomposing the statement in detail as side note 2](Math401_P1_2.md)
From Levy's lemma, we have:
If we define $\beta=\frac{1}{\ln 2}\frac{d_A}{d_B}$, then we have
$$
\operatorname{Pr}[H(\psi_A) < \log_2(d_A)-\alpha-\beta] \leq \exp\left(-\frac{(d_Ad_B-1)C\alpha^2}{(\log_2(d_A))^2}\right)
$$
where $C$ is a small constant and $d_B\geq d_A\geq 3$.
> Noted in [Aspect of generic entanglement](https://arxiv.org/pdf/quant-ph/0407049) $C_3=(8\pi^2\ln(2))^{-1}$.
#### ebits and qbits
### Superdense coding of quantum states
It is a procedure defined as follows:
Suppose $A$ and $B$ share a Bell state $|\Phi^+\rangle=\frac{1}{\sqrt{2}}(|00\rangle+|11\rangle)$, where $A$ holds the first part and $B$ holds the second part.
$A$ wishes to send 2 classical bits to $B$.
$A$ applies one of the four Pauli unitaries $\{I,X,Y,Z\}$ to the qubit she holds (her half of the entangled pair) and then sends that single qubit to $B$.
This maps the shared Bell state to one of four mutually orthogonal Bell states, indexed by the two classical bits.
$B$ performs a Bell-basis measurement on the two qubits he now holds (the qubit received from $A$ together with his half of the pair).
$B$ decodes the measurement outcome and obtains the 2 classical bits sent by $A$.
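A minimal simulation of the protocol (my own sketch, assuming `numpy`; the variable names are mine): Alice applies one of the four Paulis to her half of $|\Phi^+\rangle$, and Bob's Bell-basis measurement identifies the two classical bits with certainty.

```python
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Shared Bell state |Phi+> = (|00> + |11>)/sqrt(2); A holds the first qubit.
phi_plus = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)

# A's encoding: two classical bits -> one Pauli applied to her qubit.
encoding = {(0, 0): I, (0, 1): X, (1, 0): Z, (1, 1): X @ Z}

# B's Bell basis (rows): |Phi+>, |Psi+>, |Phi->, |Psi->.
bell_states = np.array([
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 0, -1],
    [0, 1, -1, 0],
], dtype=complex) / np.sqrt(2)

for bits, pauli in encoding.items():
    state = np.kron(pauli, I) @ phi_plus              # A acts only on her qubit
    probs = np.abs(bell_states.conj() @ state) ** 2   # Bell-measurement probabilities
    print(bits, np.round(probs, 3))
# Each choice of bits lands on a distinct Bell state with probability 1.
```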
### Consequences for mixed state entanglement measures
#### Quantum mutual information
### Multipartite entanglement
> The role of the paper in Physics can be found in (15.86) on book Geometry of Quantum states.

View File

@@ -1,154 +0,0 @@
# Math 401 Paper 1, Side note 1: Quantum information theory and Measure concentration
## Typicality
> The idea of typicality in high dimensions is a very important topic for understanding this paper and taking it to the next level of detail in the language of mathematics. I'm trying to comprehend this material and write down my understanding in this note.
Let $X$ be the alphabet of our source of information.
Let $x^n=x_1,x_2,\cdots,x_n$ be a sequence with $x_i\in X$.
We say that $x^n$ is $\epsilon$-typical with respect to $p(x)$ if
- For all $a\in X$ with $p(a)>0$, we have
$$
\left|\frac{1}{n}N(a|x^n)-p(a)\right|\leq \frac{\epsilon}{|X|}
$$
- For all $a\in X$ with $p(a)=0$, we have
$$
N(a|x^n)=0
$$
Here $N(a|x^n)$ is the number of times $a$ appears in $x^n$. That's basically saying that:
1. The difference between **the probability of $a$ appearing in $x^n$** and the **probability of $a$ appearing in the source of information $p(a)$** should be within $\epsilon$ divided by the size of the alphabet $X$ of the source of information.
2. A symbol $a$ with $p(a)=0$ must not appear in $x^n$ at all.
Here are two easy propositions that can be proved:
For $\epsilon>0$, the probability of a sequence being $\epsilon$-typical goes to 1 as $n$ goes to infinity.
If $x^n$ is $\epsilon$-typical, then the probability that $x^n$ is produced satisfies $2^{-n[H(X)+\epsilon]}\leq p(x^n)\leq 2^{-n[H(X)-\epsilon]}$.
The number of $\epsilon$-typical sequences is at most $2^{n[H(X)+\epsilon]}$.
Recall that $H(X)=-\sum_{a\in X}p(a)\log_2 p(a)$ is the entropy of the source of information.
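As a concrete illustration of these propositions (my own sketch under the definition above, assuming `numpy`), the code below draws sequences from a Bernoulli($p$) source, checks empirically how often they are $\epsilon$-typical, and verifies the probability bounds $2^{-n[H(X)\pm\epsilon]}$ on the typical sequences it finds.

```python
import numpy as np

rng = np.random.default_rng(2)

p = 0.3                                              # Pr[1] for a source over X = {0, 1}
H = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))     # source entropy in bits

def is_typical(x, eps):
    """Check |N(a|x)/n - p(a)| <= eps/|X| for both symbols (|X| = 2 here)."""
    freq1 = np.count_nonzero(x) / len(x)
    return abs(freq1 - p) <= eps / 2

def log2_prob(x):
    """log2 of the probability that the source emits the sequence x."""
    k = np.count_nonzero(x)
    return k * np.log2(p) + (len(x) - k) * np.log2(1 - p)

n, eps, trials = 1000, 0.1, 2000
seqs = rng.random((trials, n)) < p
typical = np.array([is_typical(x, eps) for x in seqs])
print("fraction typical:", typical.mean())           # close to 1 for large n
lp = np.array([log2_prob(x) for x in seqs[typical]])
print("probability bounds hold:", bool(np.all((-n * (H + eps) <= lp) & (lp <= -n * (H - eps)))))
```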
## Shannon theory in Quantum information theory
Shannon theory provides a way to quantify the amount of information in a message.
Practically speaking:

- A holy grail for error-correcting codes

Conceptually speaking:

- An operationally-motivated way of thinking about correlations

What's missing (for a quantum mechanic)?

- Features from the linear structure: entanglement and non-orthogonality
## Partial trace and purification
### Partial trace
Recall that the bipartite state of a quantum system is a linear operator on $\mathscr{H}=\mathscr{A}\otimes \mathscr{B}$, where $\mathscr{A}$ and $\mathscr{B}$ are finite-dimensional Hilbert spaces.
#### Definition of partial trace for arbitrary linear operators
Let $T$ be a linear operator on $\mathscr{H}=\mathscr{A}\otimes \mathscr{B}$, where $\mathscr{A}$ and $\mathscr{B}$ are finite-dimensional Hilbert spaces.
An operator $T$ on $\mathscr{H}=\mathscr{A}\otimes \mathscr{B}$ can be written as (by the definition of [tensor product of linear operators](https://notenextra.trance-0.com/Math401/Math401_T2#tensor-products-of-linear-operators))
$$
T=\sum_{i=1}^n a_i A_i\otimes B_i
$$
where $A_i$ is a linear operator on $\mathscr{A}$ and $B_i$ is a linear operator on $\mathscr{B}$.
The $\mathscr{B}$-partial trace of $T$ ($\operatorname{Tr}_{\mathscr{B}}(T):\mathcal{L}(\mathscr{A}\otimes \mathscr{B})\to \mathcal{L}(\mathscr{A})$) is the linear operator on $\mathscr{A}$ defined by
$$
\operatorname{Tr}_{\mathscr{B}}(T)=\sum_{i=1}^n a_i \operatorname{Tr}(B_i) A_i
$$
#### Partial trace for density operators
Let $\rho$ be a density operator in $\mathscr{H}_1\otimes\mathscr{H}_2$, the partial trace of $\rho$ over $\mathscr{H}_2$ is the density operator in $\mathscr{H}_1$ (reduced density operator for the subsystem $\mathscr{H}_1$) given by:
$$
\rho_1\coloneqq\operatorname{Tr}_2(\rho)
$$
<details>
<summary>Examples</summary>
Let $|\psi\rangle=\frac{1}{\sqrt{2}}(|01\rangle+|10\rangle)$ and let $\rho=|\psi\rangle\langle\psi|$ be the corresponding density operator on $\mathscr{H}=\mathbb{C}^2\otimes \mathbb{C}^2$.
Expanding $\rho$ in the basis of $\mathbb{C}^2\otimes\mathbb{C}^2$ as a linear combination of basis vectors:
$$
\rho=\frac{1}{2}(|01\rangle\langle 01|+|01\rangle\langle 10|+|10\rangle\langle 01|+|10\rangle\langle 10|)
$$
Note $\operatorname{Tr}_2(|ab\rangle\langle cd|)=|a\rangle\langle c|\cdot \langle d|b\rangle$.
Then the reduced density operator of the subsystem $\mathbb{C}^2$ in first qubit is, note the $\langle 0|0\rangle=\langle 1|1\rangle=1$ and $\langle 0|1\rangle=\langle 1|0\rangle=0$:
$$
\begin{aligned}
\rho_1&=\operatorname{Tr}_2(\rho)\\
&=\frac{1}{2}(\langle 1|1\rangle |0\rangle\langle 0|+\langle 0|1\rangle |0\rangle\langle 1|+\langle 1|0\rangle |1\rangle\langle 0|+\langle 0|0\rangle |1\rangle\langle 1|)\\
&=\frac{1}{2}(|0\rangle\langle 0|+|1\rangle\langle 1|)\\
&=\frac{1}{2}I
\end{aligned}
$$
is a mixed state.
</details>
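The same computation can be done numerically (a small sketch of mine, assuming `numpy`; `partial_trace_2` is a hypothetical helper, not a library function): represent $\rho=|\psi\rangle\langle\psi|$ as a $4\times4$ matrix, reshape it into a $2\times2\times2\times2$ tensor, and contract the indices of the second subsystem. The result is $\tfrac{1}{2}I$, as above.

```python
import numpy as np

# |psi> = (|01> + |10>)/sqrt(2) in the basis |00>, |01>, |10>, |11>
psi = np.array([0, 1, 1, 0], dtype=complex) / np.sqrt(2)
rho = np.outer(psi, psi.conj())

def partial_trace_2(rho, d1, d2):
    """Trace out the second factor of a density operator on C^{d1} (x) C^{d2}."""
    rho = rho.reshape(d1, d2, d1, d2)        # indices: (i, j, i', j')
    return np.trace(rho, axis1=1, axis2=3)   # contract j with j'

rho_1 = partial_trace_2(rho, 2, 2)
print(np.round(rho_1.real, 3))   # [[0.5, 0], [0, 0.5]] = I/2, a maximally mixed state
```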
### Purification
Let $\rho$ be any [state](https://notenextra.trance-0.com/Math401/Math401_T6#pure-states) (not necessarily pure) on the finite dimensional Hilbert space $\mathscr{H}$. Then there exists a unit vector $w\in \mathscr{H}\otimes \mathscr{H}$ (so $|w\rangle\langle w|$ is a pure state) such that $\rho=\operatorname{Tr}_2(|w\rangle\langle w|)$.
<details>
<summary>Proof</summary>
Let $(u_1,u_2,\cdots,u_n)$ be an orthonormal basis of $\mathscr{H}$ consisting of eigenvectors of $\rho$ for the eigenvalues $p_1,p_2,\cdots,p_n$. As $\rho$ is a state, $p_i\geq 0$ for all $i$ and $\sum_{i=1}^n p_i=1$.
We can write $\rho$ as
$$
\rho=\sum_{i=1}^n p_i |u_i\rangle\langle u_i|
$$
Let $w=\sum_{i=1}^n \sqrt{p_i} u_i\otimes u_i$, note that $w$ is a unit vector (pure state). Then
$$
\begin{aligned}
\operatorname{Tr}_2(|w\rangle\langle w|)&=\operatorname{Tr}_2(\sum_{i=1}^n \sum_{j=1}^n \sqrt{p_ip_j} |u_i\otimes u_i\rangle \langle u_j\otimes u_j|)\\
&=\sum_{i=1}^n \sum_{j=1}^n \sqrt{p_ip_j} \operatorname{Tr}_2(|u_i\otimes u_i\rangle \langle u_j\otimes u_j|)\\
&=\sum_{i=1}^n \sum_{j=1}^n \sqrt{p_ip_j} \langle u_j|u_i\rangle |u_i\rangle\langle u_j|\\
&=\sum_{i=1}^n \sum_{j=1}^n \sqrt{p_ip_j} \delta_{ij} |u_i\rangle\langle u_j|\\
&=\sum_{i=1}^n p_i |u_i\rangle\langle u_i|\\
&=\rho
\end{aligned}
$$
Hence $\rho$ is the partial trace of the pure state $|w\rangle\langle w|$.
QED
</details>
## Drawing the connection between the space $S^{2n+1}$, $CP^n$, and $\mathbb{R}$
A pure quantum state of size $N$ can be identified with a **Hopf circle** on the sphere $S^{2N-1}$.
Consider a random pure state $|\psi\rangle$ of a bipartite $N\times K$ system with $K\geq N\geq 3$.
The partial trace of such system produces a mixed state $\rho(\psi)=\operatorname{Tr}_K(|\psi\rangle\langle \psi|)$, with induced measure $\mu_K$. When $K=N$, the induced measure $\mu_K$ is the Hilbert-Schmidt measure.
Consider the function $f:S^{2N-1}\to \mathbb{R}$ defined by $f(x)=S(\rho(\psi))$, where $S(\cdot)$ is the von Neumann entropy. The Lipschitz constant of $f$ is $\sim \ln N$.

View File

@@ -1,101 +0,0 @@
# Math 401 Paper 1, Side note 2: Page's lemma
Page's lemma is a fundamental result in quantum information theory that provides a lower bound on the expected entanglement entropy of the reduced state of a random bipartite pure state.
## Basic definitions
### $SO(n)$
The special orthogonal group $SO(n)$ is the group of all **distance preserving**, orientation-preserving linear transformations (rotations) on $\mathbb{R}^n$.
It is the group of all $n\times n$ orthogonal matrices ($A^T A=I_n$) on $\mathbb{R}^n$ with determinant $1$.
$$
SO(n)=\{A\in \mathbb{R}^{n\times n}: A^T A=I_n, \det(A)=1\}
$$
<details>
<summary>Extensions</summary>
In [The random Matrix Theory of the Classical Compact groups](https://case.edu/artsci/math/esmeckes/Haar_book.pdf), the author gives a more general definition of the Haar measure on the compact group $SO(n)$,
$O(n)$ (the group of all $n\times n$ **orthogonal matrices** over $\mathbb{R}$),
$$
O(n)=\{A\in \mathbb{R}^{n\times n}: AA^T=A^T A=I_n\}
$$
$U(n)$ (the group of all $n\times n$ **unitary matrices** over $\mathbb{C}$),
$$
U(n)=\{A\in \mathbb{C}^{n\times n}: A^*A=AA^*=I_n\}
$$
Recall that $A^*$ is the complex conjugate transpose of $A$.
$SU(n)$ (the group of all $n\times n$ unitary matrices over $\mathbb{C}$ with determinant $1$),
$$
SU(n)=\{A\in \mathbb{C}^{n\times n}: A^*A=AA^*=I_n, \det(A)=1\}
$$
$Sp(2n)$ (the group of all $2n\times 2n$ symplectic matrices over $\mathbb{C}$),
$$
Sp(2n)=\{U\in U(2n): U^T J U=UJU^T=J\}
$$
where $J=\begin{pmatrix}
0 & I_n \\
-I_n & 0
\end{pmatrix}$ is the standard symplectic matrix.
</details>
### Haar measure
Let $(SO(n), \| \cdot \|, \mu)$ be a metric measure space where $\| \cdot \|$ is the [Hilbert-Schmidt norm](https://notenextra.trance-0.com/Math401/Math401_T2#definition-of-hilbert-schmidt-norm) and $\mu$ is a measure.
The Haar measure on $SO(n)$ is the unique probability measure that is invariant under the action of $SO(n)$ on itself.
That is also called _translation-invariant_.
That is, for any fixed $A\in SO(n)$ and any measurable subset $S\subseteq SO(n)$, $\mu(A\cdot S)=\mu(S\cdot A)=\mu(S)$.
_The existence and uniqueness of the Haar measure is a theorem in compact Lie group theory. For this research topic, we will not prove it._
### Random sampling on the $\mathbb{C}P^n$
Note that the space of pure states of a bipartite system $\mathcal{H}_A\otimes\mathcal{H}_B$ of total dimension $N=d_Ad_B$ can be identified with the complex projective space $\mathbb{C}P^{N-1}$ (the unit sphere $S^{2N-1}$ modulo a global phase), so sampling a Haar-random pure state amounts to sampling uniformly from $\mathbb{C}P^{N-1}$.
## Statement
Choosing a random pure state $|\psi\rangle$ from the bi-partite pure state space $\mathcal{H}_A\otimes\mathcal{H}_B$ with the uniform (Haar-induced) distribution, the expected entropy of the reduced state $\rho_A=\operatorname{Tr}_B(|\psi\rangle\langle\psi|)$ satisfies:
$$
\mathbb{E}[H(\rho_A)]\geq \log_2 d_A -\frac{1}{2\ln 2} \frac{d_A}{d_B}
$$
## Page's conjecture
A quantum system $AB$ with Hilbert space dimension $mn$ (where $\dim\mathcal{H}_A=m\leq n=\dim\mathcal{H}_B$) in a pure state $\rho_{AB}$ has entropy $0$, yet the reduced states of the two subsystems $A$ and $B$ have equal entropy that is strictly greater than $0$ unless the pure state is a product state.
In the original paper, the average entropy of the reduced state, taken over the unitarily invariant Haar measure, is:
$$
S_{m,n}=\sum_{k=n+1}^{mn}\frac{1}{k}-\frac{m-1}{2n}\simeq \ln m-\frac{m}{2n}
$$
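The exact sum and its approximation are easy to compare numerically; the sketch below (my own check, assuming `numpy`) evaluates $S_{m,n}$ in nats for a few dimensions.

```python
import numpy as np

def page_entropy(m, n):
    """Exact average entanglement entropy (in nats) of the smaller subsystem:
    S_{m,n} = sum_{k=n+1}^{mn} 1/k - (m-1)/(2n), for m <= n."""
    return sum(1.0 / k for k in range(n + 1, m * n + 1)) - (m - 1) / (2 * n)

for m, n in [(2, 2), (4, 16), (8, 64)]:
    print(m, n, round(page_entropy(m, n), 4), round(np.log(m) - m / (2 * n), 4))
# For n >> 1 the exact value and the approximation ln m - m/(2n) agree closely.
```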
## References
- [The random Matrix Theory of the Classical Compact groups](https://case.edu/artsci/math/esmeckes/Haar_book.pdf)
- [Page's conjecture](https://journals.aps.org/prl/pdf/10.1103/PhysRevLett.71.1291)
- [Page's conjecture simple proof](https://journals.aps.org/pre/pdf/10.1103/PhysRevE.52.5653)
- [Geometry of quantum states an introduction to quantum entanglement second edition](https://www.cambridge.org/core/books/geometry-of-quantum-states/46B62FE3F9DA6E0B4EDDAE653F61ED8C)

View File

@@ -1,299 +0,0 @@
# Math 401 Paper 1, Side note 3: Levy's concentration theorem
Our goal is to prove the generalized version of Levy's concentration theorem used in Hayden's work for $\eta$-Lipschitz functions.
Let $f:S^{n-1}\to \mathbb{R}$ be a $\eta$-Lipschitz function. Let $M_f$ denote the median of $f$ and $\langle f\rangle$ denote the mean of $f$. (Note this can be generalized to many other manifolds.)
Select a random point $x\in S^{n-1}$ with $n>2$ according to the uniform measure (Haar measure). Then the probability of observing a value of $f$ much different from the reference value is exponentially small.
$$
\operatorname{Pr}[|f(x)-M_f|>\epsilon]\leq \exp(-\frac{n\epsilon^2}{2\eta^2})
$$
$$
\operatorname{Pr}[|f(x)-\langle f\rangle|>\epsilon]\leq 2\exp(-\frac{(n-1)\epsilon^2}{2\eta^2})
$$
> This version of Levy's concentration theorem can be found in [Geometry of Quantum states](https://www.cambridge.org/core/books/geometry-of-quantum-states/46B62FE3F9DA6E0B4EDDAE653F61ED8C) 15.84 and 15.85.
## Basic definitions
### Lipschitz function
#### $\eta$-Lipschitz function
Let $(X,\operatorname{dist}_X)$ and $(Y,\operatorname{dist}_Y)$ be two metric spaces. A function $f:X\to Y$ is said to be $\eta$-Lipschitz if
$$
\operatorname{dist}_Y(f(x),f(y))\leq \eta\operatorname{dist}_X(x,y)
$$
for all $x,y\in X$. The Lipschitz constant of $f$ is $\|f\|_{\operatorname{Lip}}=\inf\{L\geq 0: \operatorname{dist}_Y(f(x),f(y))\leq L\operatorname{dist}_X(x,y)\ \forall x,y\in X\}$.
That basically means that the function $f$ does not increase the distance between any pair of points in $X$ by more than a factor of $\eta$.
## Levy's concentration theorem in _High-dimensional probability_ by Roman Vershynin
### Levy's concentration theorem (Vershynin's version)
> This theorem is exactly the 5.1.4 on the _High-dimensional probability_ by Roman Vershynin.
#### Isoperimetric inequality on $\mathbb{R}^n$
Among all subsets $A\subset \mathbb{R}^n$ with a given volume, the Euclidean ball has the minimal area.
That is, for any $\epsilon>0$, Euclidean balls minimize the volume of the $\epsilon$-neighborhood of $A$.
Here the $\epsilon$-neighborhood of $A$ is defined as
$$
A_\epsilon\coloneqq \{x\in \mathbb{R}^n: \exists y\in A, \|x-y\|_2\leq \epsilon\}=A+\epsilon B_2^n
$$
Here the $\|\cdot\|_2$ is the Euclidean norm. (The theorem holds for both geodesic metric on sphere and Euclidean metric on $\mathbb{R}^n$.)
#### Isoperimetric inequality on the sphere
Let $\sigma_n(A)$ denote the normalized area of $A$ on the $n$-dimensional sphere $S^n$. That is, $\sigma_n(A)\coloneqq\frac{\operatorname{Area}(A)}{\operatorname{Area}(S^n)}$.
Let $\epsilon>0$. Then for any subset $A\subset S^n$, given the area $\sigma_n(A)$, the spherical caps minimize the volume of the $\epsilon$-neighborhood of $A$.
> The above two inequalities are not proved in the book _High-dimensional probability_, but proofs can be found in Appendix C of Gromov's book _Metric Structures for Riemannian and Non-Riemannian Spaces_.
To continue the proof of the theorem, we use sub-Gaussian concentration *(Chapter 3 of _High-dimensional probability_ by Roman Vershynin)* on the sphere $\sqrt{n}S^n$.
This leads to some constant $c>0$ such that the following lemma holds:
#### The "Blow-up" lemma
Let $A$ be a subset of the sphere $\sqrt{n}S^n$, and let $\sigma$ denote the normalized area on it. If $\sigma(A)\geq \frac{1}{2}$, then for every $t\geq 0$,
$$
\sigma(A_t)\geq 1-2\exp(-ct^2)
$$
where $A_t=\{x\in \sqrt{n}S^n: \operatorname{dist}(x,A)\leq t\}$ and $c$ is some positive constant.
#### Proof of the Levy's concentration theorem
Proof:
Without loss of generality, we can assume that $\eta=1$. Let $M$ denote the median of $f(X)$.
So $\operatorname{Pr}[f(X)\leq M]\geq \frac{1}{2}$ and $\operatorname{Pr}[f(X)\geq M]\geq \frac{1}{2}$.
Consider the sub-level set $A\coloneqq \{x\in \sqrt{n}S^n: f(x)\leq M\}$.
Since $\operatorname{Pr}[X\in A]\geq \frac{1}{2}$, by the blow-up lemma, we have
$$
\operatorname{Pr}[X\in A_t]\geq 1-2\exp(-ct^2)
$$
And since $f$ is 1-Lipschitz, every $x\in A_t$ is within distance $t$ of a point where $f\leq M$, so $f(x)\leq M+t$; hence
$$
\operatorname{Pr}[X\in A_t]\leq \operatorname{Pr}[f(X)\leq M+t]
$$
Combining the above two inequalities, we have
$$
\operatorname{Pr}[f(X)\leq M+t]\geq 1-2\exp(-ct^2)
$$
Applying the same argument to $-f$ (i.e., to the super-level set) bounds the lower tail, and the two bounds together give the concentration of $f(X)$ around its median $M$.
## Levy's concentration theorem in _Metric Structures for Riemannian and Non-Riemannian Spaces_ by M. Gromov
### Levy's concentration theorem (Gromov's version)
> The Levy's lemma can also be found in _Metric Structures for Riemannian and Non-Riemannian Spaces_ by M. Gromov. $3\frac{1}{2}.19$ The Levy concentration theory.
#### Theorem $3\frac{1}{2}.19$ Levy concentration theorem:
An arbitrary 1-Lipschitz function $f:S^n\to \mathbb{R}$ concentrates near a single value $a_0\in \mathbb{R}$ as strongly as the distance function does.
That is
$$
\mu\{x\in S^n: |f(x)-a_0|\geq\epsilon\} < \kappa_n(\epsilon)\leq 2\exp(-\frac{(n-1)\epsilon^2}{2})
$$
where
$$
\kappa_n(\epsilon)=\frac{\int_\epsilon^{\frac{\pi}{2}}\cos^{n-1}(t)dt}{\int_0^{\frac{\pi}{2}}\cos^{n-1}(t)dt}
$$
$a_0$ is the **Levy mean** of the function $f$; that is, the level sets of $f$ divide the sphere into halves of equal measure, characterized by the following inequalities:
$$
\mu(f^{-1}(-\infty,a_0])\geq \frac{1}{2} \text{ and } \mu(f^{-1}[a_0,\infty))\geq \frac{1}{2}
$$
A direct computation yields the bound, but M. Gromov does not give a detailed explanation here.
> Detailed proof by Takashi Shioya.
>
> The central idea is to draw the connection between the given three topological spaces, $S^{2n+1}$, $CP^n$ and $\mathbb{R}$.
First, we need to introduce the following distribution and lemmas/theorems:
**OBSERVATION**
Consider the orthogonal projection from $\mathbb{R}^{n+1}$, the space in which $S^n$ is embedded, to $\mathbb{R}^k$; we denote the restriction of this projection to the sphere by $\pi_{n,k}:S^n(\sqrt{n})\to \mathbb{R}^k$. Note that $\pi_{n,k}$ is a 1-Lipschitz function (a projection never increases the distance between two points).
We denote the normalized Riemannian volume measure on $S^n(\sqrt{n})$ as $\sigma^n(\cdot)$, and $\sigma^n(S^n(\sqrt{n}))=1$.
#### Definition of Gaussian measure on $\mathbb{R}^k$
We denote the Gaussian measure on $\mathbb{R}^k$ as $\gamma^k$.
$$
d\gamma^k(x)\coloneqq\frac{1}{\sqrt{2\pi}^k}\exp(-\frac{1}{2}\|x\|^2)dx
$$
$x\in \mathbb{R}^k$, $\|x\|^2=\sum_{i=1}^k x_i^2$ is the Euclidean norm, and $dx$ is the Lebesgue measure on $\mathbb{R}^k$.
Informally, $\gamma^k$ is the distribution of a standard normal random vector on $\mathbb{R}^k$ (mean $0$, identity covariance), i.e., a Gaussian density times the Lebesgue measure.
#### Maxwell-Boltzmann distribution law
> It is such a wonderful fact to me that the projection of the $n$-dimensional sphere of radius $\sqrt{n}$ (embedded in $\mathbb{R}^{n+1}$) onto $\mathbb{R}^k$ becomes a Gaussian distribution as $n\to \infty$.
For any natural number $k$,
$$
\frac{d(\pi_{n,k})_*\sigma^n(x)}{dx}\to \frac{d\gamma^k(x)}{dx}
$$
where $(\pi_{n,k})_*\sigma^n$ is the push-forward measure of $\sigma^n$ by $\pi_{n,k}$.
In other words,
$$
(\pi_{n,k})_*\sigma^n\to \gamma^k\text{ weakly as }n\to \infty
$$
<details>
<summary>Proof</summary>
We denote the $n$ dimensional volume measure on $\mathbb{R}^k$ as $\operatorname{vol}_k$.
Observe that $\pi_{n,k}^{-1}(x),x\in \mathbb{R}^k$ is isometric to $S^{n-k}(\sqrt{n-\|x\|^2})$, that is, for any $x\in \mathbb{R}^k$, $\pi_{n,k}^{-1}(x)$ is a sphere with radius $\sqrt{n-\|x\|^2}$ (by the definition of $\pi_{n,k}$).
So,
$$
\begin{aligned}
\frac{d(\pi_{n,k})_*\sigma^n(x)}{dx}&=\frac{\operatorname{vol}_{n-k}(\pi_{n,k}^{-1}(x))}{\operatorname{vol}_k(S^n(\sqrt{n}))}\\
&=\frac{(n-\|x\|^2)^{\frac{n-k}{2}}}{\int_{\|x\|\leq \sqrt{n}}(n-\|x\|^2)^{\frac{n-k}{2}}dx}\\
\end{aligned}
$$
As $n\to \infty$, note that $\lim_{n\to \infty}{(1-\frac{a}{n})^n}=e^{-a}$ for any $a>0$, so
$(n-\|x\|^2)^{\frac{n-k}{2}}=\left(n\left(1-\frac{\|x\|^2}{n}\right)\right)^{\frac{n-k}{2}}\sim n^{\frac{n-k}{2}}\exp(-\frac{\|x\|^2}{2})$
So
$$
\begin{aligned}
\frac{(n-\|x\|^2)^{\frac{n-k}{2}}}{\int_{\|x\|\leq \sqrt{n}}(n-\|x\|^2)^{\frac{n-k}{2}}dx}&=\frac{e^{-\frac{\|x\|^2}{2}}}{\int_{x\in \mathbb{R}^k}e^{-\frac{\|x\|^2}{2}}dx}\\
&=\frac{1}{(2\pi)^{\frac{k}{2}}}e^{-\frac{\|x\|^2}{2}}\\
&=\frac{d\gamma^k(x)}{dx}
\end{aligned}
$$
QED
</details>
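The limit is also easy to see numerically; the sketch below (mine, for illustration only, assuming `numpy`) samples uniform points on $\sqrt{n}\,S^n$ and compares the first coordinate of the projection $\pi_{n,1}$ with a standard normal by looking at its first few moments.

```python
import numpy as np

rng = np.random.default_rng(3)

def uniform_on_scaled_sphere(n, num_samples):
    """Uniform samples on sqrt(n) * S^n, the sphere of radius sqrt(n) in R^{n+1}."""
    g = rng.standard_normal((num_samples, n + 1))
    g /= np.linalg.norm(g, axis=1, keepdims=True)
    return np.sqrt(n) * g

for n in [5, 50, 500]:
    x1 = uniform_on_scaled_sphere(n, 20_000)[:, 0]   # the projection pi_{n,1}
    # Compare the first four moments with those of gamma^1: (0, 1, 0, 3).
    print(n, np.round([x1.mean(), x1.var(), np.mean(x1**3), np.mean(x1**4)], 2))
# As n grows, the moments approach those of the standard Gaussian measure gamma^1.
```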
#### Proof of the Levy's concentration theorem via the Maxwell-Boltzmann distribution law
We use the Maxwell-Boltzmann distribution law and Levy's isoperimetric inequality to prove the Levy's concentration theorem.
The goal is the same as in Gromov's version: first we bound the measure of the sub-level and super-level sets of $f$ by the $\kappa_n(\epsilon)$ function using Levy's isoperimetric inequality, then we claim that $\kappa_n(\epsilon)$ is bounded by a Gaussian tail.
Note, this section is not yet rigorous in the mathematical sense; sections about Levy families and the observable diameter should be added to make the proof more rigorous and understandable.
<details>
<summary>Proof</summary>
Let $f:S^n\to \mathbb{R}$ be a 1-Lipschitz function.
Consider the two sets of points on the sphere $S^n$ with radius $\sqrt{n}$:
$$
\Omega_+=\{x\in S^n: f(x)\leq a_0-\epsilon\}, \Omega_-=\{x\in S^n: f(x)\geq a_0+\epsilon\}
$$
Note that $\Omega_+\cup \Omega_-=\{x\in S^n(\sqrt{n}): |f(x)-a_0|\geq \epsilon\}$ is exactly the set whose measure we want to bound.
By the Levy's isoperimetric inequality, we have
$$
\operatorname{vol}_{n-k}(\pi_{n,k}^{-1}(\epsilon))\leq \operatorname{vol}_{n-k}(\pi_{n,k}^{-1}(\Omega_+))+\operatorname{vol}_{n-k}(\pi_{n,k}^{-1}(\Omega_-))
$$
We define $\kappa_n(\epsilon)$ as the following:
$$
\kappa_n(\epsilon)=\frac{\operatorname{vol}_{n-k}(\pi_{n,k}^{-1}(\epsilon))}{\operatorname{vol}_k(S^n(\sqrt{n}))}=\frac{\int_\epsilon^{\frac{\pi}{2}}\cos^{n-1}(t)dt}{\int_0^{\frac{\pi}{2}}\cos^{n-1}(t)dt}
$$
By the Levy's isoperimetric inequality, and the Maxwell-Boltzmann distribution law, we have
$$
\mu\{x\in S^n: |f(x)-a_0|\geq\epsilon\} < \kappa_n(\epsilon)\leq 2\exp(-\frac{(n-1)\epsilon^2}{2})
$$
</details>
## Levy's Isoperimetric inequality
> This section is from the Appendix $C_+$ of Gromov's book _Metric Structures for Riemannian and Non-Riemannian Spaces_.
Not very digestible for undergraduates.
## Crash course on Riemannian manifolds
> This part might be extended to a separate note, let's check how far we can go from this part.
>
> References:
>
> - [Riemannian Geometry by John M. Lee](https://www.amazon.com/Introduction-Riemannian-Manifolds-Graduate-Mathematics/dp/3319917544?dib=eyJ2IjoiMSJ9.88u0uIXulwPpi3IjFn9EdOviJvyuse9V5K5wZxQEd6Rto5sCIowzEJSstE0JtQDW.QeajvjQEbsDmnEMfPzaKrfVR9F5BtWE8wFscYjCAR24&dib_tag=se&keywords=riemannian+manifold+by+john+m+lee&qid=1753238983&sr=8-1)
### Riemannian manifolds
A Riemannian manifold is a smooth manifold equipped with a **Riemannian metric**, which is a smooth assignment of an inner product to each tangent space $T_pM$ of the manifold.
An example of a Riemannian manifold is the complex projective space $\mathbb{C}P^n$ (with the Fubini-Study metric); for $n=1$ it is the sphere $S^2$.
### Riemannian metric
A Riemannian metric is a smooth assignment of an inner product to each tangent space $T_pM$ of the manifold.
An example of Riemannian metric is the Euclidean metric on $\mathbb{R}^n$.
### Notion of Connection
A connection is a way to define the directional derivative of a vector field along a curve on a Riemannian manifold.
For every $p\in M$, where $M$ denotes the manifold, suppose $M=\mathbb{R}^n$ and let $X=(f_1,\cdots,f_n)$ be a vector field on $M$. The directional derivative of $X$ at the point $p$ in the direction $V$ is defined as
$$
D_VX(p)=\lim_{h\to 0}\frac{X(p+hV)-X(p)}{h}
$$
### Nabla notation and Levi-Civita connection
### Ricci curvature
## References
- [High-dimensional probability by Roman Vershynin](https://www.math.uci.edu/~rvershyn/papers/HDP-book/HDP-2.pdf)
- [Metric Structures for Riemannian and Non-Riemannian Spaces by M. Gromov](https://www.amazon.com/Structures-Riemannian-Non-Riemannian-Progress-Mathematics/dp/0817638989/ref=tmm_hrd_swatch_0?_encoding=UTF8&dib_tag=se&dib=eyJ2IjoiMSJ9.Tp8dXvGbTj_D53OXtGj_qOdqgCgbP8GKwz4XaA1xA5PGjHj071QN20LucGBJIEps.9xhBE0WNB0cpMfODY5Qbc3gzuqHnRmq6WZI_NnIJTvc&qid=1750973893&sr=8-1)
- [Metric Measure Geometry by Takashi Shioya](https://arxiv.org/pdf/1410.0428)

View File

@@ -1,276 +0,0 @@
# Math401 Topic 1: Probability under language of measure theory
## Section 1: Uniform Random Numbers
### Basic Definitions
#### Definition of Random Variables
A random variable is a function $f:[0,1]\to S$, where $[0,1]\subset \mathbb{R}$ and $S$ is a set of potential outcomes of a random phenomenon.
#### Definition of Uniform Distribution
The uniform distribution is defined by taking the length of a subset of $[0,1]$ as its probability ([Lebesgue measure](https://notenextra.trance-0.com/Math4121/Math4121_L30#lebesgue-measure) by default).
Let $X$ be a random number taken from $[0,1]$ having the uniform distribution. The probability that $X$ lies in a subset $A$ is
$$
\operatorname{Prob}(X\in A) =\lambda(A)=\text{length of }A
$$
#### Definition of Expectation
Let $f:[0,1]\to \mathbb{R}$ be a random variable (with nice properties such that it is integrable). Then the expectation of $f$ is defined as
$$
\mathbb{E}[f]=\mathbb{E}[f(X)]=\int_0^1 f(x)dx
$$
#### Definition of Indicator Function
The indicator function of an event $A$ is defined as
$$
\mathbb{I}_A(x)=\begin{cases}
1 & \text{if } x\in A \\
0 & \text{if } x\notin A
\end{cases}
$$
#### Definition of Law of variable X
The law of a random variable $X$ is the probability distribution of $X$.
Let $Y$ be the outcome of $f(X)$. Then the law of $Y$ is the probability distribution of $Y$.
$$
\mu_Y(A)=\lambda(f^{-1}(A))=\lambda(\{x\in [0,1]: f(x)\in A\})
$$
### 1.1 Mathematical Coin Flip model
A coin flip is a random experiment with two possible outcomes, $S=\{0,1\}$, with probability $p$ for $0$ and $1-p$ for $1$, where $p\in (0,1)\subset \mathbb{R}$.
#### Definition of Independent Events
Two events $A$ and $B$ are independent if
$$
\lambda(A\cap B)=\lambda(A)\lambda(B)
$$
or equivalently,
$$
\operatorname{Prob}(X\in A\cap B)=\operatorname{Prob}(X\in A)\operatorname{Prob}(X\in B)
$$
Generalization to $n$ events:
$$
\lambda(A_1\cap A_2\cap \cdots \cap A_n)=\lambda(A_1)\lambda(A_2)\cdots \lambda(A_n)
$$
#### Definition of Outcome selecting function
Let the set of all possible outcomes be represented by the Cartesian product $S=\{0,1\}^{\mathbb{N}}$. An element $(a_1,a_2,a_3,\cdots)\in S$ is an infinite sequence of coin flips.
$\pi_i:S\to \{0,1\}$ is the $i$-th projection function defined as $\pi_i((a_1,a_2,a_3,\cdots))=a_i$.
> Note, via binary expansion this representation corresponds to the numbers in the interval $[0,1]$; the correspondence fails to be one-to-one only at the dyadic rationals (i.e., numbers that can be written as a fraction whose denominator is a power of 2).
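A small sketch of this correspondence (mine, assuming `numpy`): the binary digits of a single uniform random number in $[0,1)$ behave like an i.i.d. sequence of fair coin flips, each digit playing the role of one projection $\pi_i$.

```python
import numpy as np

rng = np.random.default_rng(4)

def binary_digits(u, n):
    """First n binary digits of u in [0, 1); digit i is pi_i of the coin-flip sequence."""
    digits = []
    for _ in range(n):
        u *= 2
        d = int(u)          # 0 or 1
        digits.append(d)
        u -= d
    return digits

samples = np.array([binary_digits(rng.random(), 32) for _ in range(10_000)])
print("digit means:", np.round(samples.mean(axis=0)[:8], 3))   # each close to 0.5
print("corr(d1, d2):", round(float(np.corrcoef(samples[:, 0], samples[:, 1])[0, 1]), 3))  # close to 0
```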
## Section 2: Formal definitions
> Recall, a $\sigma$-algebra (denoted as $\mathcal{A}$ in Math4121) is a collection of subsets of a set $S$ satisfying the following properties:
>
> 1. $\emptyset\in \mathcal{A}$ (empty set is in the $\sigma$-algebra)
> 2. If $A\in \mathcal{A}$, then $A^c\in \mathcal{A}$ (if a set is in the $\sigma$-algebra, then its complement is in the $\sigma$-algebra)
> 3. If $A_1,A_2,A_3,\cdots\in \mathcal{A}$, then $\bigcup_{i=1}^{\infty}A_i\in \mathcal{A}$ (if a countable sequence of sets is in the $\sigma$-algebra, then their union is in the $\sigma$-algebra)
### Event, probability, and random variable
Let $\Omega$ be a non-empty set.
Let $\mathscr{F}$ be a $\sigma$-algebra on $\Omega$ (Note, $\mathscr{F}$ is a collection of subsets of $\Omega$ that satisfies the properties of a $\sigma$-algebra).
#### Definition of Event
An event is an element of $\mathscr{F}$.
#### Definition of Probability Measure
A probability measure $P$ is a function $P:\mathscr{F}\to [0,1]$ satisfying the following properties:
1. $P(\Omega)=1$
2. If $A_1,A_2,A_3,\cdots\in \mathscr{F}$ are pairwise disjoint ($\forall i\neq j, A_i\cap A_j=\emptyset$), then $P(\bigcup_{i=1}^{\infty}A_i)=\sum_{i=1}^{\infty}P(A_i)$
#### Definition of Probability Space
A probability space is a triple $(\Omega, \mathscr{F}, P)$ defined above.
An event $A$ is said to occur almost surely (a.s.) if $P(A)=1$.
#### Definition of Random Variable
A random variable is a function $f:\Omega\to \mathbb{R}$ that is measurable with respect to the $\sigma$-algebra $\mathscr{F}$.
That is, for any Borel set $B\subset \mathbb{R}$, the preimage $f^{-1}(B)\in \mathscr{F}$.
$$
f^{-1}(B)=\{x\in \Omega: f(x)\in B\}\in \mathscr{F}
$$
#### Definition of sigma-algebra generated by a random variable
Let $\{f_\alpha:\Omega\to \mathbb{R},\alpha\in I\}$ be a family of functions where $I$ is an index set which is not necessarily finite or countable. The $\sigma$-algebra generated by the family of functions $\{f_\alpha:\alpha\in I\}$, denoted as $\sigma\{f_\alpha:\alpha\in I\}$, is the smallest $\sigma$-algebra containing all the subsets of $\Omega$ of the form
$$
f_\alpha^{-1}(B)=\{\omega\in \Omega: f_\alpha(\omega)\in B\}\in \mathscr{F}
$$
for all $\alpha\in I$ and $B\in \mathscr{B}(\mathbb{R})$.
Equivalently,
$$
\sigma\{f_\alpha:\alpha\in I\}=\sigma\left(\bigcup_{\alpha\in I}\{f_\alpha^{-1}(B): B\in \mathscr{B}(\mathbb{R})\}\right)
$$
The $\sigma$-algebra generated by the family is the intersection of all $\sigma$-algebras on $\Omega$ containing the sets $f_\alpha^{-1}(B)$ for all $\alpha\in I$ and $B\in \mathscr{B}(\mathbb{R})$.
#### Definition of distribution of random variable
Let $f:\Omega\to \mathbb{R}$ be a random variable. The distribution of $f$ is the probability measure $P_f$ on $\mathbb{R}$ defined by
$$
P_f(B)=P(f^{-1}(B))=P(\{x\in \Omega: f(x)\in B\})
$$
also noted as $f_*P$.
#### Definition of joint distribution of random variables
Let $f_1,f_2,\cdots,f_n:\Omega\to \mathbb{R}$ be random variables. The joint distribution of $f_1,f_2,\cdots,f_n$ is the probability measure $P_{f_1,f_2,\cdots,f_n}$ on $\mathbb{R}^n$ defined on product sets $B=B_1\times B_2\times\cdots\times B_n$ by
$$
P_{f_1,f_2,\cdots,f_n}(B)=P(f_1^{-1}(B_1)\cap f_2^{-1}(B_2)\cap \cdots \cap f_n^{-1}(B_n))=P(\{\omega\in \Omega: (f_1(\omega),f_2(\omega),\cdots,f_n(\omega))\in B\})
$$
### Expectation of a random variable
Let $f:\Omega\to \mathbb{R}$ be a random variable. The expectation of $f$ is defined as
$$
\mathbb{E}[f]=\int_\Omega f(\omega)\,dP(\omega)
$$
Note, $P$ is the probability measure on $\Omega$.
#### Definition of variance
The variance of a random variable $f$ is defined as
$$
\operatorname{Var}(f)=\mathbb{E}[(f-\mathbb{E}[f])^2]=\mathbb{E}[f^2]-(\mathbb{E}[f])^2
$$
#### Definition of covariance
The covariance of two random variables $f,g:\Omega\to \mathbb{R}$ is defined as
$$
\operatorname{Cov}(f,g)=\mathbb{E}[(f-\mathbb{E}[f])(g-\mathbb{E}[g])]
$$
### Point measures
#### Definition of Dirac measure
The Dirac measure is a probability measure on $\Omega$ defined as
$$
\delta_\omega(A)=\begin{cases}
1 & \text{if } \omega\in A \\
0 & \text{if } \omega\notin A
\end{cases}
$$
Note that $\int_\Omega f(x)d\delta_\omega(x)=f(\omega)$.
### Infinite sequence of independent coin flips
> Side notes from basic topology:
>
> **Definition of product topology**:
>
> It is a set constructed by the Cartesian product of the sets. Suppose $X_i$ is a set for all $i\in I$. The element of the product set is a tuple $(x_i)_{i\in I}$ where $x_i\in X_i$ for all $i\in I$.
>
> For example, if $X_i=[0,1]$ for all $i\in \mathbb{N}$, then the product set is $[0,1]^{\mathbb{N}}$. An element of such product set is $(1,0.5,0.25,\cdots)$.
The set of outcomes of such infinite sequence of coin flips is the product set of the set of outcomes of each coin flip.
$$
S=\{0,1\}^{\mathbb{N}}
$$
### Conditional probability
#### Definition of conditional probability
The conditional probability of an event $A$ given an event $B$ is defined as
$$
P(A|B)=\frac{P(A\cap B)}{P(B)}
$$
The law of total probability (where $\{B_i\}$ is a countable partition of $\Omega$ with $P(B_i)>0$):
$$
P(A)=\sum_{i=1}^{\infty}P(A|B_i)P(B_i)
$$
Bayes' theorem:
$$
P(B_i|A)=\frac{P(A|B_i)P(B_i)}{\sum_{j=1}^{\infty}P(A|B_j)P(B_j)}
$$
#### Definition of independence of random variables
Two random variables $f,g:\Omega\to \mathbb{R}$ are independent if for any Borel sets $A,B\subset \mathscr{B}(\mathbb{R})$ the events
$$
\{\omega\in \Omega: f(\omega)\in A\}\text{ and } \{\omega\in \Omega: g(\omega)\in B\}
$$
are independent.
In general, a finite or infinite family of random variables $f_1,f_2,\cdots:\Omega\to \mathbb{R}$ is independent if for every finite subcollection $f_{i_1},\cdots,f_{i_k}$ and all Borel sets $A_1,\cdots,A_k$, $P\left(\bigcap_{j=1}^k\{f_{i_j}\in A_j\}\right)=\prod_{j=1}^k P(f_{i_j}\in A_j)$.
#### Definition of independence of sigma-algebras
Let $\mathscr{G}$ and $\mathscr{H}$ be two $\sigma$-algebras on $\Omega$. They are independent if for any events $A\in \mathscr{G}$ and $B\in \mathscr{H}$, $P(A\cap B)=P(A)P(B)$.
## Section 3: Further definitions in measure theory and integration
### $L^2$ space
#### Definition of $L^2$ space
Let $(\Omega, \mathscr{F}, P)$ be a measure space. The $L^2$ space is the space of all square integrable, complex-valued measurable functions on $\Omega$.
Denoted by $L^2(\Omega, \mathscr{F}, P)$.
The square integrable functions are the functions $f:\Omega\to \mathbb{C}$ such that
$$
\int_\Omega |f(\omega)|^2 dP(\omega)<\infty
$$
With inner product defined by
$$
\langle f,g\rangle=\int_\Omega \overline{f(\omega)}g(\omega)dP(\omega)
$$
The $L^2(\Omega, \mathscr{F}, P)$ space is a Hilbert space.

View File

@@ -1,812 +0,0 @@
# Math401 Topic 2: Finite-dimensional Hilbert spaces
Recall that a complex number is a pair of real numbers, $z=(a,b)$, with addition and multiplication defined by
$$
(a,b)+(c,d)=(a+c,b+d)
$$
$$
(a,b)\cdot(c,d)=(ac-bd,ad+bc)
$$
or in polar form,
$$
z=re^{i\theta}=r(\cos\theta+i\sin\theta)
$$
where $r=\sqrt{a^2+b^2}=\sqrt{z\overline{z}}$ and $\theta=\tan^{-1}(b/a)$.
The complex conjugate of $z$ is $\overline{z}=(a,-b)$.
## Section 1: Finite-dimensional Complex Vector Spaces
Here, we use the field $\mathbb{C}$ of complex numbers. or the field $\mathbb{R}$ of real numbers as the field $\mathbb{F}$ we are going to encounter.
### Definition of vector space
A vector space $\mathscr{V}$ over a field $\mathbb{F}$ is a set equipped with an **addition** and a **scalar multiplication**, satisfying the following axioms:
1. Addition is associative and commutative. For all $u,v,w\in \mathscr{V}$,
Associativity:
$$
(u+v)+w=u+(v+w)
$$
Commutativity:
$$
u+v=v+u
$$
2. Additive identity: There exists an element $0\in \mathscr{V}$ such that $v+0=v$ for all $v\in \mathscr{V}$.
3. Additive inverse: For each $v\in \mathscr{V}$, there exists an element $-v\in \mathscr{V}$ such that $v+(-v)=0$.
4. Multiplicative identity: There exists an element $1\in \mathbb{F}$ such that $v\cdot 1=v$ for all $v\in \mathscr{V}$.
5. Compatibility of scalar multiplication: For all $v\in \mathscr{V}$ and $c,d\in \mathbb{F}$, $(cd)\cdot v=c\cdot(d\cdot v)$.
6. Distributivity: For all $u,v\in \mathscr{V}$ and $c,d\in \mathbb{F}$,
$$
c(u+v)=cu+cv \quad\text{and}\quad (c+d)v=cv+dv
$$
A vector in $\mathbb{F}^n$ is an ordered $n$-tuple of elements of the field $\mathbb{F}$.
If we consider $\mathscr{V}=\mathbb{C}^n$ over $\mathbb{F}=\mathbb{C}$, $n\in \mathbb{N}$, then $u=(a_1,a_2,\cdots,a_n), v=(b_1,b_2,\cdots,b_n)\in \mathbb{C}^n$ are vectors.
The addition and scalar multiplication are defined by
$$
u+v=(a_1+b_1,a_2+b_2,\cdots,a_n+b_n)
$$
$$
cu=(ca_1,ca_2,\cdots,ca_n)
$$
$c\in \mathbb{C}$.
The matrix transpose is defined by
$$
u^T=(a_1,a_2,\cdots,a_n)^T=\begin{pmatrix}
a_1 \\
a_2 \\
\vdots \\
a_n
\end{pmatrix}
$$
The complex conjugate transpose is defined by
$$
u^*=(a_1,a_2,\cdots,a_n)^*=\begin{pmatrix}
\overline{a_1} \\
\overline{a_2} \\
\vdots \\
\overline{a_n}
\end{pmatrix}
$$
> In physics, the complex conjugate is sometimes denoted by $z^*$ instead of $\overline{z}$.
> The complex conjugate transpose is sometimes denoted by $u^\dagger$ instead of $u^*$.
### Hermitian inner product and norms
On $\mathbb{C}^n$, the Hermitian inner product is defined by
$$
\langle u,v\rangle=\sum_{i=1}^n \overline{u_i}v_i
$$
The norm is defined by
$$
\|u\|=\sqrt{\langle u,u\rangle}
$$
#### Definition of Inner product
Let $\mathscr{H}$ be a complex vector space. An inner product on $\mathscr{H}$ is a function $\langle \cdot, \cdot \rangle: \mathscr{H}\times \mathscr{H}\to \mathbb{C}$ satisfying the following axioms:
1. For each $u\in \mathscr{H}$, $v\mapsto \langle u,v\rangle$ is a linear map.
$$
\langle u,av+bw\rangle=a\langle u,v\rangle+b\langle u,w\rangle
$$
For all $u,v,w\in \mathscr{H}$ and $a,b\in \mathbb{C}$.
2. For all $u,v\in \mathscr{H}$, $\langle u,v\rangle=\overline{\langle v,u\rangle}$.
$u\mapsto \langle u,v\rangle$ is a conjugate linear map.
3. $\langle u,u\rangle\geq 0$ and $\langle u,u\rangle=0$ if and only if $u=0$.
#### Definition of norm
Let $\mathscr{H}$ be a complex vector space. A norm on $\mathscr{H}$ is a function $\|\cdot\|: \mathscr{H}\to \mathbb{R}$ satisfying the following axioms:
1. For all $u\in \mathscr{H}$, $\|u\|\geq 0$ and $\|u\|=0$ if and only if $u=0$.
2. For all $u\in \mathscr{H}$ and $c\in \mathbb{C}$, $\|cu\|=|c|\|u\|$.
3. Triangle inequality: For all $u,v\in \mathscr{H}$, $\|u+v\|\leq \|u\|+\|v\|$.
#### Definition of inner product space
A complex vector space $\mathscr{H}$ with an inner product is called an **inner product space**; if it is also complete (which is automatic in finite dimensions), it is a **Hilbert space**.
#### Cauchy-Schwarz inequality
For all $u,v\in \mathscr{H}$,
$$
|\langle u,v\rangle|\leq \|u\|\|v\|
$$
#### Parallelogram law
For all $u,v\in \mathscr{H}$,
$$
\|u+v\|^2+\|u-v\|^2=2(\|u\|^2+\|v\|^2)
$$
#### Polarization identity
For all $u,v\in \mathscr{H}$,
$$
\langle u,v\rangle=\frac{1}{4}(\|u+v\|^2-\|u-v\|^2-i\|u+iv\|^2+i\|u-iv\|^2)
$$
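With the convention used here (conjugate-linear in the first slot), the identity can be checked numerically; `np.vdot(u, v)` computes exactly $\langle u,v\rangle=\sum_i \overline{u_i}v_i$. This is only a sanity-check sketch of mine, assuming `numpy`.

```python
import numpy as np

rng = np.random.default_rng(5)
u = rng.standard_normal(4) + 1j * rng.standard_normal(4)
v = rng.standard_normal(4) + 1j * rng.standard_normal(4)

nrm2 = lambda w: np.linalg.norm(w) ** 2
lhs = np.vdot(u, v)   # <u, v>, conjugate-linear in the first slot
rhs = 0.25 * (nrm2(u + v) - nrm2(u - v) - 1j * nrm2(u + 1j * v) + 1j * nrm2(u - 1j * v))
print(np.allclose(lhs, rhs))   # True
```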
#### Additional definitions
Let $u,v\in \mathscr{H}$.
$\|v\|$ is the length of $v$.
$v$ is a unit vector if $\|v\|=1$.
$u,v$ are orthogonal if $\langle u,v\rangle=0$.
#### Definition of orthonormal basis
A set of vectors $\{e_1,e_2,\cdots,e_n\}$ in a Hilbert space $\mathscr{H}$ is called an orthonormal basis if
1. $\langle e_i,e_j\rangle=\delta_{ij}$ for all $i,j\in \{1,2,\cdots,n\}$.
$$
\delta_{ij}=\begin{cases}
1 & \text{if } i=j \\
0 & \text{if } i\neq j
\end{cases}
$$
2. $n=\dim \mathscr{H}$.
### Subspaces and orthonormal bases
#### Definition of subspace
A subset $\mathscr{W}$ of a vector space $\mathscr{V}$ is a subspace if it is nonempty and closed under addition and scalar multiplication.
#### Definition of orthogonal complement
Let $E$ be a subset of a Hilbert space $\mathscr{H}$. The orthogonal complement of $E$ is the set of all vectors in $\mathscr{H}$ that are orthogonal to every vector in $E$.
$$
E^\perp=\{v\in \mathscr{H}: \langle v,w\rangle=0 \text{ for all } w\in E\}
$$
#### Definition of orthogonal projection
Let $E$ be an $m$-dimensional subspace of a Hilbert space $\mathscr{H}$ with orthonormal basis $(e_1,\cdots,e_m)$. The orthogonal projection onto $E$ is the linear map $P_E: \mathscr{H}\to E$ defined by
$$
P_E(v)=\sum_{i=1}^m \langle v,e_i\rangle e_i
$$
#### Definition of orthonormal direct sum
An inner product space $\mathscr{H}$ is the orthogonal direct sum of subspaces $E_1,E_2,\cdots,E_n$, written
$$
\mathscr{H}=E_1\oplus E_2\oplus \cdots \oplus E_n
$$
if the subspaces are mutually orthogonal ($E_i\perp E_j$ for all $i\neq j$).
That is, every $v\in \mathscr{H}$ can be written uniquely as $v=v_1+v_2+\cdots+v_n$ with $v_i\in E_i$.
#### Definition of meet and join of subspaces
Let $E$ and $F$ be two subspaces of a Hilbert space $\mathscr{H}$. The meet of $E$ and $F$ is the subspace of $\mathscr{H}$ given by
$$
E\land F=E\cap F
$$
The join of $E$ and $F$ is the subspace of $\mathscr{H}$ given by
$$
E\lor F=\{u+v: u\in E, v\in F\}
$$
### Null space and range
#### Definition of null space
Let $A$ be a linear map from a vector space $\mathscr{V}$ to a vector space $\mathscr{W}$. The null space of $A$ is the set of all vectors in $\mathscr{V}$ that are mapped to the zero vector in $\mathscr{W}$.
$$
\text{Null}(A)=\{v\in \mathscr{V}: Av=0\}
$$
#### Definition of range
Let $A$ be a linear map from a vector space $\mathscr{V}$ to a vector space $\mathscr{W}$. The range of $A$ is the set of all vectors in $\mathscr{W}$ that are mapped from $\mathscr{V}$.
$$
\text{Range}(A)=\{w\in \mathscr{W}: \exists v\in \mathscr{V}, Av=w\}
$$
### Dual spaces and adjoints of linear maps
#### Definition of linear map
A linear map $T: \mathscr{V}\to \mathscr{W}$ is a function that satisfies the following axioms:
1. Additivity: For all $u,v\in \mathscr{V}$,
$$
T(u+v)=T(u)+T(v)
$$
2. Homogeneity: For all $u\in \mathscr{V}$ and $a\in \mathbb{F}$,
$$
T(au)=aT(u)
$$
#### Definition of linear functionals
A linear functional $f: \mathscr{V}\to \mathbb{F}$ is a linear map from $\mathscr{V}$ to $\mathbb{F}$.
Here, $\mathbb{F}$ is the field of complex numbers.
#### Definition of dual space
Let $\mathscr{V}$ be a vector space over a field $\mathbb{F}$. The dual space of $\mathscr{V}$ is the set of all linear functionals on $\mathscr{V}$.
$$
\mathscr{V}^*=\{f:\mathscr{V}\to \mathbb{F}: f\text{ is linear}\}
$$
If $\mathscr{H}$ is a finite-dimensional Hilbert space, then $\mathscr{H}^*$ is isomorphic to $\mathscr{H}$.
Note $v\in \mathscr{H}\mapsto \langle v,\cdot\rangle\in \mathscr{H}^*$ is a conjugate linear isomorphism.
#### Definition of adjoint of a linear map
Let $T: \mathscr{V}\to \mathscr{W}$ be a linear map. The adjoint of $T$ is the linear map $T^*: \mathscr{W}\to \mathscr{V}$ such that
$$
\langle Tv,w\rangle=\langle v,T^*w\rangle
$$
for all $v\in \mathscr{V}$ and $w\in \mathscr{W}$.
#### Definition of self-adjoint operators
A linear operator $T: \mathscr{V}\to \mathscr{V}$ is self-adjoint if $T^*=T$.
#### Definition of unitary operators
A linear map $T: \mathscr{V}\to \mathscr{V}$ is unitary if $T^*T=TT^*=I$.
### Dirac's bra-ket notation
#### Definition of bra and ket
Let $\mathscr{H}$ be a Hilbert space. The bra-ket notation is a notation for vectors in $\mathscr{H}$.
$$
\langle v|w\rangle
$$
is the inner product of $v$ and $w$. The bra $\langle v|: \mathscr{H}\to \mathbb{C}$ is the linear functional $w\mapsto \langle v,w\rangle$.
$$
|v\rangle
$$
is the ket, denoting the vector $v$ itself.
$$
|u\rangle\langle v|
$$
is the linear map from $\mathscr{H}$ to $\mathscr{H}$ defined by $w\mapsto \langle v,w\rangle\,u$.
### The spectral theorem for self-adjoint operators
#### Definition of spectral theorem
Let $\mathscr{H}$ be a Hilbert space. A self-adjoint operator $T: \mathscr{H}\to \mathscr{H}$ is a linear operator that is equal to its adjoint.
Then all the eigenvalues of $T$ are real and there exists an orthonormal basis of $\mathscr{H}$ consisting of eigenvectors of $T$.
#### Definition of spectrum
The spectrum of a linear operator on finite-dimensional Hilbert space $T: \mathscr{H}\to \mathscr{H}$ is the set of all distinct eigenvalues of $T$.
$$
\operatorname{sp}(T)=\{\lambda: \lambda\text{ is an eigenvalue of } T\}\subset \mathbb{C}
$$
#### Definition of Eigenspace
If $\lambda$ is an eigenvalue of $T$, the eigenspace of $T$ corresponding to $\lambda$ is the set of all eigenvectors of $T$ corresponding to $\lambda$.
$$
E_\lambda(T)=\{v\in \mathscr{H}: Tv=\lambda v\}
$$
We denote $P_\lambda(T):\mathscr{H}\to E_\lambda(T)$ the orthogonal projection onto $E_\lambda(T)$.
#### Definition of Operator norm
The operator norm of a linear operator $T: \mathscr{H}\to \mathscr{H}$ is defined by the following; for a self-adjoint operator it equals the largest of the absolute values of its eigenvalues.
$$
\|T\|=\max_{\|v\|=1} \|Tv\|
$$
We say $T$ is **bounded** if $\|T\|<\infty$.
We denote $B(\mathscr{H})$ the set of all bounded linear operators on $\mathscr{H}$.
### Partial trace
#### Definition of trace
Let $T$ be a linear operator on $\mathscr{H}$, $(e_1,e_2,\cdots,e_n)$ be an orthonormal basis of $\mathscr{H}$ and $(\epsilon_1,\epsilon_2,\cdots,\epsilon_n)$ be the corresponding dual basis of $\mathscr{H}^*$. Then the trace of $T$ is defined by
$$
\operatorname{Tr}(T)=\sum_{i=1}^n \epsilon_i(T(e_i))=\sum_{i=1}^n \langle e_i,T(e_i)\rangle
$$
This is equivalent to the sum of the diagonal elements of $T$.
> Note, I changed the order of the definitions for the trace to pack similar concepts together. Check the rest of the section defining the partial trace by viewing the [tensor product section](https://notenextra.trance-0.com/Math401/Math401_T2#tensor-products-of-finite-dimensional-hilbert-spaces) first, and return to this section after reading the tensor product of linear operators.
#### Definition of partial trace
Let $T$ be a linear operator on $\mathscr{H}=\mathscr{A}\otimes \mathscr{B}$, where $\mathscr{A}$ and $\mathscr{B}$ are finite-dimensional Hilbert spaces.
An operator $T$ on $\mathscr{H}=\mathscr{A}\otimes \mathscr{B}$ can be written as (by the definition of [tensor product of linear operators](https://notenextra.trance-0.com/Math401/Math401_T2#tensor-products-of-linear-operators))
$$
T=\sum_{i=1}^n a_i A_i\otimes B_i
$$
where $A_i$ is a linear operator on $\mathscr{A}$ and $B_i$ is a linear operator on $\mathscr{B}$.
The $\mathscr{B}$-partial trace of $T$ ($\operatorname{Tr}_{\mathscr{B}}(T):\mathcal{L}(\mathscr{A}\otimes \mathscr{B})\to \mathcal{L}(\mathscr{A})$) is the linear operator on $\mathscr{A}$ defined by
$$
\operatorname{Tr}_{\mathscr{B}}(T)=\sum_{i=1}^n a_i \operatorname{Tr}(B_i) A_i
$$
Or we can define the map $L_v: \mathscr{A}\to \mathscr{A}\otimes \mathscr{B}$ by
$$
L_v(u)=u\otimes v
$$
Note that $\langle L_v(u),u'\otimes v'\rangle=\langle u\otimes v,u'\otimes v'\rangle=\langle u,u'\rangle \langle v,v'\rangle=\langle u,\langle v,v'\rangle u'\rangle$.
Therefore, $L_v^*\left(\sum_{j} u_j\otimes v_j\right)=\sum_{j} \langle v,v_j\rangle u_j$.
Then the partial trace of $T$ can also be defined by
**Let $\{v_j\}$ be a set of orthonormal basis of $\mathscr{B}$.**
$$
\operatorname{Tr}_{\mathscr{B}}(T)=\sum_{j} L^*_{v_j}\,T\,L_{v_j}
$$
#### Definition of partial trace with respect to a state
Let $T$ be a linear operator on $\mathscr{H}=\mathscr{A}\otimes \mathscr{B}$, where $\mathscr{A}$ and $\mathscr{B}$ are finite-dimensional Hilbert spaces.
Let $\rho$ be a state on $\mathscr{B}$ consisting of orthonormal basis $\{v_j\}$ and eigenvalue $\{\lambda_j\}$.
The partial trace of $T$ with respect to $\rho$ is the linear operator on $\mathscr{A}$ defined by
$$
\operatorname{Tr}_{\rho}(T)=\sum_{j} \lambda_j L^*_{v_j}\,T\,L_{v_j}
$$
### Space of Bounded Linear Operators
> Recall the trace of a matrix is the sum of its diagonal elements.
#### Hilbert-Schmidt inner product
Let $T,S\in B(\mathscr{H})$. The Hilbert-Schmidt inner product of $T$ and $S$ is defined by
$$
\langle T,S\rangle=\operatorname{Tr}(T^*S)
$$
> Note here, $T^*$ is the complex conjugate transpose of $T$.
If we introduce a basis $\{e_i\}$ of $\mathscr{H}$, then we can identify the space of bounded linear operators with the $n\times n$ complex-valued matrices $M_n(\mathbb{C})$.
For $T=(a_{ij})$, $S=(b_{ij})$, we have
$$
\operatorname{Tr}(T^*S)=\sum_{i=1}^n \sum_{j=1}^n \overline{a_{ij}}b_{ij}
$$
The inner product is the standard Hermitian inner product in $\mathbb{C}^{n\times n}$.
#### Definition of Hilbert-Schmidt norm (also called Frobenius norm)
The Hilbert-Schmidt norm of a linear operator $T: \mathscr{H}\to \mathscr{H}$ is defined by
$$
\|T\|_{\mathrm{HS}}=\sqrt{\sum_{i=1}^n \sum_{j=1}^n |a_{ij}|^2}
$$
**[The trace of operator does not depend on the basis.](https://notenextra.trance-0.com/Math429/Math429_L38#theorem-850)**
### Tensor products of finite-dimensional Hilbert spaces
Let $X=X_1\times X_2\times \cdots \times X_n$ be a Cartesian product of $n$ sets.
Let $x=(x_1,x_2,\cdots,x_n)$ be a vector in $X$.
$x_j\in X_j$ for $j=1,2,\cdots,n$.
Let $a\in X_j$ for $j=1,2,\cdots,n$.
Let's denote the space of all functions from $X$ to $\mathbb{C}$ by $\mathscr{H}$ and the space of all functions from $X_j$ to $\mathbb{C}$ by $\mathscr{H}_j$.
$$
\epsilon_{a}^{(j)}(x_j)=\begin{cases}
1 & \text{if } x_j=a \\
0 & \text{if } x_j\neq a
\end{cases}
$$
Then we can define a basis of $\mathscr{H}_j$ by $\{\epsilon_{a}^{(j)}(x_j)\}_{a\in X_j}$.
_Any function $f:X_j\to \mathbb{C}$ can be written as a linear combination of the basis vectors._
$$
f(x_j)=\sum_{a\in X_j} f(a)\epsilon_{a}^{(j)}(x_j)
$$
<details>
<summary>Proof</summary>
Note that a function is a map for all elements in the domain.
For each $a\in X_j$, $\epsilon_{a}^{(j)}(x_j)=1$ if $x_j=a$ and $0$ otherwise. So
$$
f(x_j)=\sum_{a\in X_j} f(a)\epsilon_{a}^{(j)}(x_j)=f(x_j)
$$
QED.
</details>
Now, let $a=(a_1,a_2,\cdots,a_n)$ be a vector in $X$, and $x=(x_1,x_2,\cdots,x_n)$ be a vector in $X$. Note that $a_j,x_j\in X_j$ for $j=1,2,\cdots,n$.
Define
$$
\epsilon_a(x)=\prod_{j=1}^n \epsilon_{a_j}^{(j)}(x_j)=\begin{cases}
1 & \text{if } a_j=x_j \text{ for all } j=1,2,\cdots,n \\
0 & \text{otherwise}
\end{cases}
$$
Then we can define a basis of $\mathscr{H}$ by $\{\epsilon_a\}_{a\in X}$.
_Any function $f:X\to \mathbb{C}$ can be written as a linear combination of the basis vectors._
$$
f(x)=\sum_{a\in X} f(a)\epsilon_a(x)
$$
<details>
<summary>Proof</summary>
This basically follows the same rationale as the previous proof. This time, the epsilon function only returns $1$ when $x_j=a_j$ for all $j=1,2,\cdots,n$.
$$
f(x)=\sum_{a\in X} f(a)\epsilon_a(x)=f(x)
$$
QED.
</details>
#### Definition of tensor product of basis elements
**The tensor product of basis elements** is defined by
$$
\epsilon_a\coloneqq\epsilon_{a_1}^{(1)}\otimes \epsilon_{a_2}^{(2)}\otimes \cdots \otimes \epsilon_{a_n}^{(n)}
$$
This is a basis of $\mathscr{H}$, here $\mathscr{H}$ is the set of all functions from $X=X_1\times X_2\times \cdots \times X_n$ to $\mathbb{C}$.
#### Definition of tensor product of two finite-dimensional Hilbert spaces
**The tensor product of two finite-dimensional Hilbert spaces** (in $\mathscr{H}$) is defined by
Let $\mathscr{H}_1$ and $\mathscr{H}_2$ be two finite dimensional Hilbert spaces. Let $u_1\in \mathscr{H}_1$ and $v_1\in \mathscr{H}_2$.
$$
u_1\otimes v_1
$$
is a bi-anti-linear map from $\mathscr{H}_1\times \mathscr{H}_2$ (the Cartesian product of $\mathscr{H}_1$ and $\mathscr{H}_2$, a tuple of two elements where first element is in $\mathscr{H}_1$ and second element is in $\mathscr{H}_2$) to $\mathbb{F}$ (in this case, $\mathbb{C}$). And $\forall u\in \mathscr{H}_1, v\in \mathscr{H}_2$,
$$
(u_1\otimes v_1)(u, v)=\langle u,u_1\rangle \langle v,v_1\rangle
$$
We call such forms **decomposable**. The tensor product of two finite-dimensional Hilbert spaces, denoted by $\mathscr{H}_1\otimes \mathscr{H}_2$, is the set of all linear combinations of decomposable forms. Represented by the following:
$$
\left(\sum_{i=1}^n a_i\, u_i\otimes v_i\right)(u, v) \coloneqq \sum_{i=1}^n a_i(u_i\otimes v_i)(u,v)=\sum_{i=1}^n a_i \langle u,u_i\rangle \langle v,v_i\rangle
$$
Note that $a_i\in \mathbb{C}$ for complex-vector spaces.
This is a linear space of dimension $\dim \mathscr{H}_1\times \dim \mathscr{H}_2$.
We define the inner product of two elements of $\mathscr{H}_1\otimes \mathscr{H}_2$ ($u_1\otimes v_1:(\mathscr{H}_1\otimes \mathscr{H}_2)\to \mathbb{C}$, $u_2\otimes v_2:(\mathscr{H}_1\otimes \mathscr{H}_2)\to \mathbb{C}$ $\in \mathscr{H}_1\otimes \mathscr{H}_2$) by
$$
\langle u_1\otimes v_1, u_2\otimes v_2\rangle\coloneqq\langle u_1,u_2\rangle \langle v_1,v_2\rangle=(u_1\otimes v_1)(u_2,v_2)
$$
### Tensor products of linear operators
Let $T_1$ be a linear operator on $\mathscr{H}_1$ and $T_2$ be a linear operator on $\mathscr{H}_2$, where $\mathscr{H}_1$ and $\mathscr{H}_2$ are finite-dimensional Hilbert spaces. The tensor product of $T_1$ and $T_2$ (denoted by $T_1\otimes T_2$) is the linear operator on $\mathscr{H}_1\otimes \mathscr{H}_2$ defined **on decomposable elements** by
$$
(T_1\otimes T_2)(v_1\otimes v_2)=T_1(v_1)\otimes T_2(v_2)
$$
for all $v_1\in \mathscr{H}_1$ and $v_2\in \mathscr{H}_2$.
Extending by linearity, the tensor product acts on general elements as
$$
(T_1\otimes T_2)\left(\sum_{i=1}^n a_i\, u_i\otimes v_i\right)=\sum_{i=1}^n a_i\, T_1(u_i)\otimes T_2(v_i)
$$
for all $u_i\in \mathscr{H}_1$ and $v_i\in \mathscr{H}_2$.
Such tensor product of linear operators is well defined.
<details>
<summary>Proof</summary>
Suppose $\sum_{i} a_i\, u_i\otimes v_i=\sum_{j} b_j\, u'_j\otimes v'_j$ as bi-anti-linear forms. For a decomposable element, $(T_1(u)\otimes T_2(v))(x,y)=\langle x,T_1(u)\rangle\langle y,T_2(v)\rangle=(u\otimes v)(T_1^*x,T_2^*y)$, so for any representation $\alpha=\sum_i a_i\, u_i\otimes v_i$ the extended operator satisfies $\big((T_1\otimes T_2)\alpha\big)(x,y)=\alpha(T_1^*x,T_2^*y)$.
This expression depends only on the form $\alpha$ and not on the chosen representation, so the two representations give the same result.
QED
</details>
#### Extended Dirac notation
Suppose $\mathscr{H}=\mathbb{C}^n$ with the standard basis $\{e_i\}$.
$e_j=|j\rangle$ and
$$
|j_1\dots j_n\rangle=e_{j_1}\otimes e_{j_2}\otimes \cdots \otimes e_{j_n}
$$
#### The Hadamard Transform
Let $\mathscr{H}=\mathbb{C}^2$ with the standard basis $\{e_1,e_2\}=\{|0\rangle,|1\rangle\}$.
The linear operator $H_2$ is defined by
$$
H_2=\frac{1}{\sqrt{2}}\begin{pmatrix}
1 & 1 \\
1 & -1
\end{pmatrix}=\frac{1}{\sqrt{2}}(|0\rangle\langle 0|+|1\rangle\langle 0|+|0\rangle\langle 1|-|1\rangle\langle 1|)
$$
The Hadamard transform is the linear operator $H_2$ on $\mathbb{C}^2$.
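A quick sketch (mine, assuming `numpy`) of the matrix at work, including a tensor power built with `np.kron`: $H_2^{\otimes 2}$ maps $|00\rangle$ to the uniform superposition of all four basis states.

```python
import numpy as np

H2 = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

print(H2 @ np.array([1, 0]))                 # H|0> = (|0> + |1>)/sqrt(2)
print(np.allclose(H2 @ H2, np.eye(2)))       # H is unitary and its own inverse

H4 = np.kron(H2, H2)                         # tensor product of operators H2 (x) H2
ket00 = np.array([1, 0, 0, 0], dtype=complex)
print(H4 @ ket00)                            # (1/2)(|00> + |01> + |10> + |11>)
```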
### Singular value and Schmidt decomposition
#### Definition of SVD (Singular Value Decomposition)
Let $T:\mathscr{U}\to \mathscr{V}$ be a linear operator between two finite-dimensional Hilbert spaces $\mathscr{U}$ and $\mathscr{V}$.
We denote the inner product of $\mathscr{U}$ and $\mathscr{V}$ by $\langle \cdot, \cdot \rangle$.
Then there exists a decomposition of $T$
$$
T=d_1 T_1+d_2 T_2+\cdots +d_n T_n
$$
with $d_1>d_2>\cdots >d_n>0$ and $T_i:\mathscr{U}\to \mathscr{V}$ such that:
1. $T_iT_j^*=0$, $T_i^*T_j=0$ for $i\neq j$
2. $T_i|_{\mathscr{R}(T_i^*)}:\mathscr{R}(T_i^*)\to \mathscr{R}(T_i)$ is an isomorphism with inverse $T_i^*$ where $\mathscr{R}(\cdot)$ is the range of the operator.
The $d_i$ are called the singular values of $T$.
[Gram-Schmidt Decomposition](https://notenextra.trance-0.com/Math429/Math429_L27#theorem-632-gram-schmidt)
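For a concrete matrix the singular values can be read off from `np.linalg.svd`; the sketch below (mine, assuming `numpy`) rebuilds $T$ as a sum of rank-one pieces $d_i\,u_i v_i^*$. Note that this groups by individual singular vectors rather than by distinct singular values, so it is a finer decomposition than the one stated above.

```python
import numpy as np

rng = np.random.default_rng(6)
T = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))

U, d, Vh = np.linalg.svd(T, full_matrices=False)   # T = U @ diag(d) @ Vh
reconstruction = sum(d[i] * np.outer(U[:, i], Vh[i]) for i in range(len(d)))
print(np.allclose(T, reconstruction))              # True
print("singular values:", np.round(d, 3))
```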
## Basic Group Theory
### Finite groups
#### Definition of group
A group is a set $G$ with a binary operation $\cdot$ that satisfies the following axioms:
1. **Closure**: For all $a,b\in G$, $a\cdot b\in G$.
2. **Associativity**: For all $a,b,c\in G$, $(a\cdot b)\cdot c=a\cdot (b\cdot c)$.
3. **Identity**: There exists an element $e\in G$ such that for all $a\in G$, $a\cdot e=e\cdot a=a$.
4. **Inverses**: For all $a\in G$, there exists an element $b\in G$ such that $a\cdot b=b\cdot a=e$.
#### Symmetric group $S_n$
The symmetric group $S_n$ is the group of all permutations of $n$ elements.
$$
S_n=\{f: \{1,2,\cdots,n\}\to \{1,2,\cdots,n\} \text{ is a bijection}\}
$$
#### Unitary group $U(n)$
The unitary group $U(n)$ is the group of all $n\times n$ unitary matrices,
such that $A^*A=AA^*=I_n$ (equivalently $A^*=A^{-1}$), where $A^*$ is the complex conjugate transpose of $A$: $A^*=(\overline{A})^T$.
#### Cyclic group $\mathbb{Z}_n$
The cyclic group $\mathbb{Z}_n$ is the group of all integers modulo $n$.
$$
\mathbb{Z}_n=\{0,1,2,\cdots,n-1\}
$$
#### Definition of group homomorphism
A group homomorphism is a function $\varPhi:G\to H$ between two groups $G$ and $H$ that satisfies the following axiom:
$$
\varPhi(a\cdot b)=\varPhi(a)\cdot \varPhi(b)
$$
A bijective group homomorphism is called a group isomorphism.
#### Homomorphism sends identity to identity, inverses to inverses
Let $\varPhi:G\to H$ be a group homomorphism. $e_G$ and $e_H$ are the identity elements of $G$ and $H$ respectively. Then
1. $\varPhi(e_G)=e_H$
2. $\varPhi(a^{-1})=\varPhi(a)^{-1}$. $\forall a\in G$
### More on the symmetric group
#### General linear group over $\mathbb{C}$
The general linear group over $\mathbb{C}$ is the group of all $n\times n$ invertible complex matrices.
$$
GL(n,\mathbb{C})=\{A\in M_n(\mathbb{C}): A \text{ is invertible}\}
$$
The map $T: S_n\to GL(n,\mathbb{C})$ sending a permutation $\sigma$ to its permutation matrix $T(\sigma)$ (defined by $T(\sigma)e_i=e_{\sigma(i)}$) is a group homomorphism.
#### Definition of sign of a permutation
Let $T:S_n\to GL(n,\mathbb{C})$ be the group homomorphism. The sign of a permutation $\sigma\in S_n$ is defined by
$$
\operatorname{sgn}(\sigma)=\det(T(\sigma))
$$
We say $\sigma$ is even if $\operatorname{sgn}(\sigma)=1$ and odd if $\operatorname{sgn}(\sigma)=-1$.
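A short sketch of this definition (mine, assuming `numpy`): build the permutation matrix $T(\sigma)$ and take its determinant.

```python
import numpy as np
from itertools import permutations

def permutation_matrix(sigma):
    """T(sigma) with T(sigma) e_i = e_{sigma(i)}, for sigma given as a tuple of 0-based images."""
    n = len(sigma)
    M = np.zeros((n, n))
    for i, j in enumerate(sigma):
        M[j, i] = 1
    return M

for sigma in permutations(range(3)):
    sgn = int(round(np.linalg.det(permutation_matrix(sigma))))
    print(sigma, sgn)   # identity and 3-cycles give +1, transpositions give -1
```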
### Fourier Transform in $\mathbb{Z}_N$.
The vector space $L^2(\mathbb{Z}_N)$ is the set of all complex-valued functions on $\mathbb{Z}_N$ with the inner product
$$
\langle f,g\rangle=\sum_{k=0}^{N-1} \overline{f(k)}g(k)
$$
An orthonormal basis of $L^2(\mathbb{Z}_N)$ is given by $\delta_y,y\in \mathbb{Z}_N$.
$$
\delta_y(k)=\begin{cases}
1 & \text{if } k=y \\
0 & \text{otherwise}
\end{cases}
$$
in Dirac notation, we have
$$
\delta_y=|y\rangle=|y+N\rangle
$$
#### Definition of Fourier transform
Define $\varphi_k(x)=\frac{1}{\sqrt{N}}e^{2\pi i kx/N}$ for $k\in \mathbb{Z}_N$; each $\varphi_k:\mathbb{Z}_N\to \mathbb{C}$ is a function on $\mathbb{Z}_N$.
The Fourier transform on $L^2(\mathbb{Z}_N)$ is the operator $F$ defined by $(Ff)(k)=\langle \varphi_k,f\rangle$; in Dirac notation,
$$
F=\frac{1}{\sqrt{N}}\sum_{j=0}^{N-1} \sum_{k=0}^{N-1} e^{-2\pi i kj/N}|k\rangle\langle j|
$$
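With the sign convention in the definition above, the operator is the (unitarily normalized) discrete Fourier transform; the sketch below (mine, assuming `numpy`) builds its matrix and checks it against `np.fft.fft` and for unitarity.

```python
import numpy as np

N = 8
# Matrix of F on L^2(Z_N): F[k, j] = exp(-2*pi*i*k*j/N) / sqrt(N)
k, j = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
F = np.exp(-2j * np.pi * k * j / N) / np.sqrt(N)

f = np.random.default_rng(7).standard_normal(N) + 0j
print(np.allclose(F @ f, np.fft.fft(f) / np.sqrt(N)))   # matches the normalized DFT
print(np.allclose(F.conj().T @ F, np.eye(N)))           # F is unitary
```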
### Symmetric and anti-symmetric tensors
Let $\mathscr{H}^{\otimes n}$ be the $n$-fold tensor product of a Hilbert space $\mathscr{H}$.
We define the action of $S_n$ on $\mathscr{H}^{\otimes n}$ as follows.
Let $\eta\in S_n$ be a permutation; set
$$
\Pi(\eta)\,v_1\otimes v_2\otimes \cdots \otimes v_n=v_{\eta^{-1}(1)}\otimes v_{\eta^{-1}(2)}\otimes \cdots \otimes v_{\eta^{-1}(n)}
$$
and extend to $\mathscr{H}^{\otimes n}$ by linearity.
This gives the property that for all $\zeta,\eta\in S_n$, $\Pi(\zeta\eta)=\Pi(\zeta)\Pi(\eta)$.
#### Definition of symmetric and anti-symmetric tensors
Let $\mathscr{H}$ be a finite-dimensional Hilbert space.
An element $\alpha\in \mathscr{H}^{\otimes n}$ is called symmetric if it is invariant under the action of $S_n$:
$$\Pi(\eta)\alpha=\alpha \text{ for all } \eta\in S_n.$$
It is called anti-symmetric if
$$
\Pi(\eta)\alpha=\operatorname{sgn}(\eta)\alpha \text{ for all } \eta\in S_n.
$$

View File

@@ -1,351 +0,0 @@
# Math401 Topic 3: Separable Hilbert spaces
## Infinite-dimensional Hilbert spaces
Recall from Topic 1.
[$L^2$ space](https://notenextra.trance-0.com/Math401/Math401_T1#section-3-further-definitions-in-measure-theory-and-integration)
Let $\lambda$ be a measure on $\mathbb{R}$ (or any other measure space you are interested in).
A function is square integrable if
$$
\int_\mathbb{R} |f(x)|^2 d\lambda(x)<\infty
$$
### $L^2$ space and general Hilbert spaces
#### Definition of $L^2(\mathbb{R},\lambda)$
The space $L^2(\mathbb{R},\lambda)$ is the space of all square integrable, measurable functions on $\mathbb{R}$ with respect to the measure $\lambda$ (The Lebesgue measure).
The Hermitian inner product is defined by
$$
\langle f,g\rangle=\int_\mathbb{R} \overline{f(x)}g(x) d\lambda(x)
$$
The norm is defined by
$$
\|f\|=\sqrt{\int_\mathbb{R} |f(x)|^2 d\lambda(x)}
$$
The space $L^2(\mathbb{R},\lambda)$ is complete.
[Proof ignored here]
> Recall the definition of [complete metric space](https://notenextra.trance-0.com/Math4111/Math4111_L17#definition-312).
> Note that **by some general result in point-set topology**, a normed vector space can always be enlarged so as to become complete. This process is called completion of the normed space.
>
> Some exercise is showing some hints for this result:
>
> Show that the subspace of $L^2(\mathbb{R},\lambda)$ consisting of square integrable continuous functions is not closed.
>
> Suggestion: consider the sequence of continuous functions $f_1(x), f_2(x),\cdots$, where $f_n(x)$ is defined by the following graph:
>
> ![function.png](https://notenextra.trance-0.com/Math401/L2_square_integrable_problem.png)
>
> Show that $f_n$ converges in the $L^2$ norm to a function $f\in L^2(\mathbb{R},\lambda)$ but the limit function $f$ is not continuous. Draw the graph of $f_n$ to make this clear.
#### Definition of general Hilbert space
A Hilbert space is a complete inner product vector space.
#### General Pythagorean theorem
Let $u_1,u_2,\cdots,u_N$ be an orthonormal set in an inner product space $\mathscr{V}$ (may not be complete). Then for all $v\in \mathscr{V}$,
$$
\|v\|^2=\sum_{i=1}^N |\langle v,u_i\rangle|^2+\left\|v-\sum_{i=1}^N \langle v,u_i\rangle u_i\right\|^2
$$
[Proof ignored here]
#### Bessel's inequality
Let $u_1,u_2,\cdots,u_N$ be an orthonormal set in an inner product space $\mathscr{V}$ (may not be complete). Then for all $v\in \mathscr{V}$,
$$
\sum_{i=1}^N |\langle v,u_i\rangle|^2\leq \|v\|^2
$$
Immediate from the general Pythagorean theorem.
### Orthonormal bases
An orthonormal subset $S$ of a Hilbert space $\mathscr{H}$ is a set all of whose elements have norm 1 and are mutually orthogonal ($\forall u,v\in S$ with $u\neq v$, $\langle u,v\rangle=0$).
#### Definition of orthonormal basis
An orthonormal subset $S$ of a Hilbert space $\mathscr{H}$ is an orthonormal basis of $\mathscr{H}$ if there is no other orthonormal subset of $\mathscr{H}$ that contains $S$ as a proper subset.
#### Theorem of existence of orthonormal basis
Every separable Hilbert space has an orthonormal basis.
[Proof ignored here]
#### Theorem of Fourier series
Let $\mathscr{H}$ be a separable Hilbert space with an orthonormal basis $\{e_n\}$. Then for any $f\in \mathscr{H}$,
$$
f=\sum_{n=1}^\infty \langle f,e_n\rangle e_n
$$
The series converges to $f$ in the norm of $\mathscr{H}$.
[Proof ignored here]
#### Fourier series in $L^2([0,2\pi],\lambda)$
Let $f\in L^2([0,2\pi],\lambda)$.
$$
f_N(x)=\sum_{n:|n|\leq N} c_n e^{inx}
$$
where $c_n=\frac{1}{2\pi}\int_0^{2\pi} f(x)e^{-inx} dx$.
The partial sums $f_N$ converge to $f$ in $L^2([0,2\pi],\lambda)$ as $N\to \infty$.
This is the Fourier series of $f$.
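A numerical sanity check (Python/NumPy, not part of the notes, with the function $f(x)=x$ chosen just for illustration): the partial sums $f_N$ approach $f$ in the $L^2$ norm as $N$ grows.

```python
import numpy as np

M = 20000
x = np.linspace(0.0, 2.0 * np.pi, M, endpoint=False)
dx = x[1] - x[0]
f = x                                          # f(x) = x on [0, 2*pi]

def c(n):
    """c_n = (1/2pi) * integral_0^{2pi} f(x) e^{-inx} dx, via a Riemann sum."""
    return np.sum(f * np.exp(-1j * n * x)) * dx / (2.0 * np.pi)

def partial_sum(N):
    return sum(c(n) * np.exp(1j * n * x) for n in range(-N, N + 1))

for N in (1, 5, 25, 125):
    err = np.sqrt(np.sum(np.abs(f - partial_sum(N)) ** 2) * dx)
    print(N, round(float(err), 3))             # the L^2 error decreases as N grows
```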
#### Hermite polynomials
Polynomials are dense in the Gaussian-weighted space $L^2(\mathbb{R}, e^{-x^2}dx)$.
Applying the Gram-Schmidt process to $\{1,x,x^2,\cdots\}$ in that space produces the Hermite polynomials $H_n$.
The corresponding Hermite functions (the $H_n(x)e^{-x^2/2}$, suitably normalized) form an orthonormal basis of $L^2(\mathbb{R},\lambda)$.
### Isomorphism and $\ell_2$ space
#### Definition of isomorphic Hilbert spaces
Let $\mathscr{H}_1$ and $\mathscr{H}_2$ be two Hilbert spaces.
$\mathscr{H}_1$ and $\mathscr{H}_2$ are isomorphic if there exists a bijective linear map $U:\mathscr{H}_1\to \mathscr{H}_2$ that preserves the inner product:
$$
\langle Uf,Ug\rangle=\langle f,g\rangle
$$
for all $f,g\in \mathscr{H}_1$.
When $\mathscr{H}_1=\mathscr{H}_2$, the map $U$ is called unitary.
#### $\ell_2$ space
The space $\ell_2$ is the space of all square summable sequences.
$$
\ell_2=\left\{(a_n)_{n=1}^\infty: \sum_{n=1}^\infty |a_n|^2<\infty\right\}
$$
An example of an element of $\ell_2$ is $(1,0,0,\cdots)$.
With inner product
$$
\langle (a_n)_{n=1}^\infty, (b_n)_{n=1}^\infty\rangle=\sum_{n=1}^\infty \overline{a_n}b_n
$$
It is a Hilbert space (every Cauchy sequence in $\ell_2$ converges to some element in $\ell_2$).
### Bounded operators and continuity
Let $T:\mathscr{V}\to \mathscr{W}$ be a linear map between two vector spaces $\mathscr{V}$ and $\mathscr{W}$.
We equip $\mathscr{V}$ and $\mathscr{W}$ with norms, both denoted $\|\cdot\|$.
Then $T$ is continuous if for all $u\in \mathscr{V}$, whenever $u_n\to u$ in $\mathscr{V}$ we have $T(u_n)\to T(u)$ in $\mathscr{W}$.
In epsilon-delta language, $T$ is continuous if for all $\epsilon>0$ there exists $\delta>0$ such that $\|u-v\|<\delta$ implies $\|T(u)-T(v)\|<\epsilon$.
#### Definition of bounded operator
A linear map $T:\mathscr{V}\to \mathscr{W}$ is bounded if
$$
\|T\|=\sup_{\|u\|=1}\|T(u)\|< \infty
$$
#### Theorem of continuity and boundedness
A linear map $T:\mathscr{V}\to \mathscr{W}$ is continuous if and only if it is bounded.
[Proof ignored here]
#### Definition of the space of bounded operators
The set of all bounded linear operators on $\mathscr{V}$ is denoted by $\mathscr{B}(\mathscr{V})$.
### Direct sum of Hilbert spaces
Suppose $\mathscr{H}_1$ and $\mathscr{H}_2$ are two Hilbert spaces.
The direct sum of $\mathscr{H}_1$ and $\mathscr{H}_2$ is the Hilbert space $\mathscr{H}_1\oplus \mathscr{H}_2$ with the inner product
$$
\langle (u_1,u_2),(v_1,v_2)\rangle=\langle u_1,v_1\rangle_{\mathscr{H}_1}+\langle u_2,v_2\rangle_{\mathscr{H}_2}
$$
Such space is denoted by $\mathscr{H}_1\oplus \mathscr{H}_2$.
A countable direct sum of Hilbert spaces is defined similarly; its elements are the square-summable sequences.
That is, the elements are sequences $(u_n)_{n=1}^\infty$ with $u_n\in\mathscr{H}_n$ and $\sum_{n=1}^\infty \|u_n\|^2<\infty$.
The inner product in such countable direct sum is defined by
$$
\langle (u_n)_{n=1}^\infty, (v_n)_{n=1}^\infty\rangle=\sum_{n=1}^\infty \langle u_n,v_n\rangle_{\mathscr{H}_n}
$$
Such space is denoted by $\mathscr{H}=\bigoplus_{n=1}^\infty \mathscr{H}_n$.
### Closed subspaces of Hilbert spaces
#### Definition of closed subspace
A subspace $\mathscr{M}$ of a Hilbert space $\mathscr{H}$ is closed if every sequence in $\mathscr{M}$ that converges in $\mathscr{H}$ has its limit in $\mathscr{M}$.
#### Definition of pairwise orthogonal subspaces
Two subspaces $\mathscr{M}_1$ and $\mathscr{M}_2$ of a Hilbert space $\mathscr{H}$ are pairwise orthogonal if $\langle u,v\rangle=0$ for all $u\in \mathscr{M}_1$ and $v\in \mathscr{M}_2$.
### Orthogonal projections
#### Definition of orthogonal complement
The orthogonal complement of a subspace $\mathscr{M}$ of a Hilbert space $\mathscr{H}$ is the set of all elements in $\mathscr{H}$ that are orthogonal to every element in $\mathscr{M}$.
It is denoted by $\mathscr{M}^\perp=\{u\in \mathscr{H}: \langle u,v\rangle=0,\forall v\in \mathscr{M}\}$.
#### Projection theorem
Let $\mathscr{H}$ be a Hilbert space and $\mathscr{M}$ be a closed subspace of $\mathscr{H}$. Then any $v\in \mathscr{H}$ can be uniquely decomposed as $v=u+w$ where $u\in \mathscr{M}$ and $w\in \mathscr{M}^\perp$.
[Proof ignored here]
### Dual Hilbert spaces
#### Norm of linear functionals
Let $\mathscr{H}$ be a Hilbert space.
The norm of a linear functional $f\in \mathscr{H}^*$ is defined by
$$
\|f\|=\sup_{\|u\|=1}|f(u)|
$$
#### Definition of dual Hilbert space
The dual Hilbert space of $\mathscr{H}$ is the space of all bounded linear functionals on $\mathscr{H}$.
It is denoted by $\mathscr{H}^*$.
$$
\mathscr{H}^*=\mathscr{B}(\mathscr{H},\mathbb{C})=\{f: \mathscr{H}\to \mathbb{C}: f\text{ is linear and }\|f\|<\infty\}
$$
You can exchange the $\mathbb{C}$ with any other field you are interested in.
#### The Riesz lemma
For each $f\in \mathscr{H}^*$, there exists a unique $v_f\in \mathscr{H}$ such that $f(u)=\langle u,v_f\rangle$ for all $u\in \mathscr{H}$. And $\|f\|=\|v_f\|$.
[Proof ignored here]
#### Definition of bounded sesquilinear form
A bounded sesquilinear form on $\mathscr{H}$ is a function $B: \mathscr{H}\times \mathscr{H}\to \mathbb{C}$ satisfying
1. $B(u,av+bw)=aB(u,v)+bB(u,w)$ for all $u,v,w\in \mathscr{H}$ and $a,b\in \mathbb{C}$.
2. $B(av+bw,u)=\overline{a}B(v,u)+\overline{b}B(w,u)$ for all $u,v,w\in \mathscr{H}$ and $a,b\in \mathbb{C}$.
3. $|B(u,v)|\leq C\|u\|\|v\|$ for all $u,v\in \mathscr{H}$ and some constant $C>0$.
There exists a unique bounded linear operator $A\in \mathscr{B}(\mathscr{H})$ such that $B(u,v)=\langle Au,v\rangle$ for all $u,v\in \mathscr{H}$. The norm of $A$ is the smallest constant $C$ such that $|B(u,v)|\leq C\|u\|\|v\|$ for all $u,v\in \mathscr{H}$.
[Proof ignored here]
### The adjoint of a bounded operator
Let $A\in \mathscr{B}(\mathscr{H})$, and consider the bounded sesquilinear form $B: \mathscr{H}\times \mathscr{H}\to \mathbb{C}$ given by $B(u,v)=\langle u,Av\rangle$ for all $u,v\in \mathscr{H}$. Then there exists a unique bounded linear operator $A^*\in \mathscr{B}(\mathscr{H})$ such that $B(u,v)=\langle A^*u,v\rangle$ for all $u,v\in \mathscr{H}$.
[Proof ignored here]
And $\|A^*\|=\|A\|$.
Additional properties of bounded operators:
Let $A,B\in \mathscr{B}(\mathscr{H})$ and $a,b\in \mathbb{C}$. Then
1. $(aA+bB)^*=\overline{a}A^*+\overline{b}B^*$.
2. $(AB)^*=B^*A^*$.
3. $(A^*)^*=A$.
4. $\|A^*\|=\|A\|$.
5. $\|A^*A\|=\|A\|^2$.
#### Definition of self-adjoint operator
An operator $A\in \mathscr{B}(\mathscr{H})$ is self-adjoint if $A^*=A$.
#### Definition of normal operator
An operator $N\in \mathscr{B}(\mathscr{H})$ is normal if $NN^*=N^*N$.
#### Definition of unitary operator
An operator $U\in \mathscr{B}(\mathscr{H})$ is unitary if $U^*U=UU^*=I$.
where $I$ is the identity operator on $\mathscr{H}$.
#### Definition of orthogonal projection
An operator $P\in \mathscr{B}(\mathscr{H})$ is an orthogonal projection if $P^*=P$ and $P^2=P$.
### Tensor product of (infinite-dimensional) Hilbert spaces
#### Definition of tensor product
Let $\mathscr{H}_1$ and $\mathscr{H}_2$ be two Hilbert spaces, $u_1\in \mathscr{H}_1$ and $u_2\in \mathscr{H}_2$. Then $u_1\otimes u_2$ is a conjugate bilinear functional on $\mathscr{H}_1\times \mathscr{H}_2$:
$$
(u_1\otimes u_2)(v_1,v_2)=\langle u_1,v_1\rangle_{\mathscr{H}_1}\langle u_2,v_2\rangle_{\mathscr{H}_2}
$$
Let $\mathscr{V}$ be the set of all finite linear combinations of such conjugate bilinear functionals. We define the inner product on $\mathscr{V}$ by
$$
\langle u\otimes v,u'\otimes v'\rangle=\langle u,u'\rangle_{\mathscr{H}_1}\langle v,v'\rangle_{\mathscr{H}_2}
$$
The tensor product of the (possibly infinite-dimensional) Hilbert spaces $\mathscr{H}_1$ and $\mathscr{H}_2$ is the completion of $\mathscr{V}$ (extending those bilinear functionals so that the space becomes complete) with respect to the norm induced by the inner product.
Denoted by $\mathscr{H}_1\otimes \mathscr{H}_2$.
An orthonormal basis of $\mathscr{H}_1\otimes \mathscr{H}_2$ is $\{u_i\otimes v_j:i=1,2,\cdots,\ j=1,2,\cdots\}$, where $\{u_i\}$ is an orthonormal basis of $\mathscr{H}_1$ and $\{v_j\}$ is an orthonormal basis of $\mathscr{H}_2$.
### Fock space
#### Definition of Fock space
Let $\mathscr{H}^{\otimes n}$ be the $n$-fold tensor product of $\mathscr{H}$.
Set $\mathscr{H}^{\otimes 0}=\mathbb{C}$.
The Fock space of $\mathscr{H}$ is the direct sum of all $\mathscr{H}^{\otimes n}$.
$$
\mathscr{F}(\mathscr{H})=\bigoplus_{n=0}^\infty \mathscr{H}^{\otimes n}
$$
For example, if $\mathscr{H}=L^2(\mathbb{R},\lambda)$, then an element in $\mathscr{F}(\mathscr{H})$ is a sequence of functions $\psi=(\psi_0,\psi_1(x_1),\psi_2(x_1,x_2),\cdots)$ such that $|\psi_0|^2+\sum_{n=1}^\infty \int|\psi_n(x_1,\cdots,x_n)|^2dx_1\cdots dx_n<\infty$.
# Math401 Topic 4: The quantum version of probabilistic concepts
> In mathematics, one often speaks of non-commutative instead of quantum constructions.
**Note, in this section, we will see a lot of mixed used terms used in physics and mathematics. I will use _italic_ to denote the terminology used in physics. It is safe to ignore them if you just care about the mathematics.**
## Section 1: Generalities about classical versus quantum systems
In classical physics, we assume that a system's properties have well-defined values regardless of how we choose to measure them.
### Basic terminology
#### Set of states
The preparation of a system builds a convex set of states as our initial condition for the system.
For a collection of $N$ systems, suppose a fraction $\lambda$ of them ($N_1=\lambda N$) is prepared by a procedure producing state $P_1$ and the remaining fraction ($N_2=(1-\lambda)N$) by a procedure producing state $P_2$. The state of the collection is the convex combination $\lambda P_1+(1-\lambda) P_2$.
#### Set of effects
The set of effects corresponds to the set of all possible outcomes of a measurement, $\Omega=\{\omega_1, \omega_2, \ldots, \omega_n\}$, where each $\omega_i$ has an associated effect, i.e. some query regarding the system (for example: is outcome $\omega_i$ observed?).
#### Registration of outcomes
A pair of a state and an effect determines a probability $E_i(P)=p(\omega_i|P)$. By the law of large numbers, the empirical frequency $N(\omega_i)/N$ converges to this probability as $N$ increases.
**Quantum states, _observables_ (random variables), and effects can be represented mathematically by linear operators on a Hilbert space.**
## Section 2: Examples of physical experiment in language of mathematics
### Stern-Gerlach experiment
_**State preparation:**_ Silver atoms are emitted from a thermal source and collimated to form a beam.
_**Measurement:**_ Silver atoms interact with the field produced by the magnet and impinge on the glass plate.
_**Registration:**_ The impression left on the glass plate by the condensed silver atoms.
## Section 3: Finite probability spaces in the language of Hilbert space and operators
> Superposition is a linear combination of two or more states.
A quantum coin can be represented mathematically by a linear combination of $|0\rangle$ and $|1\rangle$: $\alpha|0\rangle+\beta|1\rangle\in\mathscr{H}\cong\mathbb{C}^2$.
> For the rest of the material, we shall take the $\mathscr{H}$ to be vector space over $\mathbb{C}$.
### Definitions in classical probability under generalized probability theory
#### Definition of states (classical probability)
A state in classical probability is a probability distribution on the set of all possible outcomes, which we can list as $(p_1,p_2,\cdots,p_n)$.
To each event $A\subseteq \Omega$, we associate the operator on $\mathscr{H}$ of multiplication by the indicator function $P_A\coloneqq M_{\mathbb{I}_A}:f\mapsto \mathbb{I}_A f$ that projects onto the subspace of $\mathscr{H}$ corresponding to the event $A$.
$$
P_A=\sum_{k=1}^n a_k|k\rangle\langle k|
$$
where $a_k\in\{0,1\}$, and $a_k=1$ if and only if $k\in A$. Note that $P_A^*=P_A$ and $P_A^2=P_A$.
#### Definition of density operator (classical probability)
Let $(p_1,p_2,\cdots,p_n)$ be a probability distribution on $X$, where $p_k\geq 0$ and $\sum_{k=1}^n p_k=1$. The density operator $\rho$ is defined by
$$
\rho=\sum_{k=1}^n p_k|k\rangle\langle k|
$$
The probability of event $A$ relative to the probability distribution $(p_1,p_2,\cdots,p_n)$ becomes the trace of the product of $\rho$ and $P_A$.
$$
\operatorname{Prob}_\rho(A)\coloneqq\text{Tr}(\rho P_A)
$$
#### Definition of random variables (classical probability)
A random variable is a function $f:X\to\mathbb{R}$. It can also be written in operator form:
$$
F=\sum_{k=1}^n f(k)P_{\{k\}}
$$
The expectation of $f$ relative to the probability distribution $(p_1,p_2,\cdots,p_n)$ is given by
$$
\mathbb{E}_\rho(f)=\sum_{k=1}^n p_k f(k)=\operatorname{Tr}(\rho F)
$$
Note that, by our definition, the operators $F,P_A,\rho$ (all diagonal) commute among themselves; this no longer holds in general in the non-commutative (_quantum_) theory.
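A minimal numerical sketch (Python/NumPy, my own illustration, with arbitrarily chosen numbers) of this dictionary: the distribution, the events, and the random variables all become commuting diagonal matrices, and probabilities and expectations become traces.

```python
import numpy as np

n = 4
p = np.array([0.1, 0.2, 0.3, 0.4])            # a probability distribution on X = {1,...,4}
rho = np.diag(p)                               # the classical density operator

A = [0, 2]                                     # an event (0-indexed labels of outcomes)
P_A = np.diag([1.0 if k in A else 0.0 for k in range(n)])
assert np.allclose(P_A @ P_A, P_A) and np.allclose(P_A, P_A.T)   # P_A* = P_A = P_A^2

# Prob_rho(A) = Tr(rho P_A)
print(np.trace(rho @ P_A), p[A].sum())         # both 0.4

fvals = np.array([1.0, -1.0, 2.0, 0.5])        # a random variable f: X -> R
F = np.diag(fvals)
# E_rho(f) = sum_k p_k f(k) = Tr(rho F)
print(np.trace(rho @ F), np.dot(p, fvals))     # equal

# All of these diagonal operators commute -- the hallmark of the classical case.
assert np.allclose(rho @ P_A, P_A @ rho) and np.allclose(F @ P_A, P_A @ F)
```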
## Section 4: Why we need generalized probability theory to study quantum systems
Story of light polarization and violation of Bell's inequality.
### Classical picture of light polarization and Bell's inequality
> An interesting story will be presented here.
#### Polarization of light
The light which comes through a polarizer is polarized in a certain direction. If we fix the first filter and rotate the second filter, we observe that the intensity of the transmitted light changes.
The intensity decreases with $\alpha$ (the angle between the two filters) and vanishes when $\alpha=\pi/2$.
![Filter figure](https://notenextra.trance-0.com/Math401/Filter_figure.png)
By experimental measurement, the intensity of the light passing the first filter is half the beam intensity (Assume the original beam is completely unpolarized).
Then $I_1=I_0/2$, and
$$
I_2=I_1\cos^2\alpha=\frac{I_0}{2}\cos^2\alpha
$$
Claim: there exists a smallest packet of monochromatic light, the photon.
We can model the behavior of each individual photon passing through the filter with direction $\alpha$ by a random variable $P_\alpha$: $P_\alpha(\omega)=1$ if the photon passes through the filter, and $P_\alpha(\omega)=0$ if it does not.
Then, the probability of the photon passing through the two filters with direction $\alpha$ and $\beta$ is given by
$$
\mathbb{E}(P_\alpha P_\beta)=\operatorname{Prob}(P_\alpha=1 \text{ and } P_\beta=1)=\frac{1}{2}\cos^2(\alpha-\beta)
$$
However, consider a system of 3 polarizing filters $F_1,F_2,F_3$ having directions $\alpha_1,\alpha_2,\alpha_3$. If we put them on the optical bench in pairs, we obtain three random variables $P_1,P_2,P_3$.
#### Bell's 3 variable inequality
$$
\operatorname{Prob}(P_1=1,P_3=0)\leq \operatorname{Prob}(P_1=1,P_2=0)+\operatorname{Prob}(P_2=1,P_3=0)
$$
<details>
<summary>Proof</summary>
The event that the photon passes through the first filter but not the third is the disjoint union of the event that it also fails the second filter and the event that it also passes the second filter, so
$$
\begin{aligned}
\operatorname{Prob}(P_1=1,P_3=0)&=\operatorname{Prob}(P_1=1,P_2=0,P_3=0)+\operatorname{Prob}(P_1=1,P_2=1,P_3=0)\\
&\leq\operatorname{Prob}(P_1=1,P_2=0)+\operatorname{Prob}(P_2=1,P_3=0)
\end{aligned}
$$
However, according to our experimental measurement, for any pair of polarizers $F_i,F_j$, by the complement rule, we have
$$
\begin{aligned}
\operatorname{Prob}(P_i=1,P_j=0)&=\operatorname{Prob}(P_i=1)-\operatorname{Prob}(P_i=1,P_j=1)\\
&=\frac{1}{2}-\frac{1}{2}\cos^2(\alpha_i-\alpha_j)\\
&=\frac{1}{2}\sin^2(\alpha_i-\alpha_j)
\end{aligned}
$$
This leads to a contradiction if we apply the inequality to the experimental data.
$$
\frac{1}{2}\sin^2(\alpha_1-\alpha_3)\leq\frac{1}{2}\sin^2(\alpha_1-\alpha_2)+\frac{1}{2}\sin^2(\alpha_2-\alpha_3)
$$
If $\alpha_1=0,\alpha_2=\frac{\pi}{6},\alpha_3=\frac{\pi}{3}$, then
$$
\begin{aligned}
\frac{1}{2}\sin^2(-\frac{\pi}{3})&\leq\frac{1}{2}\sin^2(-\frac{\pi}{6})+\frac{1}{2}\sin^2(\frac{\pi}{6}-\frac{\pi}{3})\\
\frac{3}{8}&\leq\frac{1}{8}+\frac{1}{8}\\
\frac{3}{8}&\leq\frac{1}{4}
\end{aligned}
$$
This is a contradiction, so Bell's inequality is violated.
QED
</details>
Other, refined experiments (e.g. Aspect's experiment with entangled photon pairs from a calcium cascade) have also been conducted, and the inequality is still violated.
#### The true model of light polarization
The full description of the light polarization is given below:
State of polarization of a photon: $\psi=\alpha|0\rangle+\beta|1\rangle$, where $|0\rangle$ and $|1\rangle$ are the two orthogonal polarization states in $\mathbb{C}^2$.
Polarization filter (generalized 0,1 valued random variable): orthogonal projection $P_\alpha$ on $\mathbb{C}^2$ corresponding to the direction $\alpha$. (operator satisfies $P_\alpha^*=P_\alpha=P_\alpha^2$.)
The matrix representation of $P_\alpha$ is given by
$$
P_\alpha=\begin{pmatrix}
\cos^2(\alpha) & \cos(\alpha)\sin(\alpha)\\
\cos(\alpha)\sin(\alpha) & \sin^2(\alpha)
\end{pmatrix}
$$
The probability of a photon in state $\psi$ passing through the filter $P_\alpha$ is $\langle P_\alpha\psi,\psi\rangle$; this equals $\cos^2(\alpha)$ if we set $\psi=|0\rangle$.
Since the projections associated with different filter directions do not commute, the joint probability $\operatorname{Prob}(P_1=1,P_3=0)$ cannot even be formulated in the classical way.
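A small numeric sketch (Python/NumPy, mine, not part of the notes) of this model: $P_\alpha$ is an orthogonal projection, the passing probability for $\psi=|0\rangle$ is $\cos^2\alpha$, and projections for different directions do not commute.

```python
import numpy as np

def P(alpha):
    """Projection onto the polarization direction alpha in C^2."""
    c, s = np.cos(alpha), np.sin(alpha)
    return np.array([[c * c, c * s],
                     [c * s, s * s]])

psi = np.array([1.0, 0.0])                     # psi = |0>
alpha, beta = np.pi / 6, np.pi / 3

Pa = P(alpha)
assert np.allclose(Pa @ Pa, Pa) and np.allclose(Pa, Pa.T)        # P* = P = P^2

# Probability of passing the filter: <P_alpha psi, psi> = cos^2(alpha)
print(psi @ Pa @ psi, np.cos(alpha) ** 2)

# Projections for different directions do not commute, so there is no joint
# classical description of the three-filter experiment.
print(np.allclose(P(alpha) @ P(beta), P(beta) @ P(alpha)))       # False
```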
## Section 5: The non-commutative (_quantum_) probability theory
Let $\mathscr{H}$ be a Hilbert space consisting of the complex-valued functions on a finite set $\Omega=\{1,2,\cdots,n\}$, and suppose the functions $(e_1,e_2,\cdots,e_n)$ form an orthonormal basis of $\mathscr{H}$. We use the Dirac notation $|k\rangle$ to denote the basis vector $e_k$.
The algebra of multiplication operators from the classical setting is now replaced by the full space of bounded linear operators on $\mathscr{H}$ (denoted by $\mathscr{B}(\mathscr{H})$).
Let $A,B\in\mathscr{F}$ be events in the classical probability setting, where $\mathscr{F}$ is the set of all events and $X$ denotes the set of all possible outcomes.
> An orthogonal projection on a Hilbert space is a projection operator satisfying $P^*=P$ and $P^2=P$. We denote the set of all orthogonal projections on $\mathscr{H}$ by $\mathscr{P}$.
>
> This can be found in linear algebra. [Orthogonal projection](https://notenextra.trance-0.com/Math429/Math429_L28#definition-655)
Let $P,Q\in\mathscr{P}$ be events in the non-commutative (_quantum_) probability space. $R(\cdot)$ denotes the range of an operator, and $P^\perp$ is the projection onto the orthogonal complement of $R(P)$.
| Classical | Classical interpretation | Non-commutative (_Quantum_) | Non-commutative (_Quantum_) interpretation |
| --------- | ------- | -------- | -------- |
| $A\subset B$| Event $A$ is a subset of event $B$ | $P\leq Q$| $R(P)\subseteq R(Q)$ Range of event $P$ is a subset of range of event $Q$ |
| $A\cap B$| Both event $A$ and $B$ happened | $P\land Q$| projection to $R(P)\cap R(Q)$ Range of event $P$ and event $Q$ happened |
| $A\cup B$| At least one of the events $A$ or $B$ happened | $P\lor Q$| projection onto the closed span of $R(P)\cup R(Q)$ |
| $A^c=X\setminus A$| Event $A$ did not happen | $P^\perp$| projection onto $R(P)^\perp$, the orthogonal complement of the range of $P$ |
In such setting, some rules of classical probability theory are not valid in quantum probability theory.
In classical probability theory, $A\cap(B\cup C)=(A\cap B)\cup(A\cap C)$.
In quantum probability theory, $P\land(Q\lor R)\neq(P\land Q)\lor(P\land R)$ in general.
### Definitions of non-commutative (_quantum_) probability theory under generalized probability theory
#### Definition of states (non-commutative (_quantum_) probability theory)
A state on $(\mathscr{B}(\mathscr{H}),\mathscr{P})$ is a map $\mu:\mathscr{P}\to[0,1]$ such that:
1. $\mu(O)=0$, where $O$ is the zero projection.
2. If $P_1,P_2,\cdots,P_n$ are pairwise disjoint orthogonal projections, then $\mu(P_1\lor P_2\lor\cdots\lor P_n)=\sum_{i=1}^n\mu(P_i)$.
Here projections are called disjoint if $P_iP_j=P_jP_i=O$ for $i\neq j$.
#### Definition of density operator (non-commutative (_quantum_) probability theory)
A density operator $\rho$ on the finite-dimensional Hilbert space $\mathscr{H}$ is:
1. self-adjoint ($\rho^*=\rho$, that is, $\langle \rho x,y\rangle=\langle x,\rho y\rangle$ for all $x,y\in\mathscr{H}$)
2. positive semi-definite (all eigenvalues are non-negative)
3. $\operatorname{Tr}(\rho)=1$.
If $(|\psi_1\rangle,|\psi_2\rangle,\cdots,|\psi_n\rangle)$ is an orthonormal basis of $\mathscr{H}$ consisting of eigenvectors of $\rho$, for the eigenvalue $p_1,p_2,\cdots,p_n$, then $p_j\geq 0$ and $\sum_{j=1}^n p_j=1$.
We can write $\rho$ as
$$
\rho=\sum_{j=1}^n p_j|\psi_j\rangle\langle\psi_j|
$$
(under basis $|\psi_j\rangle$, it is a diagonal matrix with $p_j$ on the diagonal)
Every density operator on $\mathscr{H}$ can be decomposed in this form.
#### Theorem: Born's rule
Let $\rho$ be a density operator on $\mathscr{H}$. then
$$
\mu(P)\coloneqq\operatorname{Tr}(\rho P)=\sum_{j=1}^n p_j\langle\psi_j|P|\psi_j\rangle
$$
Defines a probability measure on the space $\mathscr{P}$.
[Proof ignored here]
#### Theorem: Gleason's theorem (very important)
Let $\mathscr{H}$ be a Hilbert space over $\mathbb{C}$ or $\mathbb{R}$ of dimension $n\geq 3$. Let $\mu$ be a state on the space $\mathscr{P}$ of projections on $\mathscr{H}$. Then there exists a unique density operator $\rho$ such that
$$
\mu(P)=\operatorname{Tr}(\rho P)
$$
for all $P\in\mathscr{P}$. $\mathscr{P}$ is the space of all orthogonal projections on $\mathscr{H}$.
[Proof ignored here]
#### Definition of random variable _or Observables_ (non-commutative (_quantum_) probability theory)
_It is the physical measurement of a system that we are interested in. (kinetic energy, position, momentum, etc.)_
$\mathscr{B}(\mathbb{R})$ is the set of all Borel sets on $\mathbb{R}$.
A random variable on the Hilbert space $\mathscr{H}$ is a projection valued map $P:\mathscr{B}(\mathbb{R})\to\mathscr{P}$.
With the following properties:
1. $P(\emptyset)=O$ (the zero projection)
2. $P(\mathbb{R})=I$ (the identity projection)
3. For any sets $A_1,A_2,\cdots,A_n\in \mathscr{B}(\mathbb{R})$, the following hold:
(a) $P(\bigcup_{i=1}^n A_i)=\bigvee_{i=1}^n P(A_i)$
(b) $P(\bigcap_{i=1}^n A_i)=\bigwedge_{i=1}^n P(A_i)$
(c) $P(A^c)=I-P(A)$
(d) If the $A_j$ are pairwise disjoint sets (so that $P(A_i)P(A_j)=P(A_j)P(A_i)=O$ for $i\neq j$), then $P(\bigcup_{j=1}^n A_j)=\sum_{j=1}^n P(A_j)$
#### Definition of probability of a random variable
For a system prepared in state $\rho$, the probability that the random variable described by the projection-valued measure $P$ takes a value in the Borel set $A$ is $\operatorname{Tr}(\rho P(A))$.
### Expectation of an random variable and projective measurement
Notice that if the outcome $\lambda$ is _observed_ with probability $p_\lambda=\operatorname{Tr}(\rho P_\lambda)$, then $\rho'\coloneqq\sum_{\lambda\in \operatorname{sp}(T)}P_\lambda \rho P_\lambda$ is again a density operator (the post-measurement state).
#### Definition of expectation of operators
Let $T$ be a self-adjoint operator on $\mathscr{H}$ with spectral decomposition $T=\sum_{\lambda\in \operatorname{sp}(T)}\lambda P_\lambda$. The expectation of $T$ relative to the density operator $\rho$ is given by
$$
\mathbb{E}_\rho(T)=\operatorname{Tr}(\rho T)=\sum_{\lambda\in \operatorname{sp}(T)}\lambda \operatorname{Tr}(\rho P_\lambda)
$$
### The uncertainty principle
Let $A,B$ be two self-adjoint operators on $\mathscr{H}$. Then we define the following two self-adjoint operators:
$$
i[A,B]\coloneqq i(AB-BA)
$$
$$
A\circ B\coloneqq \frac{AB+BA}{2}
$$
Note that $A\circ B$ satisfies Jordan's identity.
$$
(A\circ B)\circ (A\circ A)=A\circ (B\circ (A\circ A))
$$
#### Definition of variance
Given a state $\rho$, the variance of $A$ is given by
$$
\operatorname{Var}_\rho(A)\coloneqq\mathbb{E}_\rho(A^2)-\mathbb{E}_\rho(A)^2=\operatorname{Tr}(\rho A^2)-\operatorname{Tr}(\rho A)^2
$$
#### Definition of covariance
Given a state $\rho$, the covariance of $A$ and $B$ is given by the Jordan product of $A$ and $B$.
$$
\operatorname{Cov}_\rho(A,B)\coloneqq\mathbb{E}_\rho(A\circ B)-\mathbb{E}_\rho(A)\mathbb{E}_\rho(B)=\operatorname{Tr}(\rho A\circ B)-\operatorname{Tr}(\rho A)\operatorname{Tr}(\rho B)
$$
#### Robertson-Schrödinger uncertainty relation in finite dimensional Hilbert space
Let $\rho$ be a state on $\mathscr{H}$, $A,B$ be two self-adjoint operators on $\mathscr{H}$. Then
$$
\operatorname{Var}_\rho(A)\operatorname{Var}_\rho(B)\geq\operatorname{Cov}_\rho(A,B)^2+\frac{1}{4}|\mathbb{E}_\rho([A,B])|^2
$$
If $\rho$ is a pure state ($\rho=|\psi\rangle\langle\psi|$ for some unit vector $|\psi\rangle\in\mathscr{H}$) and equality holds, then $A\psi$ and $B\psi$ are collinear (i.e. $A\psi=c B\psi$ for some constant $c$).
When $A$ and $B$ commute, the classical inequality is recovered (Cauchy-Schwarz inequality).
$$
\operatorname{Var}_\rho(A)\operatorname{Var}_\rho(B)\geq\operatorname{Cov}_\rho(A,B)^2
$$
[Proof ignored here]
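The finite-dimensional inequality is easy to test numerically; the following sketch (Python/NumPy, my own, with $A=\sigma_x$, $B=\sigma_y$ and a randomly generated density operator) checks it on one example.

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])

rng = np.random.default_rng(0)
M = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rho = M @ M.conj().T
rho /= np.trace(rho).real                       # positive semi-definite, trace 1

E = lambda T: np.trace(rho @ T).real            # E_rho(T) for self-adjoint T
var = lambda T: E(T @ T) - E(T) ** 2
cov = E((sx @ sy + sy @ sx) / 2) - E(sx) * E(sy)           # Jordan-product covariance
comm_term = abs(np.trace(rho @ (sx @ sy - sy @ sx))) ** 2 / 4

lhs, rhs = var(sx) * var(sy), cov ** 2 + comm_term
print(lhs >= rhs - 1e-12, lhs, rhs)             # the inequality holds
```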
### The uncertainty relation for unbounded symmetric operators
#### Definition of symmetric operator
An operator $A$ is symmetric if for all $\phi,\psi\in\mathscr{H}$, we have
$$
\langle A\phi,\psi\rangle=\langle\phi,A\psi\rangle
$$
An example of a symmetric operator is $T(\psi)=i\hbar\frac{d\psi}{dx}$ on the domain $\mathscr{D}(T)\subseteq L^2(\mathbb{R},\lambda)$, where $\hbar$ is the Planck constant.
Here $\mathscr{D}(T)$ is the space of all functions on $\mathbb{R}$ that are square integrable, differentiable, and whose derivative is also square integrable.
#### Definition of joint domain of two operators
Let $(A,\mathscr{D}(A)),(B,\mathscr{D}(B))$ be two symmetric operators on their corresponding domains. The domain of $AB$ is defined as
$$
\mathscr{D}(AB)\coloneqq\{\psi\in\mathscr{D}(B):B\psi\in\mathscr{D}(A)\}
$$
Since $(AB)\psi=A(B\psi)$, the variance of an operator $A$ relative to a pure state $\rho=|\psi\rangle\langle\psi|$ is given by
$$
\operatorname{Var}_\rho(A)=\operatorname{Tr}(\rho A^2)-\operatorname{Tr}(\rho A)^2=\langle\psi,A^2\psi\rangle-\langle\psi,A\psi\rangle^2
$$
If $A$ is symmetric, then $\operatorname{Var}_\rho(A)=\langle A\psi,A\psi\rangle-\langle \psi, A\psi\rangle^2$.
#### Robertson-Schrödinger uncertainty relation for unbounded symmetric operators
Let $(A,\mathscr{D}(A)),(B,\mathscr{D}(B))$ be two symmetric operators on their corresponding domains. Then
$$
\operatorname{Var}_\rho(A)\operatorname{Var}_\rho(B)\geq\operatorname{Cov}_\rho(A,B)^2+\frac{1}{4}|\mathbb{E}_\rho([A,B])|^2
$$
If $\rho$ is a pure state ($\rho=|\psi\rangle\langle\psi|$ for some unit vector $|\psi\rangle\in\mathscr{H}$), and the equality holds, then $A\psi$ and $B\psi$ are collinear (i.e. $A\psi=c B\psi$ for some constant $c\in\mathbb{R}$).
[Proof ignored here]
### Summary of analog of classical probability theory and non-commutative (_quantum_) probability theory
| Classical probability | Non-commutative (_Quantum_) probability |
| --------- | ------- |
| Sample space $\Omega$, cardinality $\vert\Omega\vert=n$, example: $\Omega=\{0,1\}$ | Complex Hilbert space $\mathscr{H}$, dimension $\dim\mathscr{H}=n$, example: $\mathscr{H}=\mathbb{C}^2$ |
|Common algebra of $\mathbb{C}$ valued functions| Algebra of bounded operators $\mathscr{B}(\mathscr{H})$|
|$f\mapsto \bar{f}$ complex conjugation| $P\mapsto P^*$ adjoint|
|Events: indicator functions of sets| Projections: space of orthogonal projections $\mathscr{P}\subseteq\mathscr{B}(\mathscr{H})$|
|functions $f$ such that $f^2=f=\overline{f}$| orthogonal projections $P$ such that $P^*=P=P^2$|
|$\mathbb{R}$-valued functions $f=\overline{f}$| self-adjoint operators $A=A^*$|
| $\mathbb{I}_{f^{-1}(\{\lambda\})}$ is the indicator function of the set $f^{-1}(\{\lambda\})$| $P(\lambda)$ is the orthogonal projection to eigenspace|
|$f=\sum_{\lambda\in \operatorname{Range}(f)}\lambda \mathbb{I}_{f^{-1}(\{\lambda\})}$|$A=\sum_{\lambda\in \operatorname{sp}(A)}\lambda P(\lambda)$|
|Probability measure $\mu$ on $\Omega$| Density operator $\rho$ on $\mathscr{H}$|
|Delta measure $\delta_\omega$| Pure state $\rho=\vert\psi\rangle\langle\psi\vert$|
|$\mu$ is non-negative measure and $\sum_{i=1}^n\mu(\{i\})=1$| $\rho$ is positive semi-definite and $\operatorname{Tr}(\rho)=1$|
|Expected value of random variable $f$ is $\mathbb{E}_{\mu}(f)=\sum_{i=1}^n f(i)\mu(\{i\})$| Expected value of operator $A$ is $\mathbb{E}_\rho(A)=\operatorname{Tr}(\rho A)$|
|Variance of random variable $f$ is $\operatorname{Var}_\mu(f)=\sum_{i=1}^n (f(i)-\mathbb{E}_\mu(f))^2\mu(\{i\})$| Variance of operator $A$ is $\operatorname{Var}_\rho(A)=\operatorname{Tr}(\rho A^2)-\operatorname{Tr}(\rho A)^2$|
|Covariance of random variables $f$ and $g$ is $\operatorname{Cov}_\mu(f,g)=\sum_{i=1}^n (f(i)-\mathbb{E}_\mu(f))(g(i)-\mathbb{E}_\mu(g))\mu(\{i\})$| Covariance of operators $A$ and $B$ is $\operatorname{Cov}_\rho(A,B)=\operatorname{Tr}(\rho A\circ B)-\operatorname{Tr}(\rho A)\operatorname{Tr}(\rho B)$|
|Composite system is given by Cartesian product of the sample spaces $\Omega_1\times\Omega_2$| Composite system is given by tensor product of the Hilbert spaces $\mathscr{H}_1\otimes\mathscr{H}_2$|
|Product measure $\mu_1\times\mu_2$ on $\Omega_1\times\Omega_2$| Tensor product of space $\rho_1\otimes\rho_2$ on $\mathscr{H}_1\otimes\mathscr{H}_2$|
|Marginal distribution $\pi_*v$|Partial trace $\operatorname{Tr}_2(\rho)$|
### States of two dimensional system and the complex projective space (_Bloch sphere_)
If $v=e^{i\theta}u$ for some $\theta\in\mathbb{R}$, then $u$ and $v$ determine the same pure state $\rho=|u\rangle\langle u|$, so the space of pure states is the complex projective space $\mathbb{C}P^1$.
Writing $u=(\alpha,\beta)$ with $\alpha=x_1+iy_1$ and $\beta=x_2+iy_2$, the unit-vector condition $|\alpha|^2+|\beta|^2=1$ reads $x_1^2+y_1^2+x_2^2+y_2^2=1$.
The set of unit vectors in $\mathbb{C}^2$ is therefore the unit sphere $S^3\subseteq\mathbb{R}^4$.
Quotienting by the overall phase $e^{i\theta}$, the space of pure states is the $2$-sphere $S^2$ (the _Bloch sphere_).
#### Mapping between the space of pure states and the complex projective space
Any two dimensional pure state can be written as $\rho=|u\rangle\langle u|$, where $u$ is a unit vector in $\mathbb{C}^2$ determined up to a phase $e^{i\theta}$. There exists a bijective map $P:S^2\to\mathscr{P}_1\subseteq M_2(\mathbb{C})$ from the Bloch sphere onto the rank-one orthogonal projections, given in terms of the Pauli matrices by
$$
P(\vec{a})=\frac{1}{2}(I+\vec{a}\cdot\vec{\sigma})=\frac{1}{2}\begin{pmatrix}
1&0\\
0&1
\end{pmatrix}+\frac{a_x}{2}\begin{pmatrix}
0&1\\
1&0
\end{pmatrix}+\frac{a_y}{2}\begin{pmatrix}
0&-i\\
i&0
\end{pmatrix}+\frac{a_z}{2}\begin{pmatrix}
1&0\\
0&-1
\end{pmatrix}
$$
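A quick check (Python/NumPy, my own sketch) that $P(\vec{a})=\frac{1}{2}(I+\vec{a}\cdot\vec{\sigma})$ is indeed a rank-one orthogonal projection when $\vec{a}$ is a unit vector:

```python
import numpy as np

sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]

a = np.array([1.0, 2.0, -0.5])
a /= np.linalg.norm(a)                           # a point on the Bloch sphere S^2

P = (np.eye(2) + sum(a[k] * sigma[k] for k in range(3))) / 2
assert np.allclose(P, P.conj().T)                # self-adjoint
assert np.allclose(P @ P, P)                     # idempotent
assert np.isclose(np.trace(P).real, 1.0)         # rank one, hence a pure state |u><u|
print(P)
```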
# Math401 Topic 5: Introducing dynamics: classical and non-commutative
## Section 1: Dynamics in classical probability
### Basic definitions
#### Definition of orbit
Let $T:\Omega\to\Omega$ be a map (not necessarily invertible) generating a dynamical system on $\Omega$. Given $\omega\in \Omega$, the (forward) orbit of $\omega$ is the set $\mathscr{O}(\omega)=\{T^n(\omega)\}_{n\geq 0}$.
The theory of dynamics is the study of properties of orbits.
#### Definition of measure-preserving map
Let $P$ be a probability measure on a $\sigma$-algebra $\mathscr{F}$ of subsets of $\Omega$ (that is, $P:\mathscr{F}\to[0,1]$). A measurable transformation $T:\Omega\to\Omega$ is said to be measure-preserving if for all integrable random variables $\psi:\Omega\to\mathbb{R}$, we have $\mathbb{E}(\psi\circ T)=\mathbb{E}(\psi)$, that is:
$$
\int_\Omega (\psi\circ T)(\omega)dP(\omega)=\int_\Omega \psi(\omega)dP(\omega)
$$
Example:
The doubling map $T:\Omega\to\Omega$, defined by $T(x)=2x\bmod 1$, is a Lebesgue-measure-preserving map on $\Omega=[0,1]$.
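A Monte Carlo sketch (Python/NumPy, mine, not part of the notes) of this fact: sampling $x$ from Lebesgue measure on $[0,1]$, the empirical means of $\psi\circ T$ and $\psi$ agree for a test observable $\psi$.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random(1_000_000)                        # samples of Lebesgue measure on [0, 1]
T = lambda x: (2 * x) % 1                        # the doubling map
psi = lambda x: np.cos(2 * np.pi * x) ** 2 + x   # an arbitrary integrable observable

# E(psi o T) and E(psi) agree (both are close to 1/2 + 1/2 = 1).
print(psi(T(x)).mean(), psi(x).mean())
```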
#### Definition of isometry
The composition operator $\psi\mapsto U\psi=\psi\circ T$, where $T$ is a measure-preserving map, defined on $\mathscr{H}=L^2(\Omega,\mathscr{F},P)$, is an isometry of $\mathscr{H}$ if $\langle U\psi,U\phi\rangle=\langle\psi,\phi\rangle$ for all $\psi,\phi\in\mathscr{H}$.
#### Definition of unitary
The composition operator $\psi\mapsto U\psi=\psi\circ T$, where $T$ is a measure-preserving map, defined on $\mathscr{H}=L^2(\Omega,\mathscr{F},P)$, is unitary on $\mathscr{H}$ if $U$ is an isometry and $T$ is invertible with measurable inverse.
## Section 2: Continuous time (classical) dynamical systems
### Spring-mass system
![Spring-mass system](https://notenextra.trance-0.com/Math401/Spring-mass_system.png)
The pure state of the system is given by the position and velocity of the mass: $(x,v)$ is a point in $\mathbb{R}^2$, and $\mathbb{R}^2$ is the state space (or phase space) of the system.
The motion of the system in its state space is a closed curve.
$$
\Phi_t(x,v)=\left(\cos(\omega t)x+\frac{1}{\omega}\sin(\omega t)v,\ \cos(\omega t)v-\omega\sin(\omega t)x\right)
$$
Such a system, whose orbits are closed curves, is called an **integrable system**, whereas the doubling map produces orbits having quite different dynamical properties (a **chaotic system**).
> Note, some section is intentionally ignored here. They are about in the setting of operators on Hilbert spaces, the evolution of (classical, non-dissipative e.g. linear spring-mass system) system, is implemented by a one-parameter group of unitary operators.
>
> The detailed construction is omitted here.
#### Definition of Hermitian operator
A linear operator $A$ on a Hilbert space $\mathscr{H}$ is said to be Hermitian if $\forall \psi,\phi\in$ **domain of $A$**, we have $\langle A\psi,\phi\rangle=\langle\psi,A\phi\rangle$.
It is skew-Hermitian if $\langle A\psi,\phi\rangle=-\langle\psi,A\phi\rangle$.
## Section 3: Hamiltonians and the Schrödinger equation (finite dimensional version)
The problem of solving the Schrödinger equation is, at its core, the study of the spectral theory of the Hamiltonian operator.
### Dynamics in 2-dimensional (_2 level_) systems (qubit)
In previous sections, we saw that any self-adjoint $2\times 2$ matrix has the form $x_0I+\vec{x}\cdot \vec{\sigma}$, where $\vec{\sigma}=(\sigma_1,\sigma_2,\sigma_3)$ are the Pauli matrices.
Here $(x_0,\vec{x})$ is a point in $\mathbb{R}^4$.
The general form (time-independent) of the Hamiltonian for a 2-level system is:
$$
H=\begin{pmatrix}
x_0+x_3 & x_1-ix_2 \\
x_1+ix_2 & x_0-x_3
\end{pmatrix}
$$
We parameterize the curves in Bloch space generated by the Hamiltonian. In physical units, write $\vec{x}=\omega\hbar\vec{s}$ with $\omega>0$; the factor $\omega\hbar$ carries the physical dimension of energy.
we have:
$$
H=\omega\hbar\begin{pmatrix}
s_3 & s_1-is_2 \\
s_1+is_2 & -s_3
\end{pmatrix}
$$
[Continue on the orbits of states in the Bloch sphere] skip for now.
## Section 4: Transition probability, probability amplitudes and the Born rule
The modulus squared of a probability amplitude is the probability of the corresponding transition.
### Basic definitions in transition probability
#### Definition of probability amplitude
For an $n$-dimensional Hilbert space $\mathscr{H}$, the system is initially in a pure state given by the unit vector $|\psi_0\rangle\in\mathscr{H}$, thus with the density operator $\rho_0=|\psi_0\rangle\langle\psi_0|$.
Then the state at time $t_1$ is given by $|\psi_1\rangle=A|\psi_0\rangle$, where $A\in U(n)$ is a unitary operator.
Then the density operator at time $t_1$ is given by $\rho_1=|\psi_1\rangle\langle\psi_1|=A|\psi_0\rangle\langle\psi_0|A^*=A\rho_0A^*$.
The entries of $A$ are $a_{ij}=\langle i|A|j\rangle$, where $\{|i\rangle\}$ is an orthonormal basis of $\mathscr{H}$.
The $a_{ij}$ are the probability amplitudes of the transition from state $|j\rangle$ to state $|i\rangle$.
#### Definition of transition probability
Given the above, the transition probability from state $|j\rangle$ to state $|i\rangle$ is given by:
$$
|a_{ij}|^2
$$
#### Sum over paths
To each path of classical states, path $j\to i: i_0=j,i_1,i_2,\cdots,i_l=i$, we associate the probability amplitude of the path, given by the product of the one-step amplitudes along it:
$$
\operatorname{amp}(\text{path}(j\to i))=\langle i_l|A|i_{l-1}\rangle\cdots\langle i_2|A|i_1\rangle\langle i_1|A|i_0\rangle
$$
The probability of the transition $j\to i$ in $l$ steps is given by:
$$
\operatorname{Prob}(i|j)=\left|\sum_{\text{all paths } j\to i \text{ with } l \text{ steps}}\operatorname{amp}(\text{path}(j\to i))\right|^2
$$
### Measuring a qubit
#### Definition of qubit
A qubit is a 2-level quantum system.
One example of qubit is the photon polarization.
#### Measurement of a qubit
The measurement of a qubit is a map from the space of density operators to the interval $[0,1]$.
This gives a probability distribution on the interval $[0,1]$ in our classical probability space.
![Measurement of a qubit](https://notenextra.trance-0.com/Math401/Measurement_of_a_qubit.png)
Here $p=\cos^2(\theta)\in[0,1]$ is the probability of the state being found in the state $|0\rangle$.
The north pole of the Bloch sphere gives probability $1$ for the state being in the state $|0\rangle$.
The south pole of the Bloch sphere gives probability $1$ for the state being in the state $|1\rangle$.
The equator of the Bloch sphere gives probability $1/2$ for each of the states $|0\rangle$ and $|1\rangle$.
### Projective measurement of an $N$-qubit system
For $N$ qubits, the pure quantum state $\rho=|\psi\rangle\langle\psi|$ is represented by the state vector $|\psi\rangle\in\mathscr{H}^{\otimes N}=\mathscr{H}\otimes\cdots\otimes\mathscr{H}$ ($\mathscr{H}=\mathbb{C}^2$).
This produces as output the random variable $X\in \{0,1\}^N$. $X=(a_1,a_2,\cdots,a_N)$, where $a_i\in \{0,1\}$.
By the Born rule,
$$
\operatorname{Prob}(X=(a_1,a_2,\cdots,a_N))=\left|\langle a_1a_2\cdots a_N|\psi\rangle\right|^2
$$
where $\langle a_1a_2\cdots a_N|\psi\rangle=\langle a_1|\otimes\langle a_2|\otimes\cdots\otimes\langle a_N|\psi\rangle$.
The input vector state $|\psi\rangle$ is a unit vector in $\mathscr{H}^{\otimes N}$.
This can be written as a tensor product of the basis vectors:
$$
|\psi\rangle=\sum_{a_1,a_2,\cdots,a_N} c_{a_1,a_2,\cdots,a_N}|a_1a_2\cdots a_N\rangle
$$
where $c_{a_1,a_2,\cdots,a_N}\in\mathbb{C}$.
The probability distribution of the post-measurement **classical random variable** $X$ can be represented as a point in the $2^N-1$ dimensional simplex of all probability distributions on the set $\{0,1\}^N$.
$$
\mathscr{P}(\{0,1\}^N)=\left\{(p_1,p_2,\cdots,p_{2^N})\in\mathbb{R}^{2^N}:p_i\geq 0,\sum_{i=1}^{2^N}p_i=1\right\}
$$
![Simplex of all probability distributions on the set $\{0,1\}^N$](https://notenextra.trance-0.com/Math401/Simplex_of_all_probability_distributions_on_the_set_01N.png)
Here we use the binary representation for the index $i$ in the diagram.
#### Pure versus mixed states
A pure state is a state that is represented by a unit vector in $\mathscr{H}^{\otimes N}$.
A mixed state is a state that is represented by a density operator in $\mathscr{H}^{\otimes N}$. (convex combination of pure states)
If $\rho_j=|\psi_j\rangle\langle\psi_j|$, then $\rho=\sum_{j=1}^N p_j\rho_j$ is a mixed state, where $p_j\geq 0$ and $\sum_{j=1}^N p_j=1$.
#### Projective measurement of subsystem and partial trace
This section is related to quantum random walk and we will skip it for now.
## Section 5: Quantum random walk
This part is skipped, it is an interesting topic, but it is not the focus of my research for now.
# Math401 Topic 6: Postulates of quantum theory and measurement operations
## Section 1: Postulates of quantum theory
This part is a review of quantum theory, so I will keep the content brief.
If you are familiar with the linear algebra defined before, you can jump right into this section and simply skim the compact notation.
### Pure states
#### Pure state and mixed state
A pure state is a state that is represented by a unit vector in $\mathscr{H}^{\otimes N}$.
A mixed state is a state that is represented by a density operator in $\mathscr{H}^{\otimes N}$. (convex combination of pure states)
If $\rho_j=|\psi_j\rangle\langle\psi_j|$, then $\rho=\sum_{j=1}^N p_j\rho_j$ is a mixed state, where $p_j\geq 0$ and $\sum_{j=1}^N p_j=1$.
#### Coset space
Two non-zero vectors $u,v\in \mathscr{H}$ are said to represent the same state if $u=cv$ for some complex number $c$ with $|c|=1$.
The set of states of a quantum system is called the **coset space** of $\mathscr{H}$, $u\sim v$ if $u=cv$ for some complex number $c$ with $|c|=1$.
The coset space is called the projective space of $\mathscr{H}$, denoted by $P(\mathscr{H})\colon=(\mathscr{H}\setminus\{0\})/\sim$.
Any vector in the form $e^{i\theta}|u\rangle$ for some $u\in \mathscr{H}$ and $\theta\in \mathbb{R}$ represents the same state as $|u\rangle$.
Example: the system of a qubit has Hilbert space $\mathbb{C}^2$, and the coset space $P(\mathbb{C}^2)\cong S^2$ is the Bloch sphere.
### Composite systems
#### Tensor product
The tensor product of two Hilbert spaces $\mathscr{H}_1$ and $\mathscr{H}_2$ is the Hilbert space $\mathscr{H}_1\otimes\mathscr{H}_2$ with the inner product $\langle u_1\otimes u_2,v_1\otimes v_2\rangle=\langle u_1,v_1\rangle\langle u_2,v_2\rangle$.
The tensor product of two vectors $u_1\in \mathscr{H}_1$ and $u_2\in \mathscr{H}_2$ is the vector $u_1\otimes u_2\in \mathscr{H}_1\otimes\mathscr{H}_2$.
#### Multipartite systems
For each part in a multipartite quantum system, each part is associated a Hilbert space $\mathscr{H}_i$. The total system is associated a Hilbert space $\mathscr{H}=\mathscr{H}_1\otimes\mathscr{H}_2\otimes\cdots\otimes\mathscr{H}_n$.
The state of the total system has the form $u_1\otimes u_2\otimes\cdots\otimes u_n$ for some $u_i\in \mathscr{H}_i$.
#### Entanglement (talk later)
A state $|\psi\rangle$ is entangled if it cannot be expressed as a product state $v_1\otimes v_2$ for any single-qubit states $|v_1\rangle$ and $|v_2\rangle$. In other words, an entangled state is non-separable.
Example: the Bell state $|\psi^+\rangle=\frac{1}{\sqrt{2}}(|00\rangle+|11\rangle)$ is entangled.
Assume it can be written as $|\psi\rangle=|\psi_1\rangle\otimes|\psi_2\rangle$ where $|\psi_1\rangle=a|0\rangle+b|1\rangle$ and $|\psi_2\rangle=c|0\rangle+d|1\rangle$. Then:
$$
|\psi\rangle=ac|00\rangle+ad|01\rangle+bc|10\rangle+bd|11\rangle
$$
Setting this equal to $|\psi^+\rangle=\frac{1}{\sqrt{2}}(|00\rangle+|11\rangle)$ gives:
$$
ac|00\rangle+ad|01\rangle+bc|10\rangle+bd|11\rangle=\frac{1}{\sqrt{2}}(|00\rangle+|11\rangle)
$$
This requires:
$$
ac=bd=\frac{1}{\sqrt{2}}
$$
$$
ad=bc=0
$$
This is a contradiction, so $|\psi^+\rangle$ is entangled.
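This can also be seen numerically (Python/NumPy sketch of my own, anticipating the Schmidt decomposition discussed later in this topic): reshaping the coefficient vector into a $2\times 2$ matrix, its rank (the Schmidt rank) is $2$ for the Bell state and $1$ for any product state.

```python
import numpy as np

bell = np.array([1, 0, 0, 1]) / np.sqrt(2)                    # (|00> + |11>)/sqrt(2)
product = np.kron([1, 0], [1, 1]) / np.sqrt(2)                # |0> (x) (|0>+|1>)/sqrt(2)

for psi in (bell, product):
    C = psi.reshape(2, 2)                                     # C[a, b] = coefficient of |ab>
    schmidt = np.linalg.svd(C, compute_uv=False)              # Schmidt coefficients
    print(np.count_nonzero(schmidt > 1e-12))                  # 2 (entangled) vs 1 (product)
```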
### Mixed states and density operators
#### Density operator
A density operator is a [Hermitian](https://notenextra.trance-0.com/Math401/Math401_T5#definition-of-hermitian-operator), positive semi-definite operator with trace 1.
The density operator of a pure state $|\psi\rangle$ is $\rho=|\psi\rangle\langle\psi|$.
The density operator of a mixed state is specified by unit vectors $u_1,u_2,\cdots,u_n$ in $\mathscr{H}$ together with probabilities $p_1,p_2,\cdots,p_n$, $p_i\geq 0$, such that $\sum_{i=1}^n p_i=1$.
The density operator is $\rho=\sum_{i=1}^n p_i|u_i\rangle\langle u_i|$.
#### Trace 1 proposition
Density operator on the finite dimensional Hilbert space $\mathscr{H}$ are positive operators having trace equal to 1.
#### Pure state lemma
A state is pure if and only if $\operatorname{Tr}(\rho^2)=1$.
For any mixed state $\rho$, $\operatorname{Tr}(\rho^2)<1$.
[Proof ignored here]
#### Unitary freedom in the ensemble for density operators theorem
Let $v_1,v_2,\cdots,v_l$ and $w_1,w_2,\cdots,w_l$ be two collections of vectors in the finite dimensional Hilbert space $\mathscr{H}$, the vectors being arbitrary (can be zero) except for the requirement that they define the same density operator $\rho$.
$$
\sum_{i=1}^l |v_i\rangle\langle v_i|=\sum_{i=1}^l |w_i\rangle\langle w_i|
$$
Then there exists a unitary matrix $U=(\mu_{ij})_{1\leq i,j\leq l}$ such that:
$$
v_i=\sum_{j=1}^l \mu_{ij}w_j
$$
The converse is also true.
That is, if $\rho$ is a density operator on $\mathscr{H}$ given by $\sum_{i=1}^l |w_i\rangle\langle w_i|$ and the vectors $v_i$ are given by $v_i=\sum_{j=1}^l \mu_{ij}w_j$ for some unitary matrix $U=(\mu_{ij})$, then $\sum_{i=1}^l |v_i\rangle\langle v_i|=\rho$ as well.
[Proof ignored here]
### Density operator of subsystems
#### Partial trace for density operators
Let $\rho$ be a density operator on $\mathscr{H}_1\otimes\mathscr{H}_2$. The partial trace of $\rho$ over $\mathscr{H}_2$ is the density operator on $\mathscr{H}_1$ (the reduced density operator for the subsystem $\mathscr{H}_1$) given by
$$
\rho_1\coloneqq\operatorname{Tr}_2(\rho)=\sum_{j}(I\otimes\langle w_j|)\,\rho\,(I\otimes|w_j\rangle)
$$
for any orthonormal basis $\{|w_j\rangle\}$ of $\mathscr{H}_2$. For a pure state $\rho=|u\rangle\langle u|$ with Schmidt decomposition $|u\rangle=\sum_{k=1}^r\lambda_k|v_k\rangle\otimes|w_k\rangle$ (see below), this reduces to $\rho_1=\sum_{k=1}^r \lambda_k^2|v_k\rangle\langle v_k|$.
<details>
<summary>Examples</summary>
Let $|\psi\rangle=\frac{1}{\sqrt{2}}(|01\rangle+|10\rangle)$ be a unit vector in $\mathscr{H}=\mathbb{C}^2\otimes \mathbb{C}^2$ and let $\rho=|\psi\rangle\langle\psi|$ be the corresponding density operator.
Expand the expression of $\rho$ in the basis of $\mathbb{C}^2\otimes\mathbb{C}^2$ using linear combination of basis vectors:
$$
\rho=\frac{1}{2}(|01\rangle\langle 01|+|01\rangle\langle 10|+|10\rangle\langle 01|+|10\rangle\langle 10|)
$$
Note $\operatorname{Tr}_2(|ab\rangle\langle cd|)=|a\rangle\langle c|\cdot \langle b|d\rangle$.
Then the reduced density operator of the subsystem $\mathbb{C}^2$ in first qubit is, note the $\langle 0|0\rangle=\langle 1|1\rangle=1$ and $\langle 0|1\rangle=\langle 1|0\rangle=0$:
$$
\begin{aligned}
\rho_1&=\operatorname{Tr}_2(\rho)\\
&=\frac{1}{2}(\langle 1|1\rangle |0\rangle\langle 0|+\langle 0|1\rangle |0\rangle\langle 1|+\langle 1|0\rangle |1\rangle\langle 0|+\langle 0|0\rangle |1\rangle\langle 1|)\\
&=\frac{1}{2}(|0\rangle\langle 0|+|1\rangle\langle 1|)\\
&=\frac{1}{2}I
\end{aligned}
$$
is a mixed state.
</details>
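The same computation in code (Python/NumPy, my own sketch; the partial trace over the second factor is a sum over the repeated index of the second subsystem):

```python
import numpy as np

psi = np.array([0, 1, 1, 0]) / np.sqrt(2)        # (|01> + |10>)/sqrt(2) in C^2 (x) C^2
rho = np.outer(psi, psi.conj())                  # the pure-state density operator

# Index rho as (a, b, c, d) for the coefficient of |ab><cd| and trace over b = d.
rho_1 = np.einsum('abcb->ac', rho.reshape(2, 2, 2, 2))
print(rho_1)                                     # [[0.5, 0], [0, 0.5]] = I/2, a mixed state
```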
#### Schmidt Decomposition theorem
Let $|u\rangle\in \mathscr{H}_1\otimes\mathscr{H}_2$ be a unit vector (pure state), then there exists orthonormal bases $|v_i\rangle$ of $\mathscr{H}_1$ and $|w_j\rangle$ of $\mathscr{H}_2$ and $\{\lambda_k\},k\leq r$, where $r$ is the Schmidt rank of $|u\rangle$, such that:
$$
|u\rangle=\sum_{k=1}^r \lambda_k|v_k\rangle\otimes|w_k\rangle
$$
where $\lambda_k$ are **non-negative real numbers**. such that $\sum_{k=1}^r \lambda_k^2=1$.
[Proof ignored here]
**Remark**: non-zero vector $u\in \mathscr{H}_1\otimes\mathscr{H}_2$ decomposes as a tensor product $u=u_1\otimes u_2$ if and only if the Schmidt rank of $u$ is 1. **A state** that cannot be decomposed as a tensor product is called **entangled**.
#### Reduced density operator
In $\mathscr{H}_1\otimes\mathscr{H}_2$, the reduced density operator of the subsystem $\mathscr{H}_1$ is:
$$
\rho_1=\operatorname{Tr}_2(\rho)=\sum_{k=1}^r \lambda_k^2|v_k\rangle\langle v_k|
$$
where $\rho$ is the density operator in $\mathscr{H}_1\otimes\mathscr{H}_2$.
Example:
Let $|\psi\rangle=\frac{1}{\sqrt{2}}(|01\rangle+|10\rangle)\in \mathbb{C}^2\otimes\mathbb{C}^2$ and $\rho=|\psi\rangle\langle\psi|$,
Expand the expression of $\rho$ in the basis of $\mathbb{C}^2\otimes\mathbb{C}^2$:
$$
\rho=\frac{1}{2}(|01\rangle\langle 01|+|01\rangle\langle 10|+|10\rangle\langle 01|+|10\rangle\langle 10|)
$$
then the reduced density operator of the subsystem $\mathbb{C}^2$ in first qubit is:
$$
\begin{aligned}
\rho_1&=\operatorname{Tr}_2(\rho)\\
&=\frac{1}{2}(\langle 1|1\rangle|0\rangle\langle 0|+\langle 1|0\rangle|0\rangle\langle 1|+\langle 0|1\rangle|1\rangle\langle 0|+\langle 0|0\rangle|1\rangle\langle 1|)\\
&=\frac{1}{2}(|0\rangle\langle 0|+|1\rangle\langle 1|)\\
&=\frac{1}{2}I
\end{aligned}
$$
### State purification
Every mixed state can be derived as the reduction of a pure state on an enlarged Hilbert space.
#### State purification theorem
Let $\rho$ be a mixed state in a finite dimensional Hilbert space $\mathscr{H}$, then there exists a unit vector $|w\rangle\in \mathscr{H}\otimes\mathscr{H}$ such that:
$$
\rho=\operatorname{Tr}_2(|w\rangle\langle w|)
$$
Hint of proof:
Let $u_1,u_2,\cdots,u_d$ be an orthonormal basis of $\mathscr{H}$, $\sum_{i=1}^d p_i=1$, $p_i\geq 0$, then:
$$
\rho=\sum_{i=1}^d p_i|u_i\rangle\langle u_i|
$$
Let $w=\sum_{i=1}^d \sqrt{p_i}u_i\otimes u_i$.
### Observables
The observables in the quantum theory are self-adjoint operators on the Hilbert space $\mathscr{H}$, denoted by $A\in \mathscr{O}$
In finite dimensional Hilbert space, $A$ can be written as $\sum_{\lambda\in \operatorname{sp}{(A)}}\lambda P_\lambda$, where $P_\lambda$ is the projection operator onto the eigenspace of $A$ corresponding to the eigenvalue $\lambda$. $P_\lambda=P_\lambda^2=P_\lambda^*$.
### Effects and Busch's theorem for effect operators
Below is a section on Topic 4, about Gleason's theorem and definition of states, and Born's rule for describing the states using density operators.
#### Definition of states (non-commutative (_quantum_) probability theory)
> Do a double check on this section, this notation is slightly different from the one in Topic 4.
A state on $(\mathscr{B}(\mathscr{H}),\mathscr{P})$ is a map $\mu:\mathscr{P}\to[0,1]$ such that:
1. $0\leq \mu(E)\leq 1$ for all $E\in \mathscr{P}(\mathscr{H})$.
2. $\mu(I_{\mathscr{H}})=1$.
3. If $E_1,E_2,\cdots,E_n$ are pairwise disjoint orthogonal projections, whose sum is also in $\mathscr{P}(\mathscr{H})$ then $\mu(E_1\lor E_2\lor\cdots\lor E_n)=\sum_{i=1}^n\mu(E_i)$.
Where projections are disjoint if $P_iP_j=P_jP_i=O$.
#### Definition of density operator (non-commutative (_quantum_) probability theory)
A density operator $\rho$ on the finite-dimensional Hilbert space $\mathscr{H}$ is:
1. self-adjoint ($\rho^*=\rho$, that is, $\langle \rho x,y\rangle=\langle x,\rho y\rangle$ for all $x,y\in\mathscr{H}$)
2. positive semi-definite (all eigenvalues are non-negative)
3. $\operatorname{Tr}(\rho)=1$.
If $(|\psi_1\rangle,|\psi_2\rangle,\cdots,|\psi_n\rangle)$ is an orthonormal basis of $\mathscr{H}$ consisting of eigenvectors of $\rho$, for the eigenvalue $p_1,p_2,\cdots,p_n$, then $p_j\geq 0$ and $\sum_{j=1}^n p_j=1$.
We can write $\rho$ as
$$
\rho=\sum_{j=1}^n p_j|\psi_j\rangle\langle\psi_j|
$$
(under basis $|\psi_j\rangle$, it is a diagonal matrix with $p_j$ on the diagonal)
Every density operator on $\mathscr{H}$ can be decomposed in this form.
#### Theorem: Born's rule
Let $\rho$ be a density operator on $\mathscr{H}$. then
$$
\mu(P)\coloneqq\operatorname{Tr}(\rho P)=\sum_{j=1}^n p_j\langle\psi_j|P|\psi_j\rangle
$$
Defines a probability measure on the space $\mathscr{P}$.
[Proof ignored here]
#### Theorem: Gleason's theorem (very important)
Let $\mathscr{H}$ be a Hilbert space over $\mathbb{C}$ or $\mathbb{R}$ of dimension $n\geq 3$. Let $\mu$ be a state on the space $\mathscr{P}(\mathscr{H})$ of projections on $\mathscr{H}$. Then there exists a unique density operator $\rho$ such that
$$
\mu(P)=\operatorname{Tr}(\rho P)
$$
for all $P\in\mathscr{P}(\mathscr{H})$. $\mathscr{P}(\mathscr{H})$ is the space of all orthogonal projections on $\mathscr{H}$.
[Proof ignored here]
Extending the experimental procedures of quantum physics, **many outcome probabilities are expectations of effects rather than of projections** (POVMs).
#### Definition of effect
An effect is a positive (self-adjoint) operator $E$ on $\mathscr{H}$ such that $0\leq E\leq I$.
The set of effects on $\mathscr{H}$ is denoted by $\mathscr{E}(\mathscr{H})$.
An operator $E$ is said to be an **extreme point** of the convex set $\mathscr{E}(\mathscr{H})$ if it cannot be written as a non-trivial convex combination of two other effects.
That is, if $E$ is an extreme point, then $E=\lambda E_1+(1-\lambda)E_2$ for some $0< \lambda< 1$ and $E_1,E_2\in \mathscr{E}(\mathscr{H})$ implies $E=E_1=E_2$.
#### Proposition: Effect operator lemma
The set of orthogonal projections on $\mathscr{H}$, $\mathscr{P}(\mathscr{H})$, is the set of extreme points of $\mathscr{E}(\mathscr{H})$.
#### Theorem: Generalized measures on effects
Let $\mathscr{H}$ be a finite-dimensional Hilbert space. Then any generalized probability measure
$$
\mu:E\in \mathscr{E}(\mathscr{H})\to \mu(E)\in[0,1]
$$
with the properties (same as the definition of states):
1. $0\leq \mu(E)\leq 1$ for all $E\in \mathscr{E}(\mathscr{H})$.
2. $\mu(I_{\mathscr{H}})=1$.
3. If $E_1,E_2,\cdots,E_n\in\mathscr{E}(\mathscr{H})$ and their sum $E_1+E_2+\cdots+E_n$ is also in $\mathscr{E}(\mathscr{H})$, then $\mu(E_1+E_2+\cdots+E_n)=\sum_{i=1}^n\mu(E_i)$.
is of the form:
$\mu(E)=\operatorname{Tr}(\rho E)$
for some density operator $\rho$ on $\mathscr{H}$.
[Proof ignored here]
> If $\mu$ is a positive linear functional on the space of self-adjoint operators on the finite dimensional Hilbert space $\mathscr{H}$.
>
> Then, there exists a density operator $\rho$ on $\mathscr{H}$ such that $\mu(E)=\operatorname{Tr}(\rho E)$.
### Measurements
A measurement (observation) of a system prepared in a given state produces an outcome $x$; each outcome $x$ is an element of the set $X$ of all possible outcomes.
To each $x\in X$, we associate a measurement operator $M_x$ on $\mathscr{H}$.
Given the initial state (pure state, unit vector) $u$, the probability of measurement outcome $x$ is given by:
$$
p(x)=\|M_xu\|^2
$$
After the measurement, the state of the system is given by:
$$
v=\frac{M_xu}{\|M_xu\|}
$$
Note that to make sense of this definition, the collection of measurement operators $\{M_x\}$ must satisfy the **completeness** requirement:
$$
1=\sum_{x\in X} p(x)=\sum_{x\in X}\|M_xu\|^2=\sum_{x\in X}\langle M_xu,M_xu\rangle=\langle u,(\sum_{x\in X}M_x^*M_x)u\rangle
$$
So $\sum_{x\in X}M_x^*M_x=I$.
An example of measurement is the projective measurements (von Neumann measurements).
It is given by the set of orthogonal projections $M_x$ on $\mathscr{H}$ with the property:
1. $M_x=M_x^*$
2. $M_xM_y=\delta_{xy}M_x$ for all $x,y\in X$
3. $\sum_{x\in X}M_x=I$
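A minimal sketch (Python/NumPy, mine, not from the notes) of a projective measurement of a qubit in the computational basis, showing the completeness relation, the outcome probabilities $p(x)=\|M_xu\|^2$, and the post-measurement state:

```python
import numpy as np

M = [np.array([[1, 0], [0, 0]], dtype=complex),            # M_0 = |0><0|
     np.array([[0, 0], [0, 1]], dtype=complex)]            # M_1 = |1><1|
assert np.allclose(sum(m.conj().T @ m for m in M), np.eye(2))   # completeness

u = np.array([np.cos(0.3), np.sin(0.3)], dtype=complex)    # initial pure state

probs = [np.linalg.norm(m @ u) ** 2 for m in M]            # p(x) = ||M_x u||^2
print(probs, sum(probs))                                    # probabilities sum to 1

x = 0                                                       # suppose outcome 0 is registered
v = M[x] @ u / np.linalg.norm(M[x] @ u)                     # post-measurement state
print(v)                                                    # |0>, up to a phase
```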
#### Composition of measurements
Given two complete collections of measurement operators $\{M_x\}$ and $\{N_y\}$ on $\mathscr{H}_1$ and $\mathscr{H}_2$ respectively, the composition of the two measurements is given by the collection of measurement operators $\{M_x\otimes N_y\}$ on $\mathscr{H}_1\otimes\mathscr{H}_2$.
#### Proposition of indistinguishability
Suppose that we have two states $u_1,u_2\in \mathscr{H}$; the two states are distinguishable with certainty by some measurement if and only if they are orthogonal.
Ways to distinguish the two states:
1. set $X=\{0,1,2\}$ and $M_i=|u_i\rangle\langle u_i|$, $M_0=I-M_1-M_2$
2. then $\{M_0,M_1,M_2\}$ is a complete collection of measurement operators on $\mathscr{H}$.
3. suppose the prepared state is $u_1$, then $p(1)=\|M_1u_1\|^2=\|u_1\|^2=1$, $p(2)=\|M_2u_1\|^2=0$, $p(0)=\|M_0u_1\|^2=0$.
If they are not orthogonal, then there is no choice of measurement operators that distinguishes the two states with certainty.
[Proof ignored here]
_intuitively, if the two states are not orthogonal, then for any measurement there exists non-zero probability of getting the same outcome for both states._
#### Effects and POVM measurements
An effect on the finite-dimensional Hilbert space $\mathscr{H}$ is a positive operator $E$ on $\mathscr{H}$ such that $0\leq E\leq I$. A positive operator-valued measure (POVM) consists of an index set $\mathscr{I}$ and a collection of effects $\{E_i,i\in \mathscr{I}\}$ satisfying the identity $\sum_{i\in \mathscr{I}}E_i=I$.
The probability of measurement outcome $i\in \mathscr{I}$ for a system prepared in the state described by the unit vector $v$ is given by $p(i)=\langle v,E_iv\rangle$.
For a mixed state $\rho$, the probability of measurement outcome $i\in \mathscr{I}$ is given by $p(i)=\operatorname{Tr}(\rho E_i)$.
Example: suppose we have a system prepared in one of the following two states:
$$
u_1=|0\rangle, u_2=\frac{1}{\sqrt{2}}(|0\rangle+|1\rangle)
$$
Since they are not orthogonal, there is no measurement that can distinguish the two states with certainty.
Consider the following POVM:
$$
E_1=\frac{\sqrt{2}}{1+\sqrt{2}}|1\rangle \langle 1|, E_2=\frac{\sqrt{2}}{1+\sqrt{2}}\frac{(|0\rangle-|1\rangle)(\langle 0|-\langle 1|)}{2},E_3=I-E_1-E_2
$$
Then, suppose we are handed the system without knowing which state was prepared. Given $u_1$, the probability of measurement outcome $1$ is:
$$
p(1)=\langle u_1,E_1u_1\rangle=0
$$
So if the measurement outcome is $1$, we can conclude that the state is $u_2$.
Given $u_2$, the probability of measurement outcome $2$ is:
$$
p(2)=\langle u_2,E_2u_2\rangle=0
$$
So if the measurement outcome is $2$, we can conclude that the state is $u_1$.
If the measurement outcome is $3$, then we cannot conclude anything about the state.
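As a sanity check, the following numpy sketch (written just for this example) verifies that $E_1,E_2,E_3$ are effects summing to the identity and that $p(1)=0$ on $u_1$ and $p(2)=0$ on $u_2$:
```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
u1 = ket0
u2 = (ket0 + ket1) / np.sqrt(2)

c = np.sqrt(2) / (1 + np.sqrt(2))
minus = (ket0 - ket1) / np.sqrt(2)
E1 = c * np.outer(ket1, ket1.conj())
E2 = c * np.outer(minus, minus.conj())
E3 = np.eye(2) - E1 - E2

# POVM requirements: each E_i is positive and they sum to the identity.
for E in (E1, E2, E3):
    assert np.all(np.linalg.eigvalsh(E) >= -1e-12)
assert np.allclose(E1 + E2 + E3, np.eye(2))

def p(v, E):
    return np.real(v.conj() @ E @ v)   # p(i) = <v, E_i v>

print(p(u1, E1), p(u2, E2))   # both 0: outcome 1 excludes u1, outcome 2 excludes u2
```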
#### Proposition: Ancilla system
A general measurement of a system having Hilbert space $\mathscr{H}$ is equivalent to a projective measurement composed with a unitary transformation on the Hilbert space $\mathscr{H}\otimes\mathscr{A}$ of a composite system. The system described by $\mathscr{A}$ is called the ancilla system. This equivalent measurement is not unique.
[Further details ignored here]
### Quantum operations and CPTP maps
$L^1(\Omega,\mathscr{F},\mu)$ is the space of integrable functions on $\Omega$, that is, functions $f$ with $\int_{\Omega} |f(\omega)| d\mu(\omega)<\infty$ for the measure $\mu$ on $\Omega$.
We define $\mathscr{L}_1(\mathscr{H})$, the space of trace class operators on $\mathscr{H}$, as the space of operators $A$ such that $\operatorname{Tr}(\sqrt{A^*A})<\infty$.
$L^2(\Omega,\mathscr{F},\mu)$ is the space of square integrable functions on $\Omega$, that is, functions $f$ with $\int_{\Omega} |f(\omega)|^2 d\mu(\omega)<\infty$.
We define $\mathscr{L}_2(\mathscr{H})$, the space of Hilbert-Schmidt operators on $\mathscr{H}$, as the space of operators $A$ such that $\operatorname{Tr}(A^*A)<\infty$.
The space $\mathscr{L}_2(\mathscr{H})$ is a Hilbert space equipped with the inner product $\langle A,B\rangle=\operatorname{Tr}(B^*A)$,
which satisfies the Cauchy-Schwarz inequality:
$$
|\operatorname{Tr}(A^*B)|\leq \operatorname{Tr}(A^*A)^{1/2}\operatorname{Tr}(B^*B)^{1/2}
$$
The set of density operators $\mathscr{S}(\mathscr{H})$, the positive operators in $\mathscr{L}_1(\mathscr{H})$ with trace $1$, is a convex subset of $\mathscr{L}_1(\mathscr{H})$: for $\rho_1,\rho_2\in \mathscr{S}(\mathscr{H})$ and $\lambda\in[0,1]$, $\lambda\rho_1+(1-\lambda)\rho_2\in \mathscr{S}(\mathscr{H})$.
#### Definition of CPTP map
A completely positive trace preserving (CPTP) map is a linear map $\mathscr{E}:\mathscr{L}_1(\mathscr{H})\to \mathscr{L}_1(\mathscr{H})$ such that:
1. $\operatorname{Tr}(\mathscr{E}(\rho))=\operatorname{Tr}(\rho)$ for all $\rho\in \mathscr{S}(\mathscr{H})$ (trace preserving).
2. $\mathscr{E}$ is completely positive, that is, $\mathscr{E}\otimes I_{\mathscr{K}}:\mathscr{L}_1(\mathscr{H}\otimes\mathscr{K})\to\mathscr{L}_1(\mathscr{H}\otimes\mathscr{K})$ is positive for every finite-dimensional or separable Hilbert space $\mathscr{K}$.
_Note that complete positivity is strictly stronger than positivity: a map can be positive while its extension $\mathscr{E}\otimes I_{\mathscr{K}}$ fails to be positive on some entangled states._
Example:
The map $\mathscr{E}:\mathscr{L}_1(\mathscr{H})\to \mathscr{L}_1(\mathscr{H})$ that conjugates matrix entries in a fixed orthonormal basis,
$$
\mathscr{E}\left(\sum_{i,j} \alpha_{ij}|i\rangle\langle j|\right)= \sum_{i,j} \overline{\alpha_{ij}}|i\rangle\langle j|,
$$
is positive but not completely positive: applying $\mathscr{E}\otimes I$ to the entangled state
$$
\rho=|\phi\rangle\langle\phi|,\qquad |\phi\rangle=\frac{1}{\sqrt{2}}(|00\rangle+|11\rangle),
$$
produces an operator with a negative eigenvalue.
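On density operators (which are self-adjoint), conjugating the entries in a fixed basis is the same as transposing, so the failure of complete positivity can be checked numerically by partially transposing the first factor of $\rho$. A minimal numpy sketch (the index bookkeeping below is one common convention, chosen just for this illustration):
```python
import numpy as np

# Bell state |phi> = (|00> + |11>)/sqrt(2) on C^2 (x) C^2
phi = np.zeros(4, dtype=complex)
phi[0] = phi[3] = 1 / np.sqrt(2)
rho = np.outer(phi, phi.conj())

# Apply the map to the first subsystem only (partial transpose; on this state
# it agrees with entrywise conjugation of the first factor).
rho4 = rho.reshape(2, 2, 2, 2)                       # indices (i, k, j, l) for |i k><j l|
sigma = rho4.transpose(2, 1, 0, 3).reshape(4, 4)     # swap i <-> j: transpose on system 1

print(np.linalg.eigvalsh(rho))     # [0, 0, 0, 1]  -- rho is a state
print(np.linalg.eigvalsh(sigma))   # contains -0.5 -- (E (x) I)(rho) is not positive
```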
#### Definition of quantum channel
Let $\mathscr{H}$ and $\mathscr{K}$ be Hilbert spaces, $U$ be a unitary operator on $\mathscr{H}\otimes\mathscr{K}$, and $\omega$ be a density operator on $\mathscr{K}$. The CPTP map
$$
\mathscr{E}:T\in \mathscr{L}_1(\mathscr{H})\to \operatorname{Tr}_\mathscr{K}(U (T\otimes \omega)U^*)
$$
is a quantum channel.
We skip a few exercises here and jump straight to the definition.
In short, the quantum channel describes the following process:
1. Initialization: the ancilla $\mathscr{K}$ is prepared in a fixed state $\omega$ (a density operator).
2. Coupling: the input state $T$ (on $\mathscr{H}$) is combined with $\omega$ to form $T\otimes\omega$ on $\mathscr{H}\otimes\mathscr{K}$.
3. Unitary evolution: the joint system evolves under $U$ (a unitary on $\mathscr{H}\otimes\mathscr{K}$).
4. Discarding the ancilla: $\mathscr{K}$ is traced out, leaving a state on $\mathscr{H}$.

This is a Stinespring dilation, and any CPTP map can be represented in this form.
#### Proposition: Stinespring dilation theorem (to be checked)
Any CPTP map $\mathscr{E}:\mathscr{L}_1(\mathscr{H})\to \mathscr{L}_1(\mathscr{H})$ can be represented as
$$
\mathscr{E}(T)=\operatorname{Tr}_\mathscr{K}(U (T\otimes \omega)U^*)
$$
for some Hilbert space $\mathscr{K}$, some density operator $\omega$ on $\mathscr{K}$, and some unitary $U$ on $\mathscr{H}\otimes\mathscr{K}$.
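Here is a minimal numpy sketch of the formula $\mathscr{E}(T)=\operatorname{Tr}_\mathscr{K}(U (T\otimes \omega)U^*)$ for one toy choice (not taken from the text): $\mathscr{K}=\mathbb{C}^2$, $\omega=|0\rangle\langle 0|$, and $U$ the CNOT gate with control on $\mathscr{H}$, which yields the completely dephasing channel on a qubit:
```python
import numpy as np

def partial_trace_ancilla(M, dH, dK):
    """Tr_K of an operator M on H (x) K, with dim H = dH and dim K = dK."""
    M4 = M.reshape(dH, dK, dH, dK)          # indices (i, a, j, b)
    return np.trace(M4, axis1=1, axis2=3)   # sum over a = b

dH = dK = 2
omega = np.array([[1, 0], [0, 0]], dtype=complex)   # ancilla prepared in |0><0|

# U = CNOT with control on H and target on K: |i, a> -> |i, a XOR i>
U = np.zeros((4, 4), dtype=complex)
for i in range(2):
    for a in range(2):
        U[2 * i + (a ^ i), 2 * i + a] = 1

def channel(T):
    joint = U @ np.kron(T, omega) @ U.conj().T
    return partial_trace_ancilla(joint, dH, dK)

T = np.array([[0.7, 0.3], [0.3, 0.3]], dtype=complex)   # an input density operator
print(channel(T))              # off-diagonal entries are removed (dephasing)
print(np.trace(channel(T)))    # trace is preserved
```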
### Conditional operations
#### Definition of controlled-unitary operations
A controlled-unitary operation is
$$
U\coloneqq\sum_{a=1}^{n_1}|a\rangle\langle a|\otimes U_a
$$
where $(|a\rangle)_{a=1}^{n_1}$ is an orthonormal basis of the control space $\mathscr{K}$ (so $n_1=\dim\mathscr{K}$), each $U_a$ is a unitary operator on $\mathscr{H}$, and $U$ acts on $\mathscr{K}\otimes\mathscr{H}$.
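A minimal numpy sketch of this construction for a qubit control space (the helper name is just for illustration); with $U_0=I$ and $U_1=X$ it reproduces the CNOT gate:
```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)

def controlled_unitary(unitaries):
    """U = sum_a |a><a| (x) U_a, with the control space spanned by |a>, a = 0..n-1."""
    n = len(unitaries)
    d = unitaries[0].shape[0]
    U = np.zeros((n * d, n * d), dtype=complex)
    for a, Ua in enumerate(unitaries):
        proj = np.zeros((n, n), dtype=complex)
        proj[a, a] = 1                   # |a><a| on the control space
        U += np.kron(proj, Ua)
    return U

CNOT = controlled_unitary([I2, X])       # U_0 = I, U_1 = X
print(CNOT.real)
print(np.allclose(CNOT.conj().T @ CNOT, np.eye(4)))   # U is unitary
```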
#### Principle of deferred measurement
All measurements that may occur in the process of executing a quantum computation may be relegated to the end of the quantum circuit, prior to which all operations are unitary.
## Section 2: Quantum entanglement
### Bell states and the EPR phenomenon
#### Definition of Bell states
The Bell states are the following four states:
$$
|\Phi^+\rangle=\frac{1}{\sqrt{2}}(|00\rangle+|11\rangle), |\Phi^-\rangle=\frac{1}{\sqrt{2}}(|00\rangle-|11\rangle)
$$
$$
|\Psi^+\rangle=\frac{1}{\sqrt{2}}(|01\rangle+|10\rangle), |\Psi^-\rangle=\frac{1}{\sqrt{2}}(|01\rangle-|10\rangle)
$$
These four states form an orthonormal basis of the two-qubit Hilbert space.
[The section discussing the EPR phenomenon is ignored here, the key to remember is that there exists no classical (local) explanation for the correlation between the two qubits.]
### Von Neumann entropy and maximally entangled states
#### Definition of EPR state
A vector $|\psi\rangle$ on tensor product space $\mathscr{H}_1\otimes\mathscr{H}_2$ is called an EPR state if it is of the form:
$$
|\psi\rangle=\frac{1}{\sqrt{n}}\sum_{i=1}^n |i\rangle_1|i\rangle_2
$$
where $(|i\rangle_1)_{i=1}^n$ and $(|i\rangle_2)_{i=1}^n$ are orthonormal bases of $\mathscr{H}_1$ and $\mathscr{H}_2$ respectively.
This describes a maximally entangled state.
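A small numpy sketch (with $n=3$ chosen arbitrarily) of why this state is called maximally entangled: tracing out either factor leaves the maximally mixed state $I/n$.
```python
import numpy as np

n = 3
# |psi> = (1/sqrt(n)) * sum_i |i>_1 |i>_2
psi = np.zeros(n * n, dtype=complex)
for i in range(n):
    psi[i * n + i] = 1 / np.sqrt(n)

rho = np.outer(psi, psi.conj())
rho4 = rho.reshape(n, n, n, n)               # indices (i, k, j, l)
rho_1 = np.trace(rho4, axis1=1, axis2=3)     # trace out the second factor

print(np.allclose(rho_1, np.eye(n) / n))     # True: the reduced state is maximally mixed
```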
#### Weyl operators
Let $\mathscr{H}$ be an $n$-dimensional Hilbert space with orthonormal basis $(|i\rangle)_{i=0}^{n-1}$.
The shift operator $X$ is defined as:
$$
X|i\rangle=|i+1\rangle
$$
Note that $X$ permutes the basis elements cyclically (the index is taken mod $n$, so $X|n-1\rangle=|0\rangle$). Let $\omega=e^{2\pi i/n}$; then $1,\omega,\omega^2,\cdots,\omega^{n-1}$ are the $n$-th roots of unity.
The phase operator $Z$ is defined as:
$$
Z|i\rangle=\omega^i|i\rangle
$$
The Weyl operators are the following operators:
$$
W_{ab}=X^aZ^b
$$
where $a,b\in\{0,1,\cdots,n-1\}$.
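A minimal numpy sketch (with $n=4$ chosen arbitrarily) constructing $X$ and $Z$ and checking the commutation relation $ZX=\omega XZ$, so the Weyl operators commute up to a phase:
```python
import numpy as np

n = 4
omega = np.exp(2j * np.pi / n)

# Shift: X|i> = |i+1 mod n>;  Phase: Z|i> = omega^i |i>
X = np.roll(np.eye(n, dtype=complex), 1, axis=0)
Z = np.diag(omega ** np.arange(n))

def weyl(a, b):
    return np.linalg.matrix_power(X, a) @ np.linalg.matrix_power(Z, b)   # W_ab = X^a Z^b

print(np.allclose(Z @ X, omega * X @ Z))                                 # True: ZX = omega XZ
print(np.allclose(weyl(1, 2).conj().T @ weyl(1, 2), np.eye(n)))          # Weyl operators are unitary
```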
#### Definition of von Neumann entropy
The von Neumann entropy of a density operator $\rho$ is defined as:
$$
S(\rho)=-\operatorname{Tr}(\rho\log\rho)=-\sum_{i}\mu_i\log\mu_i
$$
where $\mu_i$ are the eigenvalues of $\rho$, with the convention $0\log 0=0$.
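A small numpy sketch of this formula, computing $S(\rho)$ from the eigenvalues with the convention $0\log 0=0$ and using $\log_2$ so that entropy is measured in bits (a pure state has entropy $0$; the maximally mixed qubit state has entropy $1$):
```python
import numpy as np

def von_neumann_entropy(rho, base=2):
    """S(rho) = -sum_i mu_i log(mu_i), with the convention 0 log 0 = 0."""
    mu = np.linalg.eigvalsh(rho)
    mu = mu[mu > 1e-12]                    # drop (numerically) zero eigenvalues
    return float(-np.sum(mu * np.log(mu)) / np.log(base))

pure = np.array([[1, 0], [0, 0]], dtype=complex)   # |0><0|
mixed = np.eye(2) / 2                              # maximally mixed qubit state
print(von_neumann_entropy(pure))    # 0.0
print(von_neumann_entropy(mixed))   # 1.0
```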
## Section 3: Information transmission by quantum systems
### Transmission of classical information
#### Transmission over information channels
Let the measurement operation be defined by a POVM $\{E_y\}$. The conditional probability of obtaining signal $y$ at the output, given that the input is $x$, is given by:
$$
p_E(y|x)=\operatorname{Tr}(\rho_x E_y)
$$
where $\rho_x$ is the density operator of the input state, $E_y$ is the measurement operator for the output signal $y$.
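A small numpy sketch (the input states and the measurement are made up for illustration) computing the classical transition matrix $p_E(y|x)=\operatorname{Tr}(\rho_x E_y)$ for two pure input states measured in the computational basis:
```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)

# Input ensemble: rho_x for x = 0, 1
rhos = [np.outer(ket0, ket0.conj()), np.outer(plus, plus.conj())]
# Output POVM: projective measurement in the computational basis
Es = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]

# p(y|x) = Tr(rho_x E_y)
P = np.array([[np.real(np.trace(r @ E)) for E in Es] for r in rhos])
print(P)    # [[1.0, 0.0], [0.5, 0.5]]
```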
#### Holevo bound
The maximal amount of classical information that can be transmitted by a quantum system is given by the Holevo bound; in particular, at most $\log_2(d)$ bits of classical information can be transmitted by a quantum system with $d$ levels.
> The fact that Hilbert space contains infinitely many different state vectors does not aid us in transmitting an unlimited amount of information. The more states are used for transmission, the closer they are to each other and hence they become less and less distinguishable.
### Making use of entanglement and local operations
Entanglement alone does not transmit information: local operations on one half of an entangled pair leave the reduced state of the other half unchanged, and measuring one half of a maximally entangled pair yields uniformly random outcomes.
### Superdense coding [very important]
It is a procedure defined as follows:
Suppose $A$ and $B$ share a Bell state $|\Phi^+\rangle=\frac{1}{\sqrt{2}}(|00\rangle+|11\rangle)$, where $A$ holds the first part and $B$ holds the second part.
$A$ wishes to send 2 classical bits to $B$.
$A$ applies one of the four Pauli operators $I, X, Z, XZ$ (one for each of the four two-bit messages) to her qubit, then sends that qubit to $B$.
This maps the shared Bell state to one of the four mutually orthogonal Bell states.
$B$ performs a measurement in the Bell basis on the qubit he received together with the qubit he already holds.
$B$ decodes the result and obtains the 2 classical bits sent by $A$.
![Superdense coding](https://notenextra.trance-0.com/Math401/Superdense_coding.png)
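A minimal numpy sketch of the protocol, assuming $A$ encodes her two bits with $I, Z, X, XZ$ (one common convention): each encoding sends $|\Phi^+\rangle$ to a different Bell state, so $B$'s Bell-basis measurement recovers the two bits with certainty.
```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

phi_plus = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)    # (|00> + |11>)/sqrt(2)

# Bell basis (the four states defined above), as vectors in C^4
bell = [np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2),
        np.array([1, 0, 0, -1], dtype=complex) / np.sqrt(2),
        np.array([0, 1, 1, 0], dtype=complex) / np.sqrt(2),
        np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)]

# A encodes two bits by applying I, Z, X, or XZ to her (first) qubit
encodings = {(0, 0): I2, (0, 1): Z, (1, 0): X, (1, 1): X @ Z}

for bits, Ua in encodings.items():
    state = np.kron(Ua, I2) @ phi_plus
    probs = [abs(np.vdot(b, state)) ** 2 for b in bell]   # Bell-measurement probabilities
    print(bits, np.round(probs, 3))    # exactly one outcome has probability 1
```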
## Section 4: Quantum automorphisms and dynamics
Section ignored.


@@ -1 +0,0 @@
# Math401 Topic 7: Basic of quantum circuits


@@ -1,16 +0,0 @@
export default {
Math401_T1: "Math 401, Topic 1: Probability under language of measure theory",
Math401_T2: "Math 401, Topic 2: Finite-dimensional Hilbert spaces",
Math401_T3: "Math 401, Topic 3: Separable Hilbert spaces",
Math401_T4: "Math 401, Topic 4: The quantum version of probabilistic concepts",
Math401_T5: "Math 401, Topic 5: Introducing dynamics: classical and non-commutative",
Math401_T6: "Math 401, Topic 6: Postulates of quantum theory and measurement operations",
Math401_T7: "Math 401, Topic 7: Basic of quantum circuits",
"---":{
type: 'separator'
},
Math401_P1: "Math 401, Paper 1: Concentration of measure effects in quantum information (Patrick Hayden)",
Math401_P1_1: "Math 401, Paper 1, Side note 1: Quantum information theory and Measure concentration",
Math401_P1_2: "Math 401, Paper 1, Side note 2: Page's lemma",
Math401_P1_3: "Math 401, Paper 1, Side note 3: Levy's concentration theorem",
}


@@ -1,64 +0,0 @@
# Node 1
_all the materials are recovered after the end of the course. I cannot split my mind away from those materials._
## Recap on familiar ideas
### Group
A group is a set $G$ with a binary operation $\cdot$ that satisfies the following properties:
1. **Closure**: For all $a, b \in G$, the result of the operation $a \cdot b$ is also in $G$.
2. **Associativity**: For all $a, b, c \in G$, $(a \cdot b) \cdot c = a \cdot (b \cdot c)$.
3. **Identity**: There exists an element $e \in G$ such that for all $a \in G$, $e \cdot a = a \cdot e = a$.
4. **Inverses**: For each $a \in G$, there exists an element $b \in G$ such that $a \cdot b = b \cdot a = e$.
### Ring
A ring is a set $R$ with two binary operations, addition and multiplication, that satisfies the following properties:
1. **Additive Closure**: For all $a, b \in R$, the result of the addition $a + b$ is also in $R$.
2. **Additive Associativity**: For all $a, b, c \in R$, $(a + b) + c = a + (b + c)$.
3. **Additive Identity**: There exists an element $0 \in R$ such that for all $a \in R$, $0 + a = a + 0 = a$.
4. **Additive Inverses**: For each $a \in R$, there exists an element $-a \in R$ such that $a + (-a) = (-a) + a = 0$.
5. **Multiplicative Closure**: For all $a, b \in R$, the result of the multiplication $a \cdot b$ is also in $R$.
6. **Multiplicative Associativity**: For all $a, b, c \in R$, $(a \cdot b) \cdot c = a \cdot (b \cdot c)$.
Other axioms are not shown, since we will not need much more here.
## Symmetric Group
### Definition
The symmetric group $S_n$ is the group of all permutations of $n$ elements. Or in other words, the group of all bijections from a set of $n$ elements to itself.
Example:
$$
e=1,2,3\\
(12)=2,1,3\\
(13)=3,2,1\\
(23)=1,3,2\\
(123)=3,1,2\\
(132)=2,3,1
$$
$(12)$ means that $1\to 2, 2\to 1, 3\to 3$; we follow the cyclic order for the elements in the set.
$S_3 = \{e, (12), (13), (23), (123), (132)\}$
The multiplication table of $S_3$, where the entry in row $\sigma$ and column $\tau$ is the product $\sigma\tau$ read left to right (apply $\sigma$ first, then $\tau$), is:
|Element|e|(12)|(13)|(23)|(123)|(132)|
|---|---|---|---|---|---|---|
|e|e|(12)|(13)|(23)|(123)|(132)|
|(12)|(12)|e|(123)|(132)|(13)|(23)|
|(13)|(13)|(132)|e|(123)|(23)|(12)|
|(23)|(23)|(123)|(132)|e|(12)|(13)|
|(123)|(123)|(23)|(12)|(13)|(132)|e|
|(132)|(132)|(13)|(23)|(12)|e|(123)|
A quick way to regenerate this table by composing permutations in code is sketched below.
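To keep the conventions straight, here is a small Python sketch (permutations are stored as tuples of images, which is a different bookkeeping from the arrangements listed above) that regenerates the table by composing permutations left to right:
```python
# A permutation sigma is stored as the tuple of images (sigma(1), sigma(2), sigma(3)).
names = {
    (1, 2, 3): "e",
    (2, 1, 3): "(12)",
    (3, 2, 1): "(13)",
    (1, 3, 2): "(23)",
    (2, 3, 1): "(123)",   # 1 -> 2 -> 3 -> 1
    (3, 1, 2): "(132)",   # 1 -> 3 -> 2 -> 1
}

def compose_lr(sigma, tau):
    """Left-to-right product: apply sigma first, then tau, i.e. i -> tau(sigma(i))."""
    return tuple(tau[sigma[i - 1] - 1] for i in (1, 2, 3))

elements = list(names)
print("|Element|" + "|".join(names[t] for t in elements) + "|")
print("|---" * (len(elements) + 1) + "|")
for s in elements:
    row = [names[compose_lr(s, t)] for t in elements]
    print("|" + names[s] + "|" + "|".join(row) + "|")
```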
## Functions defined on $S_n$
### Symmetric Generating Set


@@ -1,9 +0,0 @@
# Node 2
## Random matrix theory
### Wigner's semicircle law
## h-Inversion Polynomials for a Special Heisenberg Family


@@ -1,399 +0,0 @@
# Coding and Information Theory Crash Course
## Encoding
Let $A,B$ be two finite sets with size $a,b$ respectively.
Let $S(A)=\bigcup_{r=1}^{\infty}A^r$ be the word semigroup generated by $A$.
A one-to-one mapping $f:A\to S(B)$ is called a code with message alphabet $A$ and encoded alphabet $B$.
Example:
- $A=$ the RGB color space
- $B=\{0,1,\ldots,255\}$
- $f:A\to B^3$ is a code
For example, $f(\text{white})=(255,255,255)$ and $f(\text{green})=(0,255,0)$.
### Uniquely decipherable codes
A code $f:A\to S(B)$ is called uniquely decipherable if the extension code
$$
\tilde{f}:S(A)\to S(B),\qquad \tilde{f}(a_1a_2\cdots a_n)=f(a_1)f(a_2)\cdots f(a_n)
$$
is one-to-one.
Example:
- $A=\{a,b,c,d\}$
- $B=\{0,1\}$
- $f(a)=00$, $f(b)=01$, $f(c)=10$, $f(d)=11$
is uniquely decipherable.
- $f(a)=0$, $f(b)=1$, $f(c)=10$, $f(d)=11$
is not uniquely decipherable.
Since $\tilde{f}(ba)=10=\tilde{f}(c)$
#### Irreducible codes
A code $f:A\to S(B)$ is called irreducible if for all $x,y\in A$ and all $w\in S(B)$, $f(y)\neq f(x)w$; that is, no codeword is a proper prefix of another codeword (a prefix code).
This condition implies unique decipherability, and moreover each codeword can be recognized as soon as it has been read completely,
so there is no ambiguity or delay in the decoding process.
#### Theorem 1.1.1 Sardinas and Patterson Theorem
Let $A,B$ be alphabet sets of size $a,b$ respectively.
Let $f:A\to S(B)$ be a code that is uniquely decipherable.
Then
$$
\sum_{x\in A}b^{-l(f(x))}\leq 1
$$
where $l(f(x))$ is the length of the codeword $f(x)$ (the number of elements of $B$ used in the codeword for $x$). This inequality is also known as the Kraft-McMillan inequality.
Proof:
Let $L$ denote the max length of the codeword for any $x\in A$, $L=\max\{l(f(x))|x\in A\}$, and write $K=\sum_{x\in A}b^{-l(f(x))}$.
For any $n\geq 1$, expanding the $n$-th power and grouping the words of $A^n$ by the length of their encoding gives
$$
K^n=\sum_{w\in A^n}b^{-l(\tilde{f}(w))}=\sum_{r=n}^{nL}N_rb^{-r}
$$
where $N_r$ is the number of words $w\in A^n$ whose encoding $\tilde{f}(w)$ has length $r$.
Since $f$ is uniquely decipherable, $\tilde{f}$ is one-to-one, so distinct words of $A^n$ have distinct encodings; as there are at most $b^r$ strings of length $r$ over $B$, we get $N_r\leq b^r$.
Hence
$$
K^n\leq\sum_{r=n}^{nL}1\leq nL
$$
so $K\leq (nL)^{1/n}$ for every $n$. Letting $n\to\infty$ gives $K\leq 1$. QED
#### Sardinas and Patterson Algorithm
Let $A=\{a_1,a_2,\cdots,a_n\}$ be the message alphabet and $B=\{b_1,b_2,\cdots,b_m\}$ be the encoded alphabet.
We test whether the code $f:A\to S(B)$ is uniquely decipherable.
```python
def is_uniquely_decipherable(codewords):
    """Sardinas-Patterson test: return True iff the code is uniquely decipherable.

    codewords: the set of codewords f(x), given as nonempty strings over B.
    """
    C = set(codewords)

    def residuals(X, Y):
        # Dangling suffixes: w != "" such that x + w == y for some x in X, y in Y.
        return {y[len(x):] for x in X for y in Y
                if len(y) > len(x) and y.startswith(x)}

    reachable = set()
    frontier = residuals(C, C)            # S_1: suffixes left when one codeword starts another
    while frontier:
        if frontier & C:                  # a dangling suffix is itself a codeword -> ambiguity
            return False
        reachable |= frontier
        # S_{n+1}: compare the current dangling suffixes against the codewords again.
        frontier = (residuals(frontier, C) | residuals(C, frontier)) - reachable
    return True
```
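Applied to the two example codes above, the test gives:
```python
print(is_uniquely_decipherable(["00", "01", "10", "11"]))   # True
print(is_uniquely_decipherable(["0", "1", "10", "11"]))     # False: f(b)f(a) = "10" = f(c)
```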
### Shannon's source coding theorem
#### Definition 1.1.4
An elementary information source is a pair $(A,\mu)$ where $A$ is an alphabet and $\mu$ is a probability distribution on $A$. $\mu$ is a function $\mu:A\to[0,1]$ such that $\sum_{a\in A}\mu(a)=1$.
The **mean code word length** of an information source $(A,\mu)$ given a code $f:A\to S(B)$ is defined as
$$
\overline{l}(\mu,f)=\sum_{a\in A}\mu(a)l(f(a))
$$
The minimum mean code word length $L(\mu)$ is defined as
$$
L(\mu)=\min\{\overline{l}(\mu,f)|f:A\to S(B)\text{ is uniquely decipherable}\}
$$
#### Lemma: Jensen's inequality
Let $f$ be a convex function on the interval $(a,b)$. Then for any $x_1,x_2,\cdots,x_n\in (a,b)$ and $\lambda_1,\lambda_2,\cdots,\lambda_n\in [0,1]$ such that $\sum_{i=1}^{n}\lambda_i=1$, we have
$$f(\sum_{i=1}^{n}\lambda_ix_i)\leq \sum_{i=1}^{n}\lambda_if(x_i)$$
Proof:
If $f$ is a convex function, three properties are useful for the proof:
1. $f''(x)\geq 0$ for all $x\in (a,b)$
2. For any $x,y\in (a,b)$, $f(x)\geq f(y)+(x-y)f'(y)$ (Take tangent line at $y$)
3. For any $x,y\in (a,b)$ and $0<\lambda<1$, we have $f(\lambda x+(1-\lambda)y)\leq \lambda f(x)+(1-\lambda)f(y)$ (Take line connecting $f(x)$ and $f(y)$)
We use property 2, $f(x)\geq f(y)+(x-y)f'(y)$, with $y=\sum_{i=1}^{n}\lambda_ix_i$ and $x=x_j$, to get
$$f(x_j)\geq f\left(\sum_{i=1}^{n}\lambda_ix_i\right)+\left(x_j-\sum_{i=1}^{n}\lambda_ix_i\right)f'\left(\sum_{i=1}^{n}\lambda_ix_i\right)$$
Multiplying the $j$-th inequality by $\lambda_j$ and summing over $j$, we have
$$
\begin{aligned}
\sum_{j=1}^{n}\lambda_j f(x_j)&\geq \sum_{j=1}^{n}\lambda_jf\left(\sum_{i=1}^{n}\lambda_ix_i\right)+\sum_{j=1}^{n}\lambda_j\left(x_j-\sum_{i=1}^{n}\lambda_ix_i\right)f'\left(\sum_{i=1}^{n}\lambda_ix_i\right)\\
&=f\left(\sum_{i=1}^{n}\lambda_ix_i\right)\cdot\sum_{j=1}^{n}\lambda_j+f'\left(\sum_{i=1}^{n}\lambda_ix_i\right)\cdot 0\\
&=f\left(\sum_{i=1}^{n}\lambda_ix_i\right)
\end{aligned}
$$
since $\sum_{j=1}^{n}\lambda_j=1$ and $\sum_{j=1}^{n}\lambda_j\left(x_j-\sum_{i=1}^{n}\lambda_ix_i\right)=0$.
#### Theorem 1.1.5
Shannon's source coding theorem
Let $(A,\mu)$ be an elementary information source and let $b$ be the size of the encoded alphabet $B$.
Then
$$
\frac{-\sum_{x\in A}\mu(x)\log\mu(x)}{\log b}\leq L(\mu)<\frac{-\sum_{x\in A}\mu(x)\log\mu(x)}{\log b}+1
$$
where $L(\mu)$ is the minimum mean code word length of all uniquely decipherable codes for $(A,\mu)$.
Proof:
First, let $m:A\to \mathbb{Z}_+$ be any map satisfying
$$
\sum_{a\in A}b^{-m(a)}\leq 1
$$
(by Theorem 1.1.1 this holds in particular for the length function $m(a)=l(f(a))$ of any uniquely decipherable code $f$). We show that
$$
\sum_{a\in A}m(a)\mu(a)\geq -(\log b)^{-1}\sum_{a\in A}\mu(a)\log \mu(a)
$$
which gives the left-hand inequality for $L(\mu)$.
We defined the probability distribution $v$ on $A$ as
$$
v(x)=\frac{b^{-m(x)}}{\sum_{a\in A}b^{-m(a)}}=\frac{b^{-m(x)}}{T}
$$
Write $T=\sum_{a\in A}b^{-m(a)}$ so that $T\leq 1$.
Since $b^{-m(a)}=T\cdot v(a)$,
$-m(a)\log b=\log T+\log v(a)$, $m(a)=-\frac{\log T}{\log b}-\frac{\log v(a)}{\log b}$
We have
$$
\begin{aligned}
\sum_{a\in A}m(a)\mu(a)&=\sum_{a\in A}\left(-\frac{\log T}{\log b}-\frac{\log v(a)}{\log b}\right)\mu(a)\\
&=-\frac{\log T}{\log b}\sum_{a\in A}\mu(a)-\frac{1}{\log b}\sum_{a\in A}\mu(a)\log v(a)\\
&=(\log b)^{-1}\left\{-\log T - \sum_{a\in A}\mu(a)\log v(a)\right\}
\end{aligned}
$$
Without loss of generality, we assume that $\mu(a)>0$ for all $a\in A$.
$$
\begin{aligned}
-\sum_{a\in A}\mu(a)\log v(a)&=-\sum_{a\in A}\mu(a)(\log \mu(a)+\log v(a)-\log \mu(a))\\
&=-\sum_{a\in A}\mu(a)\log \mu(a)-\sum_{a\in A}\mu(a)\log \frac{v(a)}{\mu(a)}\\
&=-\sum_{a\in A}\mu(a)\log \mu(a)-\log \prod_{a\in A}\left(\frac{v(a)}{\mu(a)}\right)^{\mu(a)}
\end{aligned}
$$
Note that $\log \prod_{a\in A}\left(\frac{v(a)}{\mu(a)}\right)^{\mu(a)}=\sum_{a\in A}\mu(a)\log \frac{v(a)}{\mu(a)}$; its negative, $\sum_{a\in A}\mu(a)\log \frac{\mu(a)}{v(a)}$, is called the Kullback-Leibler divergence or relative entropy of $\mu$ with respect to $v$.
Since $\log$ is a concave function, Jensen's inequality holds with the inequality reversed, $\sum_{i=1}^{n}\lambda_if(x_i)\leq f(\sum_{i=1}^{n}\lambda_ix_i)$, so we have
$$
\begin{aligned}
\sum_{a\in A}\mu(a)\log \frac{v(a)}{\mu(a)}&\leq\log\left(\sum_{a\in A}\mu(a) \frac{v(a)}{\mu(a)}\right)\\
\log\left(\prod_{a\in A}\left(\frac{v(a)}{\mu(a)}\right)^{\mu(a)}\right)&\leq \log\left(\sum_{a\in A}\mu(a) \frac{v(a)}{\mu(a)}\right)\\
\prod_{a\in A}\left(\frac{v(a)}{\mu(a)}\right)^{\mu(a)}&\leq \sum_{a\in A}\mu(a)\frac{v(a)}{\mu(a)}=1
\end{aligned}
$$
So
$$
-\sum_{a\in A}\mu(a)\log v(a)\geq -\sum_{a\in A}\mu(a)\log \mu(a)
$$
(This is also known as Gibbs' inequality: Put in words, the information entropy of a distribution $P$ is less than or equal to its cross entropy with any other distribution $Q$.)
Since $T\leq 1$, $-\log T\geq 0$, we have
$$
\sum_{a\in A}m(a)\mu(a)\geq -(\log b)^{-1}\sum_{a\in A}\mu(a)\log \mu(a)
$$
Second, for the right-hand inequality, choose the lengths $m(a)=\lceil -\log_b\mu(a)\rceil$, so that $m(a)<-\log_b\mu(a)+1$ and
$$
\sum_{a\in A}b^{-m(a)}\leq \sum_{a\in A}b^{\log_b\mu(a)}=\sum_{a\in A}\mu(a)=1
$$
Since these lengths satisfy the Kraft inequality, there exists an irreducible code $f:A\to S(B)$ such that $l(f(a))=m(a)$ for all $a\in A$ (the converse direction of Theorem 1.1.1 for irreducible codes).
So
$$
\begin{aligned}
\overline{l}(\mu,f)&=\sum_{a\in A}\mu(a)m(a)\\
&<\sum_{a\in A}\mu(a)\left(1-\frac{\log\mu(a)}{\log b}\right)\\
&=-\sum_{a\in A}\mu(a)\log\mu(a)\cdot\frac{1}{\log b}+1\\
&=-(\log b)^{-1}\sum_{a\in A}\mu(a)\log \mu(a)+1
\end{aligned}
$$
QED
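A small Python sketch (the distribution is made up, with $b=2$) checking both bounds using the code lengths $m(a)=\lceil -\log_b\mu(a)\rceil$ from the second part of the proof:
```python
import math

mu = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}   # a made-up source distribution
b = 2

H = -sum(p * math.log2(p) for p in mu.values())           # entropy in base b = 2
m = {x: math.ceil(-math.log2(p)) for x, p in mu.items()}  # Shannon code lengths
kraft = sum(b ** -length for length in m.values())        # must be <= 1
mean_len = sum(mu[x] * m[x] for x in mu)

print(H, kraft, mean_len)       # 1.75, 1.0, 1.75 for this dyadic distribution
print(H <= mean_len < H + 1)    # True: both bounds of the theorem hold
```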
### Entropy
Let $A$ be an alphabet and let $\mathcal{P}(A)$ be the set of all probability distributions on $A$. Any element $\mu\in \mathcal{P}(A)$ is thus a map from $A$ to $[0,1]$ such that $\sum_{a\in A}\mu(a)=1$.
Thus $\mathcal{P}(A)$ is a compact convex subset of the real linear space $\mathbb{R}^A$.
We define the map $H:\mathcal{P}(A)\to\mathbb{R}$ as
$$
H(\mu)=-\sum_{a\in A}\mu(a)\log_2\mu(a)
$$
#### Basic properties of $H$
1. $H(\mu)\geq 0$
2. $H(\mu)=0$ if and only if $\mu$ is a point mass.
3. $H(\mu)\leq\log_2|A|$, with equality if and only if $\mu$ is uniform.
### Huffman's coding algorithm
Huffman's coding algorithm is a greedy algorithm that constructs an optimal prefix code for a given probability distribution.
The algorithm is as follows:
1. Sort the symbols by their probabilities in descending order.
2. Merge the two symbols with the smallest probabilities into a single new symbol whose probability is the sum of the two.
3. Repeat step 2 until only one symbol remains; the codewords are then read off the resulting binary tree, appending $0$ for one branch and $1$ for the other.
```python
import heapq
def huffman(frequencies):
class Node:
def __init__(self, symbol=None, freq=0, left=None, right=None):
self.symbol = symbol
self.freq = freq
self.left = left
self.right = right
def __lt__(self, other): # For priority queue
return self.freq < other.freq
def is_leaf(self):
return self.symbol is not None
# Build the Huffman tree
heap = [Node(sym, freq) for sym, freq in frequencies.items()]
heapq.heapify(heap)
while len(heap) > 1:
left = heapq.heappop(heap)
right = heapq.heappop(heap)
merged = Node(freq=left.freq + right.freq, left=left, right=right)
heapq.heappush(heap, merged)
root = heap[0]
# Assign codes by traversing the tree
codebook = {}
def assign_codes(node, code=""):
if node.is_leaf():
codebook[node.symbol] = code
else:
assign_codes(node.left, code + "0")
assign_codes(node.right, code + "1")
assign_codes(root)
# Helper to pretty print the binary tree
def tree_repr(node, prefix=""):
if node.is_leaf():
return f"{prefix}{repr(node.symbol)} ({node.freq})\n"
else:
s = f"{prefix}• ({node.freq})\n"
s += tree_repr(node.left, prefix + " 0 ")
s += tree_repr(node.right, prefix + " 1 ")
return s
print("Huffman Codebook:")
for sym in sorted(codebook, key=lambda s: frequencies[s], reverse=True):
print(f"{repr(sym)}: {codebook[sym]}")
print("\nCoding Tree:")
print(tree_repr(root))
return codebook
```
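For example, calling the function on a made-up distribution prints the codebook and the coding tree:
```python
codebook = huffman({"a": 0.45, "b": 0.25, "c": 0.15, "d": 0.10, "e": 0.05})
# The two least probable symbols ("d" and "e") are merged first,
# so they end up deepest in the tree with the longest codewords.
```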
#### Definition 1.2.1
Let $A=\{a_1,a_2,\cdots,a_n\}$ be an alphabet and let $\mu=(\mu(a_1),\mu(a_2),\cdots,\mu(a_n))$ be a probability distribution on $A$.
Huffman's algorithm applied to $(A,\mu)$ produces a prefix code whose mean code word length $\overline{l}(\mu,f)$ is minimal among all prefix codes for $(A,\mu)$.
Proof:
We use mathematical induction on the number of symbols $n$.
Base case: $n=2$.
With only two symbols, assign one of them "0" and the other "1". This is clearly optimal (it is the only possible choice up to relabeling).
Inductive step:
Assume that Huffman's algorithm produces an optimal code for any source with $n-1$ symbols.
There is always an optimal prefix code for $n$ symbols in which the two least probable symbols $a$ and $b$ are siblings at maximal depth (exchanging a more probable symbol at greater depth with a less probable one can only decrease the mean length).
Huffman's algorithm merges $a$ and $b$ into a single symbol $c$ with $\mu(c)=\mu(a)+\mu(b)$, giving a source $A'$ with $n-1$ symbols, and then constructs a code for $A'$ recursively.
By the inductive hypothesis, the code obtained on $A'$ is optimal.
Assigning $a$ and $b$ the codewords $w0$ and $w1$, where $w$ is $c$'s codeword, increases the mean length by exactly $\mu(a)+\mu(b)$, which is the minimum possible cost of splitting $c$ back into $a$ and $b$; hence the resulting code for $A$ is optimal.
Therefore, by induction, Huffman's algorithm gives an optimal prefix code for any $n$.
QED


@@ -1,11 +0,0 @@
export default {
index: "Course Description",
"---":{
type: 'separator'
},
Math401_N1: "Math 401, Notes 1",
Math401_N2: "Math 401, Notes 2",
Math401_N3: "Math 401, Notes 3",
Freiwald_summer: "Math 401, Summer 2025: Freiwald research project notes",
Extending_thesis: "Math 401, Fall 2025: Thesis notes",
}


@@ -1,69 +0,0 @@
# Math 401
This is a course about the symmetric group and a bunch of applications in other fields of math.
Prof. Renado Fere is teaching this course.
The course is split into two parts:
1. Symmetric group
2. Summer research project
## Symmetric group (Spring 2025 Course)
Notes from N1-N3.
These are basically overviews of some interesting topics related to the symmetric group and other related areas of math.
I did not record them carefully, but I will try to update them if they become necessary for future reference.
## Summer research project
### Schedule
- Presentations start next week.
- Start with examples and do exploratory work; it's just a summer.
- Final work: find a topic you are interested in and write an expository paper.
- Find the motivation, background, definitions, theorems, examples, and applications for the theory you are interested in.
- At least 3 presentations are required.
- Collect the papers you are interested in as you go through the research; it is not linear.
- Symposium in November.
- Lightning talk (3 minutes) at the end of July.
### Topic of interest
I am interested in the following topics:
1. Quantum error correction
2. Von Neumann algebra and other operator algebras which are related to quantum algorithms
### Notes
T1-T7 should be the notes for the spring course Math 444, taught by Prof. Renado Fere in Spring 2025. They are helpful for understanding the material in the book below, which might contain my subject of interest.
[The Functional Analysis of Quantum Information Theory](https://arxiv.org/abs/1410.7188)
The original lecture notes, by Prof. Renado Fere, are linked here and may move to other places. Last updated on 2025-06-14.
[Math 444 Spring 2025 Notes, Lecture 1](https://www.math.wustl.edu/~feres/Math444Spring25/Math444Spring25Notes01.pdf)
[Math 444 Spring 2025 Notes, Lecture 2](https://www.math.wustl.edu/~feres/Math444Spring25/Math444Spring25Notes02.pdf)
[Math 444 Spring 2025 Notes, Lecture 3](https://www.math.wustl.edu/~feres/Math444Spring25/Math444Spring25Notes03.pdf)
[Math 444 Spring 2025 Notes, Lecture 4](https://www.math.wustl.edu/~feres/Math444Spring25/Math444Spring25Notes04.pdf)
[Math 444 Spring 2025 Notes, Lecture 5](https://www.math.wustl.edu/~feres/Math444Spring25/Math444Spring25Notes05.pdf)
[Math 444 Spring 2025 Notes, Lecture 6](https://www.math.wustl.edu/~feres/Math444Spring25/Math444Spring25Notes06.pdf)
[Math 444 Spring 2025 Notes, Lecture 7](https://www.math.wustl.edu/~feres/Math444Spring25/Math444Spring25Notes07.pdf)
[Math 444 Spring 2025 Notes, Lecture 9](https://www.math.wustl.edu/~feres/Math444Spring25/Math444Spring25Notes09.pdf)