This commit is contained in:
Trance-0
2025-06-06 23:02:06 -05:00
parent 135575f0bf
commit a67c60f03e
6 changed files with 22 additions and 0 deletions


@@ -0,0 +1,20 @@
# Beginning of the Summer Program
## Schedule
Presentations start next week.
Start with examples and do exploratory work; it's just a summer.
Final work: find a topic you are interested in and write an expository paper.
Find the motivation, background, definitions, theorems, examples, and applications for the theory you are interested in.
At least 3 presentations are required.
Collect the papers you are interested in as you go; research is not linear.
Symposium in November.
Lightning talk (3 minutes) at the end of July.


@@ -0,0 +1,2 @@
# Schedule Presentation


@@ -0,0 +1,64 @@
# Node 1
_All the materials were recovered after the end of the course; I cannot split my mind away from them._
## Recap on familiar ideas
### Group
A group is a set $G$ with a binary operation $\cdot$ that satisfies the following properties:
1. **Closure**: For all $a, b \in G$, the result of the operation $a \cdot b$ is also in $G$.
2. **Associativity**: For all $a, b, c \in G$, $(a \cdot b) \cdot c = a \cdot (b \cdot c)$.
3. **Identity**: There exists an element $e \in G$ such that for all $a \in G$, $e \cdot a = a \cdot e = a$.
4. **Inverses**: For each $a \in G$, there exists an element $b \in G$ such that $a \cdot b = b \cdot a = e$.
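For example, the integers $\mathbb{Z}$ under addition form a group: the identity is $0$ and the inverse of $a$ is $-a$.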
### Ring
A ring is a set $R$ with two binary operations, addition and multiplication, that satisfies the following properties:
1. **Additive Closure**: For all $a, b \in R$, the result of the addition $a + b$ is also in $R$.
2. **Additive Associativity**: For all $a, b, c \in R$, $(a + b) + c = a + (b + c)$.
3. **Additive Identity**: There exists an element $0 \in R$ such that for all $a \in R$, $0 + a = a + 0 = a$.
4. **Additive Inverses**: For each $a \in R$, there exists an element $-a \in R$ such that $a + (-a) = (-a) + a = 0$.
5. **Multiplicative Closure**: For all $a, b \in R$, the result of the multiplication $a \cdot b$ is also in $R$.
6. **Multiplicative Associativity**: For all $a, b, c \in R$, $(a \cdot b) \cdot c = a \cdot (b \cdot c)$.
The remaining axioms (commutativity of addition and distributivity) are not shown, since we will not need much more here.
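For example, the integers $\mathbb{Z}$ with the usual addition and multiplication form a ring.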
## Symmetric Group
### Definition
The symmetric group $S_n$ is the group of all permutations of $n$ elements, or in other words, the group of all bijections from a set of $n$ elements to itself.
Example (each list shows the images of $1,2,3$ in order):
$$
\begin{aligned}
e&=1,2,3\\
(12)&=2,1,3\\
(13)&=3,2,1\\
(23)&=1,3,2\\
(123)&=2,3,1\\
(132)&=3,1,2
\end{aligned}
$$
$(12)$ means that $1\to 2, 2\to 1, 3\to 3$; we follow the cyclic order for the elements in the set, so $(123)$ means $1\to 2, 2\to 3, 3\to 1$.
$S_3 = \{e, (12), (13), (23), (123), (132)\}$
The multiplication table of $S_3$ (row times column, composing left to right: apply the row permutation first, then the column permutation) is:
|Element|e|(12)|(13)|(23)|(123)|(132)|
|---|---|---|---|---|---|---|
|e|e|(12)|(13)|(23)|(123)|(132)|
|(12)|(12)|e|(123)|(132)|(13)|(23)|
|(13)|(13)|(132)|e|(123)|(23)|(12)|
|(23)|(23)|(123)|(132)|e|(12)|(13)|
|(123)|(123)|(23)|(12)|(13)|(132)|e|
|(132)|(132)|(13)|(23)|(12)|e|(123)|
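As a sanity check, the table can be regenerated by composing the permutations directly. This is a minimal sketch (the dictionary encoding of the six elements is my own; it uses the left-to-right convention above):
```python
# Each permutation of {1, 2, 3} is a dict p with p[i] = image of i.
perms = {
    "e": {1: 1, 2: 2, 3: 3},
    "(12)": {1: 2, 2: 1, 3: 3},
    "(13)": {1: 3, 2: 2, 3: 1},
    "(23)": {1: 1, 2: 3, 3: 2},
    "(123)": {1: 2, 2: 3, 3: 1},
    "(132)": {1: 3, 2: 1, 3: 2},
}

def compose(p, q):
    """Apply p first, then q (the left-to-right convention of the table)."""
    return {i: q[p[i]] for i in p}

def name_of(p):
    return next(name for name, q in perms.items() if q == p)

# Print each row of the multiplication table.
for x in perms:
    print(x, [name_of(compose(perms[x], perms[y])) for y in perms])
```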
## Functions defined on $S_n$
### Symmetric Generating Set


@@ -0,0 +1,9 @@
# Node 2
## Random matrix theory
### Wigner's semicircle law
## h-Inversion Polynomials for a Special Heisenberg Family

pages/Math401/Math401_N3.md

@@ -0,0 +1,399 @@
# Coding and Information Theory Crash Course
## Encoding
Let $A,B$ be two finite sets with size $a,b$ respectively.
Let $S(A)=\bigcup_{r=1}^{\infty}A^r$ be the word semigroup generated by $A$.
A one-to-one mapping $f:A\to S(B)$ is called a code with message alphabet $A$ and encoded alphabet $B$.
Example:
- $A=$ RGB color space
- $B=\{0,1,\dots,255\}$
- $f:A\to B^3$ is a code
For example, $f(\text{white})=(255,255,255)$ and $f(\text{green})=(0,255,0)$.
### Uniquely decipherable codes
A code $f:A\to S(B)$ is called uniquely decipherable if the extension code $\tilde{f}:S(A)\to S(B)$, defined by
$$
\tilde{f}(a_1a_2\cdots a_n)=f(a_1)f(a_2)\cdots f(a_n),
$$
is one-to-one.
Example:
- $A=\{a,b,c,d\}$
- $B=\{0,1\}$
- $f(a)=00$, $f(b)=01$, $f(c)=10$, $f(d)=11$
is uniquely decipherable.
- $f(a)=0$, $f(b)=1$, $f(c)=10$, $f(d)=11$
is not uniquely decipherable, since $\tilde{f}(ba)=10=\tilde{f}(c)$.
#### Irreducible codes
A code $f:A\to S(B)$ is called irreducible if for any $x,y\in A$ there is no $w\in S(B)$ with $f(y)=f(x)w$; that is, no codeword is a prefix of another.
This prefix condition ensures that every encoded message is uniquely decipherable: no ambiguity can arise during decoding.
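For instance, the code $f(a)=00$, $f(b)=01$, $f(c)=10$, $f(d)=11$ above is irreducible, while the code with $f(b)=1$ and $f(c)=10$ is not, since $f(c)=f(b)0$.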
#### Theorem 1.1.1 Sardinas and Patterson Theorem (also known as the Kraft–McMillan inequality)
Let $A,B$ be alphabet sets of size $a,b$ respectively.
Let $f:A\to S(B)$ be a code that is uniquely decipherable.
Then
$$
\sum_{x\in A}b^{-l(f(x))}\leq 1
$$
where $l(f(x))$ is the length of the codeword $f(x)$. (Number of elements used in the codeword for $x$)
Proof:
Let $L$ denote the max length of any codeword, $L=\max\{l(f(x))\mid x\in A\}$, and write $C=\sum_{x\in A}b^{-l(f(x))}$.
For any integer $k\geq 1$,
$$
C^k=\sum_{x_1,\cdots,x_k\in A}b^{-\left(l(f(x_1))+\cdots+l(f(x_k))\right)}=\sum_{r=k}^{kL}N_rb^{-r}
$$
where $N_r$ is the number of strings $x_1x_2\cdots x_k\in A^k$ whose encoding $\tilde{f}(x_1x_2\cdots x_k)$ has length exactly $r$.
Since $f$ is uniquely decipherable, distinct strings have distinct encodings, so $N_r$ is at most the total number of words of length $r$ over $B$, i.e. $N_r\leq b^r$.
Then
$$
C^k\leq\sum_{r=k}^{kL}b^rb^{-r}\leq kL,
$$
so $C\leq (kL)^{1/k}$ for every $k$. Letting $k\to\infty$, $(kL)^{1/k}\to 1$, so $C\leq 1$.
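For instance, with $b=2$: the earlier code $f(a)=00$, $f(b)=01$, $f(c)=10$, $f(d)=11$ gives $\sum_{x\in A}2^{-2}=1\leq 1$, while $f(a)=0$, $f(b)=1$, $f(c)=10$, $f(d)=11$ gives $2^{-1}+2^{-1}+2^{-2}+2^{-2}=\frac{3}{2}>1$, confirming that the latter cannot be uniquely decipherable.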
#### Sardinas and Patterson Algorithm
Let $A=\{a_1,a_2,\cdots,a_n\}$ be the message alphabet and $B=\{b_1,b_2,\cdots,b_m\}$ be the encoded alphabet.
We test whether the code $f:A\to S(B)$ is uniquely decipherable.
```python
def is_uniquely_decipherable(f):
    """Sardinas-Patterson test.

    f maps each message symbol to its codeword (a string over B).
    The code is uniquely decipherable iff no dangling suffix is
    itself a codeword.
    """
    codewords = set(f.values())
    if len(codewords) < len(f):
        return False  # f is not one-to-one
    # Initial dangling suffixes: w such that x = y + w for codewords x != y
    suffixes = set()
    for x in codewords:
        for y in codewords:
            if x != y and x.startswith(y):
                suffixes.add(x[len(y):])
    seen = set()
    while suffixes:
        new_suffixes = set()
        for s in suffixes:
            if s in codewords:
                return False  # a dangling suffix is a codeword: ambiguous
            for c in codewords:
                if c.startswith(s):  # s is a prefix of a codeword
                    new_suffixes.add(c[len(s):])
                if s.startswith(c):  # a codeword is a prefix of s
                    new_suffixes.add(s[len(c):])
        seen |= suffixes
        suffixes = new_suffixes - seen
    return True
```
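For example, checking the two codes from the example above:
```python
print(is_uniquely_decipherable({"a": "00", "b": "01", "c": "10", "d": "11"}))  # True
print(is_uniquely_decipherable({"a": "0", "b": "1", "c": "10", "d": "11"}))    # False
```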
### Shannon's source coding theorem
#### Definition 1.1.4
An elementary information source is a pair $(A,\mu)$ where $A$ is an alphabet and $\mu$ is a probability distribution on $A$. $\mu$ is a function $\mu:A\to[0,1]$ such that $\sum_{a\in A}\mu(a)=1$.
The **mean code word length** of an information source $(A,\mu)$ given a code $f:A\to S(B)$ is defined as
$$
\overline{l}(\mu,f)=\sum_{a\in A}\mu(a)l(f(a))
$$
The minimum mean code word length $L(\mu)$ is defined as
$$
L(\mu)=\min\{\overline{l}(\mu,f)|f:A\to S(B)\text{ is uniquely decipherable}\}
$$
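For example, if $\mu$ is uniform on $A=\{a,b,c,d\}$ and $f$ is the 2-bit code $f(a)=00$, $f(b)=01$, $f(c)=10$, $f(d)=11$, then $\overline{l}(\mu,f)=4\cdot\frac{1}{4}\cdot 2=2$.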
#### Lemma: Jensen's inequality
Let $f$ be a convex function on the interval $(a,b)$. Then for any $x_1,x_2,\cdots,x_n\in (a,b)$ and $\lambda_1,\lambda_2,\cdots,\lambda_n\in [0,1]$ such that $\sum_{i=1}^{n}\lambda_i=1$, we have
$$f(\sum_{i=1}^{n}\lambda_ix_i)\leq \sum_{i=1}^{n}\lambda_if(x_i)$$
Proof:
If $f$ is a convex function, three properties are useful for the proof:
1. $f''(x)\geq 0$ for all $x\in (a,b)$ (when $f$ is twice differentiable)
2. For any $x,y\in (a,b)$, $f(x)\geq f(y)+(x-y)f'(y)$ (the tangent line at $y$ lies below the graph)
3. For any $x,y\in (a,b)$ and $0<\lambda<1$, we have $f(\lambda x+(1-\lambda)y)\leq \lambda f(x)+(1-\lambda)f(y)$ (the chord connecting $f(x)$ and $f(y)$ lies above the graph)
We use property 2, $f(x)\geq f(y)+(x-y)f'(y)$, with $y=\sum_{i=1}^{n}\lambda_ix_i$ and $x=x_j$:
$$f(x_j)\geq f\left(\sum_{i=1}^{n}\lambda_ix_i\right)+\left(x_j-\sum_{i=1}^{n}\lambda_ix_i\right)f'\left(\sum_{i=1}^{n}\lambda_ix_i\right)$$
Multiplying the $j$-th inequality by $\lambda_j$ and summing over $j$, we have
$$
\begin{aligned}
\sum_{j=1}^{n}\lambda_j f(x_j)&\geq \sum_{j=1}^{n}\lambda_jf\left(\sum_{i=1}^{n}\lambda_ix_i\right)+\sum_{j=1}^{n}\lambda_j\left(x_j-\sum_{i=1}^{n}\lambda_ix_i\right)f'\left(\sum_{i=1}^{n}\lambda_ix_i\right)\\
&=f\left(\sum_{i=1}^{n}\lambda_ix_i\right)+0\\
&=f\left(\sum_{i=1}^{n}\lambda_ix_i\right)
\end{aligned}
$$
since $\sum_{j=1}^{n}\lambda_j=1$ and $\sum_{j=1}^{n}\lambda_j\left(x_j-\sum_{i=1}^{n}\lambda_ix_i\right)=\sum_{j=1}^{n}\lambda_jx_j-\sum_{i=1}^{n}\lambda_ix_i=0$.
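As a quick sanity check, applying the lemma to the convex function $f(x)=-\log x$ with $\lambda_i=\frac{1}{n}$ gives $-\log\left(\frac{1}{n}\sum_{i}x_i\right)\leq -\frac{1}{n}\sum_{i}\log x_i$, which is the AM–GM inequality $\frac{1}{n}\sum_{i}x_i\geq \left(\prod_{i}x_i\right)^{1/n}$.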
#### Theorem 1.1.5
Shannon's source coding theorem
Let $(A,\mu)$ be an elementary information source and let $b$ be the size of the encoded alphabet $B$.
Then
$$
\frac{-\sum_{x\in A}\mu(x)\log\mu(x)}{\log b}\leq L(\mu)<\frac{-\sum_{x\in A}\mu(x)\log\mu(x)}{\log b}+1
$$
where $L(\mu)$ is the minimum mean code word length of all uniquely decipherable codes for $(A,\mu)$.
Proof:
Let $m:A\to \mathbb{Z}_+$ be any map satisfying
$$
\sum_{a\in A}b^{-m(a)}\leq 1
$$
(by Theorem 1.1.1, the codeword lengths $m(a)=l(f(a))$ of any uniquely decipherable code satisfy this).
First, we show that
$$
\sum_{a\in A}m(a)\mu(a)\geq -(\log b)^{-1}\sum_{a\in A}\mu(a)\log \mu(a)
$$
Write $T=\sum_{a\in A}b^{-m(a)}$, so that $T\leq 1$, and define the probability distribution $v$ on $A$ by
$$
v(x)=\frac{b^{-m(x)}}{\sum_{a\in A}b^{-m(a)}}=\frac{b^{-m(x)}}{T}
$$
Since $b^{-m(a)}=T\cdot v(a)$, we have $-m(a)\log b=\log T+\log v(a)$, so $m(a)=-\frac{\log T}{\log b}-\frac{\log v(a)}{\log b}$.
We have
$$
\begin{aligned}
\sum_{a\in A}m(a)\mu(a)&=\sum_{a\in A}\left(-\frac{\log T}{\log b}-\frac{\log v(a)}{\log b}\right)\mu(a)\\
&=-\frac{\log T}{\log b}\sum_{a\in A}\mu(a)-\frac{1}{\log b}\sum_{a\in A}\mu(a)\log v(a)\\
&=(\log b)^{-1}\left\{-\log T - \sum_{a\in A}\mu(a)\log v(a)\right\}
\end{aligned}
$$
Without loss of generality, we assume that $\mu(a)>0$ for all $a\in A$.
$$
\begin{aligned}
-\sum_{a\in A}\mu(a)\log v(a)&=-\sum_{a\in A}\mu(a)(\log \mu(a)+\log v(a)-\log \mu(a))\\
&=-\sum_{a\in A}\mu(a)\log \mu(a)-\sum_{a\in A}\mu(a)\log \frac{v(a)}{\mu(a)}\\
&=-\sum_{a\in A}\mu(a)\log \mu(a)-\log \prod_{a\in A}\left(\frac{v(a)}{\mu(a)}\right)^{\mu(a)}
\end{aligned}
$$
The quantity $-\sum_{a\in A}\mu(a)\log \frac{v(a)}{\mu(a)}=\sum_{a\in A}\mu(a)\log \frac{\mu(a)}{v(a)}$ is called the Kullback–Leibler divergence, or relative entropy, of $\mu$ with respect to $v$.
Since $\log$ is a concave function, by Jensen's inequality $f(\sum_{i=1}^{n}\lambda_ix_i)\leq \sum_{i=1}^{n}\lambda_if(x_i)$, we have
$$
\begin{aligned}
\sum_{a\in A}\mu(a)\log \frac{v(a)}{\mu(a)}&\leq\log\left(\sum_{a\in A}\mu(a) \frac{v(a)}{\mu(a)}\right)\\
\log\left(\prod_{a\in A}\left(\frac{v(a)}{\mu(a)}\right)^{\mu(a)}\right)&\leq \log\left(\sum_{a\in A}\mu(a) \frac{v(a)}{\mu(a)}\right)\\
\prod_{a\in A}\left(\frac{v(a)}{\mu(a)}\right)^{\mu(a)}&\leq \sum_{a\in A}\mu(a)\frac{v(a)}{\mu(a)}=1
\end{aligned}
$$
So
$$
-\sum_{a\in A}\mu(a)\log v(a)\geq -\sum_{a\in A}\mu(a)\log \mu(a)
$$
(This is also known as Gibbs' inequality: put in words, the entropy of a distribution $\mu$ is less than or equal to its cross entropy with any other distribution $v$.)
Since $T\leq 1$, $-\log T\geq 0$, we have
$$
\sum_{a\in A}m(a)\mu(a)\geq -(\log b)^{-1}\sum_{a\in A}\mu(a)\log \mu(a)
$$
Second, we show that
$$
L(\mu)<-(\log b)^{-1}\sum_{a\in A}\mu(a)\log \mu(a)+1
$$
Choose $m(a)=\lceil -\log_b\mu(a)\rceil$, the least integer with $b^{-m(a)}\leq \mu(a)$; then $\sum_{a\in A}b^{-m(a)}\leq \sum_{a\in A}\mu(a)=1$ and $m(a)<1-\frac{\log \mu(a)}{\log b}$.
By (the converse direction of) Theorem 1.1.1, there exists an irreducible code $f:A\to S(B)$ such that $l(f(a))=m(a)$ for all $a\in A$.
So
$$
\begin{aligned}
\overline{l}(\mu,f)&=\sum_{a\in A}\mu(a)m(a)\\
&<\sum_{a\in A}\mu(a)\left(1-\frac{\log\mu(a)}{\log b}\right)\\
&=-\sum_{a\in A}\mu(a)\log\mu(a)\cdot\frac{1}{\log b}+1\\
&=-(\log b)^{-1}\sum_{a\in A}\mu(a)\log \mu(a)+1
\end{aligned}
$$
Since every uniquely decipherable code satisfies the first inequality, $L(\mu)\geq -(\log b)^{-1}\sum_{a\in A}\mu(a)\log \mu(a)$, and $L(\mu)\leq \overline{l}(\mu,f)$ gives the upper bound.
QED
### Entropy
Let $A$ be an alphabet and let $\mathcal{P}(A)$ be the set of all probability distributions on $A$. Any element $\mu\in \mathcal{P}(A)$ is thus a map from $A$ to $[0,1]$ such that $\sum_{a\in A}\mu(a)=1$.
Thus $\mathcal{P}(A)$ is a compact convex subset of the real linear space $\mathbb{R}^A$.
We define the map $H:\mathcal{P}(A)\to\mathbb{R}$ as
$$
H(\mu)=-\sum_{a\in A}\mu(a)\log_2\mu(a)
$$
#### Basic properties of $H$
1. $H(\mu)\geq 0$
2. $H(\mu)=0$ if and only if $\mu$ is a point mass.
3. $H(\mu)\leq \log_2|A|$, with equality if and only if $\mu$ is uniform.
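For example, a fair coin ($\mu=(\frac{1}{2},\frac{1}{2})$) has $H(\mu)=1$ bit, while a biased coin with $\mu=(0.9,0.1)$ has $H(\mu)\approx 0.469$ bits.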
### Huffman's coding algorithm
Huffman's coding algorithm is a greedy algorithm that constructs an optimal prefix code for a given probability distribution.
The algorithm is as follows:
1. Sort the symbols by their probabilities in descending order.
2. Merge the two symbols with the smallest probabilities into a new symbol whose probability is their sum.
3. Repeat step 2 until only one symbol is left, then read each codeword off the resulting binary tree (left edge $0$, right edge $1$).
```python
import heapq

def huffman(frequencies):
    class Node:
        def __init__(self, symbol=None, freq=0, left=None, right=None):
            self.symbol = symbol
            self.freq = freq
            self.left = left
            self.right = right

        def __lt__(self, other):  # For priority queue
            return self.freq < other.freq

        def is_leaf(self):
            return self.symbol is not None

    # Build the Huffman tree
    heap = [Node(sym, freq) for sym, freq in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        left = heapq.heappop(heap)
        right = heapq.heappop(heap)
        merged = Node(freq=left.freq + right.freq, left=left, right=right)
        heapq.heappush(heap, merged)
    root = heap[0]

    # Assign codes by traversing the tree
    codebook = {}
    def assign_codes(node, code=""):
        if node.is_leaf():
            codebook[node.symbol] = code
        else:
            assign_codes(node.left, code + "0")
            assign_codes(node.right, code + "1")
    assign_codes(root)

    # Helper to pretty print the binary tree
    def tree_repr(node, prefix=""):
        if node.is_leaf():
            return f"{prefix}{repr(node.symbol)} ({node.freq})\n"
        else:
            s = f"{prefix}• ({node.freq})\n"
            s += tree_repr(node.left, prefix + " 0 ")
            s += tree_repr(node.right, prefix + " 1 ")
            return s

    print("Huffman Codebook:")
    for sym in sorted(codebook, key=lambda s: frequencies[s], reverse=True):
        print(f"{repr(sym)}: {codebook[sym]}")
    print("\nCoding Tree:")
    print(tree_repr(root))
    return codebook
```
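As a quick numerical check (the frequency table is a hypothetical example; `huffman` is the function above), the mean code word length can be compared with the entropy bounds of Theorem 1.1.5 for $b=2$:
```python
import math

freqs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
codebook = huffman(freqs)

mean_len = sum(p * len(codebook[s]) for s, p in freqs.items())
entropy = -sum(p * math.log2(p) for p in freqs.values())
print(mean_len, entropy)  # both are 1.75 here, consistent with the bounds
```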
#### Definition 1.2.1
Let $A=\{a_1,a_2,\cdots,a_n\}$ be an alphabet and let $\mu=(\mu(a_1),\mu(a_2),\cdots,\mu(a_n))$ be a probability distribution on $A$. A uniquely decipherable code $f:A\to S(B)$ is called optimal for $(A,\mu)$ if $\overline{l}(\mu,f)=L(\mu)$.
Claim: Huffman's algorithm produces an optimal code.
Proof:
We use mathematical induction on the number of symbols $n$.
Base case: $n=2$. With only two symbols, assign one the codeword "0" and the other "1". This is clearly optimal (it is the only possible choice).
Inductive step:
Assume that Huffman's algorithm produces an optimal code for any distribution on $n-1$ symbols.
Now, given $n$ symbols, Huffman's algorithm merges the two least probable symbols $a,b$ into a new symbol $c$, obtaining an alphabet $A'$ with $n-1$ symbols, and constructs a code for $A'$ recursively.
By the inductive hypothesis, the code on $A'$ is optimal.
By the merging step above, assigning the two merged symbols $a$ and $b$ the codewords $w0$ and $w1$ (based on $c$'s codeword $w$) results in an optimal code for $A$.
Therefore, by induction, Huffman's algorithm gives an optimal prefix code for any $n$.
QED

pages/Math401/index.md

@@ -0,0 +1,5 @@
# Math 401
This is a course about the symmetric group and a bunch of applications in other fields of math.
Prof. Renado Fere is teaching this course.