This commit is contained in:
Trance-0
2025-09-18 12:44:53 -05:00
parent f62e2bb8e1
commit a6e92115f7
6 changed files with 369 additions and 2 deletions


@@ -0,0 +1,78 @@
# CSE510 Deep Reinforcement Learning (Lecture 8)
## Convolutional Neural Networks
Another note on computer vision can be found here: [CSE559A Lecture 10](../CSE559A/CSE559A_L10#convolutional-layer)
Basically, it is a stack of different layers:
- Convolutional layer
- Non-linearity layer
- Pooling layer (or downsampling layer)
- Fully connected layer
### Convolutional layer
Filtering: The math behind the matching.
1. Line up the feature and the image patch.
2. Multiply each image pixel by the corresponding feature pixel.
3. Add them up.
4. Divide by the total number of pixels in the feature.
The idea of a convolutional neural network, in some sense, is to let the network "learn" the right filters for a specific task.
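A minimal NumPy sketch of the filtering steps above; the image, feature, and normalization are illustrative, not from the lecture:

```python
import numpy as np

def match_score(patch: np.ndarray, feature: np.ndarray) -> float:
    """Steps 1-4: line up, multiply element-wise, add up, divide by feature size."""
    return float(np.sum(patch * feature) / feature.size)

def convolve2d(image: np.ndarray, feature: np.ndarray) -> np.ndarray:
    """Slide the feature over every valid position of the image ("valid" padding).
    Note: like CNN layers, this is technically cross-correlation (no filter flip)."""
    fh, fw = feature.shape
    ih, iw = image.shape
    out = np.zeros((ih - fh + 1, iw - fw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = match_score(image[i:i + fh, j:j + fw], feature)
    return out

# Toy example: a diagonal feature matched against a small binary image.
image = np.array([[1, 0, 0],
                  [0, 1, 0],
                  [0, 0, 1]], dtype=float)
feature = np.array([[1, 0],
                    [0, 1]], dtype=float)
print(convolve2d(image, feature))  # high responses where the diagonal pattern appears
```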
### Non-linearity Layer
> [!TIP]
>
> This is irrelevant to the lecture, but consider the following term:
>
> "Bounded rationality"
- Convolution is a linear operation
- Non-linearity layer creates an activation map from the feature map generated by the convolutional layer
- Consists of an activation function (an element-wise operation)
- Rectified linear units (ReLUs) are advantageous over the traditional sigmoid or tanh activation functions
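A tiny sketch of the element-wise non-linearity, assuming NumPy; it turns a feature map into an activation map by zeroing negative responses:

```python
import numpy as np

def relu(feature_map: np.ndarray) -> np.ndarray:
    """Element-wise ReLU: max(0, x) applied to every entry of the feature map."""
    return np.maximum(0.0, feature_map)

feature_map = np.array([[-0.5, 0.2],
                        [ 0.7, -1.0]])
print(relu(feature_map))  # [[0.  0.2] [0.7 0. ]]
```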
### Pooling layer
Shrinking the Image Stack
- Motivation: the activation maps can be large
- Reducing the spatial size of the activation maps
- Often after multiple stages of other layers (i.e., convolutional and non-linear layers)
- Steps:
1. Pick a window size (usually 2 or 3).
2. Pick a stride (usually 2).
3. Walk your window across your filtered images.
4. From each window, take the maximum value.
Pros:
- Reducing the computational requirements
- Minimizing the likelihood of overfitting
Cons:
- Aggressive reduction can limit the depth of a network and ultimately limit the performance
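A minimal sketch of the pooling steps listed above (window size 2, stride 2), assuming NumPy and an input whose side lengths are divisible by the stride:

```python
import numpy as np

def max_pool(activation: np.ndarray, window: int = 2, stride: int = 2) -> np.ndarray:
    """Walk a window across the activation map and keep the maximum of each window."""
    h, w = activation.shape
    out_h, out_w = (h - window) // stride + 1, (w - window) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = activation[r:r + window, c:c + window].max()
    return out

activation = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool(activation))  # 2x2 map: [[ 5.  7.] [13. 15.]]
```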
### Fully connected layer
- Multilayer perceptron (MLP)
- Mapping the activation volume from previous layers into a class probability distribution
- Non-linearity is built into the neurons, instead of being a separate layer
- Viewed as 1x1 convolution kernels
For classification: Output layer is a regular, fully connected layer with softmax non-linearity
- Output provides an estimate of the conditional probability of each class
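A short sketch of the classification head described above, assuming NumPy; the weights and the flattened activation volume are made up for illustration:

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax: subtract the max before exponentiating."""
    e = np.exp(z - z.max())
    return e / e.sum()

def fc_classifier(activations: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Fully connected layer + softmax -> estimated conditional class probabilities."""
    return softmax(W @ activations + b)

rng = np.random.default_rng(0)
activations = rng.normal(size=8)              # flattened activation volume (illustrative)
W, b = rng.normal(size=(3, 8)), np.zeros(3)   # 3 classes
probs = fc_classifier(activations, W, b)
print(probs, probs.sum())                     # probabilities summing to 1
```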
> [!TIP]
>
> The golden triangle of machine learning:
>
> - Data
> - Algorithm
> - Computation


@@ -10,4 +10,5 @@ export default {
CSE510_L5: "CSE510 Deep Reinforcement Learning (Lecture 5)",
CSE510_L6: "CSE510 Deep Reinforcement Learning (Lecture 6)",
CSE510_L7: "CSE510 Deep Reinforcement Learning (Lecture 7)",
CSE510_L8: "CSE510 Deep Reinforcement Learning (Lecture 8)",
}


@@ -0,0 +1,267 @@
# CSE5313 Coding and information theory for data science (Lecture 8)
## Review on Linear codes
|The code |Dimension $k$ (effective message length) | Minimum distance $d$ | Dual code | Minimum distance of dual code|
|---------|--------------|----------------------|-----------|-----------------------------|
|$\mathbb{F}^n$| $n$ | $1$ | $\{0\}$ | $0$|
|Parity code| $n-1$ | $2$ | Repetition code | $n$|
|Hamming code| $2^m-m-1$ | $3$ | Punctured Hadamard code | $2^{m-1}$|
## More on linear codes
### Extended Hamming code
Consider the Hamming code $[2^m-1,2^m-m-1,3]_{\mathbb{F}_2}$.
Extend it to a code of length $2^m$ by adding a parity bit.
Recall the Hamming code $[7,4,3]_{2}$.
$$
H_{HAM}=
\begin{pmatrix}
1 & 0 & 0 & 1 & 1 & 0 & 1\\
0 & 1 & 0 & 1 & 0 & 1 & 1\\
0 & 0 & 1 & 0 & 1 & 1 & 1\\
\end{pmatrix}
$$
The extended Hamming code is:
$$
H_{EXT}=
\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\
1 & 0 & 0 & 1 & 1 & 0 & 1 & 0\\
0 & 1 & 0 & 1 & 0 & 1 & 1 & 0\\
0 & 0 & 1 & 0 & 1 & 1 & 1 & 0\\
\end{pmatrix}
$$
The minimum distance of the extended Hamming code is 4.
<details>
<summary>Proof for minimum distance</summary>
Using the [lemma for minimum distance](./CSE5313_L7#lemma-for-minimum-distance), it is sufficient to show that every 3 columns of $H_{EXT}$ are linearly independent.
Notice that in $\mathbb{F}_2$ the only nonzero coefficient is $1$.
By simple observation, every 2 columns are linearly independent: the columns are nonzero and pairwise distinct.
Now consider a linear combination of three distinct columns:
$$
a_1\begin{pmatrix}1\\ h_1\end{pmatrix}+a_2\begin{pmatrix}1\\ h_2\end{pmatrix}+a_3\begin{pmatrix}1\\ h_3\end{pmatrix}=0,
$$
where the $h_i$ are columns of $(H_{HAM}\mid 0)$.
If $a_1=a_2=a_3=1$, the top row gives $1+1+1=1\neq 0$, a contradiction; if only one or two of the $a_i$ are nonzero, the combination reduces to the 2-column case, which is already ruled out.
Therefore, every 3 columns are linearly independent, so the minimum distance is at least 4; since extending a weight-3 Hamming codeword by its parity bit gives a codeword of weight 4, the minimum distance is exactly 4.
</details>
For the dimension of this code, we have $k=2^m-m-1$, with total code length $2^m$.
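A small sanity check of the construction above for $m=3$, assuming NumPy: it enumerates the code defined by $H_{EXT}$ as a parity-check matrix and confirms that $k=2^m-m-1=4$ and the minimum distance is $4$:

```python
import itertools
import numpy as np

# Parity-check matrix of the [8,4,4] extended Hamming code (m = 3), as in the notes.
H_EXT = np.array([
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 1, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1, 1, 0],
    [0, 0, 1, 0, 1, 1, 1, 0],
])

# Enumerate all vectors of F_2^8 and keep those in the kernel of H_EXT (over F_2).
codewords = [np.array(c) for c in itertools.product([0, 1], repeat=8)
             if not (H_EXT @ np.array(c) % 2).any()]

k = int(np.log2(len(codewords)))                      # dimension
d = min(int(c.sum()) for c in codewords if c.any())   # minimum nonzero weight
print(len(codewords), k, d)                           # 16 codewords, k = 4, d = 4
```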
### Augmented Hadamard code
Consider the code generated by the parity check matrix of the extended Hamming code, i.e. the codewords $xH_{EXT}$ for $x=(x_1,x_2,\ldots,x_{m+1})\in\mathbb{F}_2^{m+1}$.
- If $x_1=0$, then $xH_{EXT}=(x_2,\ldots, x_{m+1})(H_{HAM}\mid 0)=(y,0)$, where $y$ is a punctured Hadamard codeword.
- If $x_1=1$, then $xH_{EXT}=(1,1,\ldots,1)+(y,0)$, where $y$ is a punctured Hadamard codeword.
Therefore, the code generated by $H_{EXT}$ is given as follows.
Let $\mathcal{C}$ be the punctured Hadamard code (padded with a final zero coordinate).
Let $\mathcal{C}+\mathbb{I}$ be its shift by the all 1's vector (flip all bits of all words).
Augmented Hadamard code = $\mathcal{C} \cup (\mathcal{C} + \mathbb{I})$.
The length of the code is $2^m$.
The dimension of the code is $m+1$.
Since the code is still linear, the minimum distance is the minimum weight of a nonzero codeword.
Every nonzero codeword of $\mathcal{C}$ has weight $2^{m-1}$, and every codeword of $\mathcal{C}+\mathbb{I}$ has weight at least $2^m-2^{m-1}=2^{m-1}$, so the minimum distance of the augmented Hadamard code is $2^{m-1}$.
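A companion check (again for $m=3$, assuming NumPy) that the code generated by the rows of $H_{EXT}$, i.e. the augmented Hadamard code, has dimension $m+1=4$ and minimum distance $2^{m-1}=4$:

```python
import itertools
import numpy as np

H_EXT = np.array([
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 1, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1, 1, 0],
    [0, 0, 1, 0, 1, 1, 1, 0],
])

# Augmented Hadamard code: all F_2-linear combinations of the rows of H_EXT.
aug_hadamard = {tuple(x @ H_EXT % 2) for x in
                (np.array(v) for v in itertools.product([0, 1], repeat=4))}

d = min(sum(c) for c in aug_hadamard if any(c))  # minimum nonzero weight
print(len(aug_hadamard), d)                      # 16 codewords (k = m+1 = 4), d = 2^{m-1} = 4
```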
### Summary for simple linear codes
|The code |Dimension $k$ (effective message length) | Minimum distance $d$ | Dual code | Minimum distance of dual code|
|---------|--------------|----------------------|-----------|-----------------------------|
|$\mathbb{F}^n$| $n$ | $1$ | $\{0\}$ | $0$|
|Parity code| $n-1$ | $2$ | Repetition code | $n$|
|Hamming code| $2^m-m-1$ (length $2^m-1$) | $3$ | Punctured Hadamard code | $2^{m-1}$|
|Extended Hamming code| $2^m-m-1$ (length $2^m$) | $4$ | Augmented Hadamard code | $2^{m-1}$|
## Boundary of linear codes
Natural questions:
- Can we extend the table of linear codes infinitely?
- Which parameter configurations $(n,k,d)_q$ are impossible?
- Which parameter configurations $(n,k,d)_q$ are possible, even if we don't know how to construct them?
### Boundary I: Singleton bound
> Singleton is the name of the person who discovered this bound.
Theorem: For any linear code $\mathcal{C}\subseteq \mathbb{F}^n_q$, we have $d\leq n-k+1$.
<details>
<summary>Proof</summary>
Idea: use the pigeonhole principle.
Assume a code $[n,k,d]_q$ exists.
Pigeons: all $q^k$ possible codewords of $\mathcal{C}$.
Holes: all $q^\ell$ values of the first $\ell$ entries of a codeword (for some $\ell<n$).
If $q^\ell<q^k$, i.e. $\ell\leq k-1$, then by the pigeonhole principle there exist two distinct codewords in $\mathcal{C}$ that agree on the first $\ell$ entries.
These two codewords differ in at most $n-\ell$ positions, so $d\leq n-\ell$.
The largest $\ell$ for which this argument works is $\ell=k-1$, which gives $d\leq n-(k-1)=n-k+1$.
</details>
#### Definition of Maximum Distance Separable (MDS) code
A code $\mathcal{C} = [n,k,d]_q$ with $d = n - k + 1$ is called a Maximum Distance Separable (MDS) code.
<details>
<summary>Examples for singleton bound</summary>
$\mathbb{F}^n$: any $n, k = n, d = 1$.
- Attains with equality!
Parity: any $n, k = n - 1, d = 2$.
- Attains with equality!
Hamming: $n = 2^m - 1, k = 2^m - m - 1, d = 3$.
$n - k + 1 = m + 1 > 3$ for $m\geq 3$, so the bound is not attained with equality.
This shows a trade-off between $k$ and $d$.
</details>
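A quick numerical check of the Singleton bound for the examples above, assuming Python; the parameter values (here $n=15$, $m=4$) and the helper name are just for illustration:

```python
def singleton_slack(n: int, k: int, d: int) -> int:
    """Slack in the Singleton bound d <= n - k + 1; zero slack means the code is MDS."""
    return (n - k + 1) - d

m = 4
codes = {
    "full space F^n (n=15)": (15, 15, 1),
    "parity (n=15)":         (15, 14, 2),
    "Hamming (m=4)":         (2**m - 1, 2**m - m - 1, 3),
}
for name, (n, k, d) in codes.items():
    slack = singleton_slack(n, k, d)
    print(name, "MDS" if slack == 0 else f"slack {slack}")
```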
### Boundary II: The Sphere Packing Bound
Theorem: For any $[n,k,d]_q$ code, let $r=\lfloor \frac{d-1}{2}\rfloor$; then $\sum_{i=0}^{r}\binom{n}{i}(q-1)^i\leq q^{n-k}$.
<details>
<summary>Proof</summary>
Let $c=(c_1,c_2,\ldots,c_n)\in \mathbb{F}^n_q$, and let $B(c,r)=\{y\in \mathbb{F}^n_q: d_H(c,y)\leq r\}$ for some $r\leq n$.
Compute $|B(c,r)|$:
$|\{y\in \mathbb{F}^n_q: d_H(c,y)=0\}|=1$,
$|\{y\in \mathbb{F}^n_q: d_H(c,y)=1\}|=n(q-1)$,
$|\{y\in \mathbb{F}^n_q: d_H(c,y)=2\}|=\binom{n}{2}(q-1)^2=\frac{n(n-1)}{2}(q-1)^2$.
So, $|B(c,r)|=\sum_{i=0}^{r}\binom{n}{i}(q-1)^i$.
Recall that if $\mathcal{C}$ has minimum distance $d$ and $r=\lfloor \frac{d-1}{2}\rfloor$, then for all distinct $c_1,c_2\in \mathcal{C}$, $B(c_1,r)\cap B(c_2,r)=\emptyset$.
Therefore, the $q^k$ disjoint balls around the codewords all fit inside $\mathbb{F}_q^n$, so $q^k\sum_{i=0}^{r}\binom{n}{i}(q-1)^i\leq q^n$, i.e. $\sum_{i=0}^{r}\binom{n}{i}(q-1)^i\leq q^{n-k}$.
</details>
#### Definition for perfect code
A code $\mathcal{C}=[n,k,d]_q$ is called a perfect code if the sphere packing bound is attained with equality, i.e. $\sum_{i=0}^{r}\binom{n}{i}(q-1)^i = q^{n-k}$ for $r=\lfloor \frac{d-1}{2}\rfloor$ (equivalently, the balls of radius $r$ around the codewords cover all of $\mathbb{F}^n_q$).
<details>
<summary>Examples for sphere packing bound</summary>
Let $q=2$.
$\mathbb{F}_2^n$: any $n, k = n, d = 1$.
- $r = \frac{d-1}{2} = 0$.
- $\Rightarrow \sum_{i=0}^{0}\binom{n}{i}(q-1)^i = 1 \leq q^{n-k} = 2^{n-n} = 1$.
- Attained with equality!
Parity: any $n, k = n - 1, d = 2$.
- $r = \lfloor\frac{d-1}{2}\rfloor = 0$.
- $\Rightarrow \sum_{i=0}^{0}\binom{n}{i}(q-1)^i = 1 \leq q^{n-k} = 2^{n-k} = 2^{n-(n-1)} = 2$.
- $q \geq 2 \Rightarrow$ NOT attained with equality.
Exercise: Equality is attained for the repetition code (dual of parity) for odd $n$.
Hamming: $n = 2^m - 1, k = 2^m - m - 1, d = 3$.
- $r = \frac{d-1}{2} = 1$.
- $\Rightarrow \sum_{i=0}^{1}\binom{n}{i}(q-1)^i = 1 + (2^{m}-1) = 2^{m}$.
- $\Rightarrow q^{n-k} = 2^{m}$.
- Attained with equality!
</details>
Fortunately, there are only **4** types of **binary linear perfect codes**:
- $\mathbb{F}^n$
- Repetition code
- Hamming code
- $[23,12,7]_2$ Golay code
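A small sketch that evaluates the sphere-packing bound for the binary codes discussed above; equality of the two sides indicates a perfect code. The parameter choices (e.g. $m=3$, repetition length $7$) are illustrative:

```python
from math import comb

def ball_volume(n: int, r: int, q: int = 2) -> int:
    """V_q(n, r): number of words within Hamming distance r of a fixed word."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))

def is_perfect(n: int, k: int, d: int, q: int = 2) -> bool:
    """True when the sphere-packing bound holds with equality."""
    r = (d - 1) // 2
    return ball_volume(n, r, q) == q ** (n - k)

m = 3
print(is_perfect(2**m - 1, 2**m - m - 1, 3))  # Hamming [7,4,3]: True
print(is_perfect(7, 1, 7))                    # repetition code, odd n: True
print(is_perfect(7, 6, 2))                    # parity code: False
print(is_perfect(23, 12, 7))                  # binary Golay code: True
```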
### Boundary III: The Gilbert-Varshamov Bound
Theorem: Let $n,k,d,q$ be such that $V_q(n-1, d-2)< q^{n-k}$; then there exists an $[n,k,d]_q$ code.
> Singleton, sphere-packing provide **necessary** conditions for existence of codes.
>
> Are there **sufficient** conditions?
>
> Recall:
>
> - Lemma: The minimum distance of $\mathcal{C}$ is the maximum integer $d$ such that every $d-1$ columns of the parity-check matrix $H$ are linearly independent.
>
> Idea:
>
> - Construct $H$ column by column, ensuring that no dependencies occur.
Algorithm:
- Begin with $(n-k)\times (n-k)$ identity matrix.
- Assume we have already chosen columns $h_1,h_2,\ldots,h_{\ell-1}$ (each $h_i\in\mathbb{F}^{n-k}_q$).
- Then the next column $h_{\ell}$ must not lie in the span of any $d-2$ of the previous columns:
  - $h_{\ell}$ cannot be written as $[h_1,h_2,\ldots,h_{\ell-1}]x^T$ for any $x$ of Hamming weight at most $d-2$.
- So the ineligible choices for $h_{\ell}$ correspond to the coefficient vectors
  - $B_{\ell-1}(0,d-2)=\{x\in \mathbb{F}^{\ell-1}_q: d_H(0,x)\leq d-2\}$,
  - with $|B_{\ell-1}(0,d-2)|=\sum_{i=0}^{d-2}\binom{\ell-1}{i}(q-1)^i$, denoted by $V_q(\ell-1, d-2)$.
- The candidates for $h_{\ell}$ are the $q^{n-k}$ vectors of $\mathbb{F}^{n-k}_q$, out of which at most $V_q(\ell-1, d-2)$ are ineligible.
- We need $n$ columns overall; the worst case is the last column ($\ell=n$), so it suffices that $V_q(n-1, d-2)< q^{n-k}$.
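A minimal sketch of the greedy column-by-column construction for the binary case, assuming NumPy; it follows the outline above, starting from the identity columns and adding any column that is not a sum of at most $d-2$ previously chosen columns:

```python
import itertools
import numpy as np

def gv_parity_check(n: int, k: int, d: int):
    """Greedily build an (n-k) x n binary parity-check matrix in which every
    d-1 columns are linearly independent, so the resulting code has minimum
    distance >= d (the Gilbert-Varshamov / Varshamov construction)."""
    r = n - k
    cols = [np.eye(r, dtype=int)[:, i] for i in range(r)]  # start with identity columns
    while len(cols) < n:
        # All sums of at most d-2 already-chosen columns are forbidden as the next column.
        forbidden = {tuple(np.zeros(r, dtype=int))}
        for t in range(1, d - 1):
            for subset in itertools.combinations(cols, t):
                forbidden.add(tuple(sum(subset) % 2))
        # Pick any vector of F_2^{n-k} that is not forbidden.
        candidate = next((np.array(v) for v in itertools.product([0, 1], repeat=r)
                          if tuple(v) not in forbidden), None)
        if candidate is None:
            return None  # the GV condition failed for these parameters
        cols.append(candidate)
    return np.column_stack(cols)

H = gv_parity_check(n=7, k=4, d=3)  # recovers a parity-check matrix of a [7,4,3] code
print(H)
```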


@@ -2,4 +2,20 @@
## Pixel-Perfect Structure-from-Motion with Featuremetric Refinement
[link to the paper](https://arxiv.org/pdf/2108.08291)
The method leverages dense local information to refine sparse observations; it is inherently amenable to SfM as it can optimize all keypoint locations over multiple views in a track simultaneously.
Both **bundle** and **keypoint adjustments** are based on geometric observations, namely keypoint locations and flow, but do not account for their respective uncertainties.
Learned representation: SfM can handle image collections with unconstrained viewing conditions exhibiting large changes in terms of illumination, resolution, or camera models. The image representation used should be robust to such changes and ensure an accurate refinement in any condition. We thus turn to features computed by deep CNNs, which can exhibit high invariance by capturing a large context, yet retain fine local details.
> [!TIP]
>
> This paper is a good example of how to use deep features for SfM with CNN and do the bundle adjustment and keypoint adjustment over the predicted features for better results.
>
> It seems to be the technique behind the scenes of the first topic that interested me when I joined the computer vision class. The collections of cameras and predicted point clouds really impressed me.
>
> With RANSAC and subpixel estimation, we already get pretty decent 3D reconstruction results that scale well and are robust to noisy detections.
>
> I'm a bit curious about the performance of the model in more complicated scenes, such as the structure of a tree or other natural scenes. How does the model deal with high-frequency details if we fit the "smooth surface" too strongly?


@@ -3,3 +3,9 @@
## Does Object Recognition Work for Everyone?
[link to the paper](https://arxiv.org/pdf/1906.02659)
> [!TIP]
>
> This paper is a good example revealing the data bias problem in computer vision. I was a bit shocked by the fact that the data is not proportional to the real-world human population distribution.
>
> I'm a bit curious whether there is any "fair" dataset available to the general public today. What metrics are used to evaluate the fairness of a dataset?


@@ -1,4 +1,3 @@
export default {
index: "Math 401, Fall 2025: Overview of thesis",
Math401_S1: "Math 401, Fall 2025: Thesis notes, Section 1",
}