# CSE559A Lecture 3
## Image formation
### Degrees of Freedom
$$
x=K[R|t]X
$$
$$
w\begin{bmatrix}
x\\
y\\
1
\end{bmatrix}
=
\begin{bmatrix}
\alpha & s & u_0 \\
0 & \beta & v_0 \\
0 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
r_{11} & r_{12} & r_{13} &t_x\\
r_{21} & r_{22} & r_{23} &t_y\\
r_{31} & r_{32} & r_{33} &t_z\\
\end{bmatrix}
\begin{bmatrix}
x\\
y\\
z\\
1
\end{bmatrix}
$$
The projection matrix $K[R|t]$ has 11 degrees of freedom: 5 intrinsic parameters in $K$ ($\alpha$, $\beta$, $s$, $u_0$, $v_0$), 3 for the rotation $R$, and 3 for the translation $t$.
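A minimal numpy sketch of this projection; the intrinsics and pose below are illustrative assumptions, not values from the lecture:
```python
import numpy as np

# illustrative intrinsics: focal lengths, zero skew, principal point (u0, v0)
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                        # camera aligned with the world axes
t = np.array([0.0, 0.0, 5.0])        # camera shifted along z
Rt = np.hstack([R, t[:, None]])      # 3x4 extrinsic matrix [R|t]

X = np.array([1.0, 2.0, 10.0, 1.0])  # homogeneous world point
p = K @ Rt @ X                       # w * [x, y, 1]
x, y = p[:2] / p[2]                  # divide by w to get pixel coordinates
```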
### Impact of camera translation
$$
p=K[R|t]\begin{bmatrix}
x\\
y\\
z\\
0
\end{bmatrix}=KR\begin{bmatrix}
x\\
y\\
z\\
\end{bmatrix}
$$
The projection of a point at infinity (a vanishing point) is invariant to camera translation, since its fourth homogeneous coordinate is $0$ and the translation column drops out.
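A quick numerical check of this invariance (all values are illustrative assumptions; note that the direction vector's fourth coordinate is zero):
```python
import numpy as np

K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)
d = np.array([1.0, 0.0, 1.0, 0.0])   # point at infinity: last coordinate is 0

for t in (np.zeros(3), np.array([3.0, -2.0, 7.0])):   # two different translations
    Rt = np.hstack([R, t[:, None]])
    p = K @ Rt @ d
    print(p[:2] / p[2])              # same pixel either way
```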
### Recover world coordinates from pixel coordinates
$$
w\begin{bmatrix}
u\\
v\\
1
\end{bmatrix}=K[R|t]X
$$
Key issue: the homogeneous scale $w$ (the projective depth) is unknown. Suppose $w=1/s$, then
$$
\begin{aligned}
\begin{bmatrix}
u\\
v\\
1
\end{bmatrix}
&=sK[R|t]X\\
K^{-1}\begin{bmatrix}
u\\
v\\
1
\end{bmatrix}
&=s[R|t]X\\
R^{-1}K^{-1}\begin{bmatrix}
u\\
v\\
1
\end{bmatrix}&=s[I|R^{-1}t]X\\
R^{-1}K^{-1}\begin{bmatrix}
u\\
v\\
1
\end{bmatrix}&=[I|R^{-1}t]sX\\
R^{-1}K^{-1}\begin{bmatrix}
u\\
v\\
1
\end{bmatrix}&=sX+sR^{-1}t\\
\frac{1}{s}R^{-1}K^{-1}\begin{bmatrix}
u\\
v\\
1
\end{bmatrix}-R^{-1}t&=X\\
\end{aligned}
$$
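A sketch of the final line, assuming $K$, $R$, $t$, and the scale $s$ (equivalently the depth $w=1/s$) are all known; the numeric values are illustrative:
```python
import numpy as np

def pixel_to_world(u, v, s, K, R, t):
    """X = (1/s) * R^{-1} K^{-1} [u, v, 1]^T - R^{-1} t"""
    uv1 = np.array([u, v, 1.0])
    R_inv = np.linalg.inv(R)
    return (1.0 / s) * R_inv @ np.linalg.inv(K) @ uv1 - R_inv @ t

# round trip: project a point, then recover it from its pixel coordinates
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([0.0, 0.0, 5.0])
X = np.array([1.0, 2.0, 10.0])
p = K @ (R @ X + t)                              # w * [u, v, 1]
u, v, w = p[0] / p[2], p[1] / p[2], p[2]
print(pixel_to_world(u, v, 1.0 / w, K, R, t))    # -> [ 1.  2. 10.]
```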
## Projective Geometry
### Orthographic Projection
Special case of perspective projection when $f\to\infty$
- Distance to the center of projection is infinite
- Also called parallel projection
- Projection matrix is
$$
w\begin{bmatrix}
u\\
v\\
1
\end{bmatrix}=
\begin{bmatrix}
f & 0 & 0 & 0\\
0 & f & 0 & 0\\
0 & 0 & 0 & s\\
\end{bmatrix}
\begin{bmatrix}
x\\
y\\
z\\
1
\end{bmatrix}
$$
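A tiny sketch applying this matrix ($f$ and $s$ are illustrative assumptions); the depth $z$ never influences the resulting pixel coordinates:
```python
import numpy as np

f, s = 100.0, 2.0
P = np.array([[f, 0, 0, 0],
              [0, f, 0, 0],
              [0, 0, 0, s]])

for z in (1.0, 10.0, 1000.0):                # vary the depth
    p = P @ np.array([1.0, 2.0, z, 1.0])
    print(p[:2] / p[2])                      # same (u, v) regardless of z
```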
To be continued later in the course.
## Image processing foundations
### Motivation for image processing
Representational Motivation:
- We need more than raw pixel values
Computational Motivation:
- Many image processing operations must be run across many locations in an image
- A loop in Python is slow
- High-level libraries reduce errors, developer time, and algorithm runtime
- Two common libraries:
- Torch+Torchvision: Focus on deep learning
- scikit-image: Focus on classical image processing algorithms
### Operations on images
#### Point operations
Operations that are applied to one pixel at a time
Negative image
$$
I_{neg}(x,y)=L-1-I(x,y)
$$
- $L$ is the number of intensity levels (e.g. $L=256$ for an 8-bit image)
Power law transformation:
$$
I_{out}(x,y)=cI(x,y)^{\gamma}
$$
- $c$ is a constant
- $\gamma$ is the gamma value
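A short numpy sketch of these two point operations on an 8-bit image (the array and the constants are illustrative assumptions):
```python
import numpy as np

image = np.random.randint(0, 256, size=(480, 640), dtype=np.uint8)
L = 256                                           # number of intensity levels

negative = (L - 1) - image.astype(np.float64)     # negative image

c, gamma = 1.0, 0.5                               # power law (gamma) transform
normalized = image.astype(np.float64) / (L - 1)
power_law = (L - 1) * c * normalized ** gamma
```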
Contrast stretching
Use a function to stretch the range of pixel values:
$$
I_{out}(x,y)=f(I(x,y))
$$
- $f$ is a function that stretches the range of pixel values
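One common choice of $f$ is a linear min-max stretch; a minimal sketch, assuming an 8-bit output range:
```python
import numpy as np

def stretch_contrast(image, out_min=0.0, out_max=255.0):
    """Linearly map [image.min(), image.max()] onto [out_min, out_max]."""
    img = image.astype(np.float64)
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo) * (out_max - out_min) + out_min
```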
Image histogram
- The histogram of an image plots the frequency of each pixel value
Limitations:
- No spatial information
- No information about the relationship between pixels
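Computing a histogram with numpy (a sketch for an 8-bit image; the array is illustrative):
```python
import numpy as np

image = np.random.randint(0, 256, size=(480, 640), dtype=np.uint8)
counts, bin_edges = np.histogram(image, bins=256, range=(0, 256))
# counts[v] is the number of pixels with value v; note it says nothing about
# where those pixels are in the image (no spatial information)
```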
#### Linear filtering in spatial domain
Operations that are applied to a neighborhood at each position
Used to:
- Enhance image features
- Denoise, sharpen, resize
- Extract information about image structure
- Edge detection, corner detection, blob detection
- Detect image patterns
- Template matching
- Convolutional Neural Networks
Image filtering
Take the dot product of the kernel with the image patch centered at each position (cross-correlation):
$$
h[m,n]=\sum_{k}\sum_{l}g[k,l]\,f[m+k,n+l]
$$
where the sums run over the support of the kernel $g$.
```python
import numpy as np

def filter2d(image, kernel):
    """
    Apply a 2D filter (cross-correlation) to an image with nested Python loops.
    Slow reference implementation, do not use this in practice.
    """
    out = np.zeros_like(image, dtype=float)
    kh, kw = kernel.shape
    # only visit pixels where the whole kernel fits inside the image
    for i in range(kh // 2, image.shape[0] - kh // 2):
        for j in range(kw // 2, image.shape[1] - kw // 2):
            patch = image[i - kh // 2:i + kh // 2 + 1, j - kw // 2:j + kw // 2 + 1]
            out[i, j] = np.sum(kernel * patch)
    return out
```
Computational cost: $k^2mn$, where $k\times k$ is the kernel size and $m\times n$ is the image size.
Do not use this in practice, use built-in functions instead.
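In practice the same cross-correlation can be done with a vectorized library call; a sketch using Torch (one of the libraries mentioned above), with illustrative arrays — note that PyTorch's `conv2d` actually computes cross-correlation, matching the formula above:
```python
import numpy as np
import torch
import torch.nn.functional as F

image = np.random.rand(480, 640).astype(np.float32)
kernel = np.full((3, 3), 1.0 / 9, dtype=np.float32)         # box filter

img_t = torch.from_numpy(image)[None, None]                 # shape (1, 1, H, W)
ker_t = torch.from_numpy(kernel)[None, None]                # shape (1, 1, k, k)
filtered = F.conv2d(img_t, ker_t, padding=1)[0, 0].numpy()  # vectorized cross-correlation
```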
**Box filter**
$$
\frac{1}{9}\begin{bmatrix}
1 & 1 & 1\\
1 & 1 & 1\\
1 & 1 & 1
\end{bmatrix}
$$
Smooths the image
**Identity filter**
$$
\begin{bmatrix}
0 & 0 & 0\\
0 & 1 & 0\\
0 & 0 & 0
\end{bmatrix}
$$
Does not change the image
**Sharpening filter**
$$
\begin{bmatrix}
0 & 0 & 0 \\
0 & 2 & 0 \\
0 & 0 & 0
\end{bmatrix}-
\frac{1}{9}\begin{bmatrix}
1 & 1 & 1 \\
1 & 1 & 1 \\
1 & 1 & 1
\end{bmatrix}
$$
Enhances the image edges
**Vertical edge detection**
$$
\begin{bmatrix}
1 & 0 & -1 \\
2 & 0 & -2 \\
1 & 0 & -1
\end{bmatrix}
$$
Detects vertical edges
**Horizontal edge detection**
$$
\begin{bmatrix}
1 & 2 & 1 \\
0 & 0 & 0 \\
-1 & -2 & -1
\end{bmatrix}
$$
Detects horizontal edges
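Applying the two edge filters above with the `filter2d` defined earlier and combining them into a gradient magnitude (a sketch on random data, purely for illustration):
```python
import numpy as np

sobel_x = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=float)
sobel_y = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]], dtype=float)

image = np.random.rand(100, 100)
gx = filter2d(image, sobel_x)            # responds to vertical edges
gy = filter2d(image, sobel_y)            # responds to horizontal edges
magnitude = np.sqrt(gx ** 2 + gy ** 2)   # overall edge strength
```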
Key properties (stated for convolution):
- Linear:
  - `filter(I_1+I_2,f)=filter(I_1,f)+filter(I_2,f)`
- Scale invariant:
  - `filter(I,a*f)=a*filter(I,f)`
- Shift invariant:
  - `filter(shift(I),f)=shift(filter(I,f))`
- Commutative:
  - `filter(f_1,f_2)=filter(f_2,f_1)`
- Associative:
  - `filter(filter(I,f_1),f_2)=filter(I,filter(f_1,f_2))`
- Distributive:
  - `filter(I,f_1+f_2)=filter(I,f_1)+filter(I,f_2)`
- Identity:
  - `filter(I,f_0)=I`, where `f_0` is the unit impulse (the identity filter above)
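A quick numerical check of the distributive property using the `filter2d` above (random test data, purely illustrative):
```python
import numpy as np

image = np.random.rand(50, 50)
f1, f2 = np.random.rand(3, 3), np.random.rand(3, 3)

lhs = filter2d(image, f1 + f2)
rhs = filter2d(image, f1) + filter2d(image, f2)
print(np.allclose(lhs, rhs))   # True: filtering distributes over kernel addition
```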
Important filter:
**Gaussian filter**
$$
G(x,y)=\frac{1}{2\pi\sigma^2}e^{-\frac{x^2+y^2}{2\sigma^2}}
$$
Smooths the image (Gaussian blur)
Common mistake: choosing a kernel size that does not match $\sigma$; visualize the filter before applying it (a common rule of thumb is to let the kernel extend to about $3\sigma$ from the center).
Properties of Gaussian filter:
- Removes high-frequency components (acts as a low-pass filter)
- Convolving a Gaussian with itself gives another Gaussian (with $\sigma$ scaled by $\sqrt{2}$)
- Separable kernel:
- `G(x,y)=G(x)G(y)` (factorable into the product of two 1D Gaussian filters)
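A sketch of building a Gaussian kernel whose half-width follows the $3\sigma$ rule of thumb, using the separability $G(x,y)=G(x)G(y)$:
```python
import numpy as np

def gaussian_kernel(sigma):
    """2D Gaussian kernel with half-width ~3*sigma, built from a 1D Gaussian."""
    radius = int(np.ceil(3 * sigma))        # kernel extends to about 3 sigma
    x = np.arange(-radius, radius + 1)
    g1d = np.exp(-x ** 2 / (2 * sigma ** 2))
    g1d /= g1d.sum()                        # normalize so the kernel sums to 1
    return np.outer(g1d, g1d)               # G(x, y) = G(x) * G(y)
```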
##### Filter Separability
- Separable filter:
- `f(x,y)=f(x)f(y)`
Example:
$$
\begin{bmatrix}
1 & 2 & 1 \\
2 & 4 & 2 \\
1 & 2 & 1
\end{bmatrix}=
\begin{bmatrix}
1 \\
2 \\
1
\end{bmatrix}
\begin{bmatrix}
1 & 2 & 1
\end{bmatrix}
$$
Gaussian filter is separable
$$
G(x,y)=\frac{1}{2\pi\sigma^2}e^{-\frac{x^2+y^2}{2\sigma^2}}=G(x)G(y)
$$
This reduces the computational cost of the filter from $k^2mn$ to $2kmn$
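A sketch of exploiting separability with the `filter2d` defined earlier: one pass with a $1\times k$ kernel along the rows, then one with a $k\times 1$ kernel down the columns (about $2kmn$ operations), which matches a single $k\times k$ pass (about $k^2mn$ operations) away from the borders; the data here is illustrative:
```python
import numpy as np

sigma = 1.0
radius = int(np.ceil(3 * sigma))
x = np.arange(-radius, radius + 1)
g1d = np.exp(-x ** 2 / (2 * sigma ** 2))
g1d /= g1d.sum()                                       # 1D Gaussian

image = np.random.rand(100, 100)
full_2d = filter2d(image, np.outer(g1d, g1d))          # one k x k pass
rows = filter2d(image, g1d[None, :])                   # 1 x k pass along the rows
separable = filter2d(rows, g1d[:, None])               # k x 1 pass down the columns

r = radius
print(np.allclose(full_2d[r:-r, r:-r], separable[r:-r, r:-r]))   # True away from the borders
```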