diff --git a/pages/CSE559A/CSE559A_L3.md b/pages/CSE559A/CSE559A_L3.md
index e69de29..431e696 100644
--- a/pages/CSE559A/CSE559A_L3.md
+++ b/pages/CSE559A/CSE559A_L3.md
@@ -0,0 +1,357 @@
+# Lecture 3
+
+## Image formation
+
+### Degrees of Freedom
+
+$$
+x=K[R|t]X
+$$
+
+$$
+w\begin{bmatrix}
+u\\
+v\\
+1
+\end{bmatrix}
+=
+\begin{bmatrix}
+    \alpha & s & u_0 \\
+    0 & \beta & v_0 \\
+    0 & 0 & 1
+\end{bmatrix}
+\begin{bmatrix}
+r_{11} & r_{12} & r_{13} &t_x\\
+r_{21} & r_{22} & r_{23} &t_y\\
+r_{31} & r_{32} & r_{33} &t_z\\
+\end{bmatrix}
+\begin{bmatrix}
+x\\
+y\\
+z\\
+1
+\end{bmatrix}
+$$
+
+The intrinsic matrix $K$ has 5 degrees of freedom ($\alpha$, $\beta$, $s$, $u_0$, $v_0$), the rotation $R$ has 3, and the translation $t$ has 3, for 11 in total.
+
+### Impact of translation of camera
+
+$$
+p=K[R|t]\begin{bmatrix}
+x\\
+y\\
+z\\
+0
+\end{bmatrix}=KR\begin{bmatrix}
+x\\
+y\\
+z
+\end{bmatrix}
+$$
+
+The projection of a point at infinity (a vanishing point) is invariant to camera translation, because the last homogeneous coordinate is 0 and $t$ drops out.
+
+### Recover world coordinates from pixel coordinates
+
+$$
+w\begin{bmatrix}
+u\\
+v\\
+1
+\end{bmatrix}=K[R|t]X
+$$
+
+Key issue: the scale factor $w$ (the depth of the point along the optical axis) is unknown.
+Suppose $w=1/s$
+
+$$
+\begin{aligned}
+    \begin{bmatrix}
+    u\\
+    v\\
+    1
+    \end{bmatrix}
+    &=sK[R|t]X\\
+    K^{-1}\begin{bmatrix}
+    u\\
+    v\\
+    1
+    \end{bmatrix}
+    &=s[R|t]X\\
+    R^{-1}K^{-1}\begin{bmatrix}
+    u\\
+    v\\
+    1
+    \end{bmatrix}&=s[I|R^{-1}t]X\\
+    R^{-1}K^{-1}\begin{bmatrix}
+    u\\
+    v\\
+    1
+    \end{bmatrix}&=[I|R^{-1}t]sX\\
+    R^{-1}K^{-1}\begin{bmatrix}
+    u\\
+    v\\
+    1
+    \end{bmatrix}&=sX+sR^{-1}t\\
+    \frac{1}{s}R^{-1}K^{-1}\begin{bmatrix}
+    u\\
+    v\\
+    1
+    \end{bmatrix}-R^{-1}t&=X\\
+\end{aligned}
+$$
+
+Here $X$ is homogeneous in the first four lines and denotes the inhomogeneous 3D point in the last two.
+
+## Projective Geometry
+
+### Orthographic Projection
+
+Special case of perspective projection when $f\to\infty$
+
+- Distance from the center of projection to the image plane is infinite
+- Also called parallel projection
+- Projection matrix is
+
+$$
+w\begin{bmatrix}
+u\\
+v\\
+1
+\end{bmatrix}=
+\begin{bmatrix}
+f & 0 & 0 & 0\\
+0 & f & 0 & 0\\
+0 & 0 & 0 & s\\
+\end{bmatrix}
+\begin{bmatrix}
+x\\
+y\\
+z\\
+1
+\end{bmatrix}
+$$
+
+Continued in a later part of the course
+
+## Image processing foundations
+
+### Motivation for image processing
+
+Representational Motivation:
+
+- We need more than raw pixel values
+
+Computational Motivation:
+
+- Many image processing operations must be run across many locations in an image
+- A loop in Python is slow
+- High-level libraries reduce errors, developer time, and algorithm runtime
+- Two common libraries:
+  - Torch+Torchvision: Focus on deep learning
+  - scikit-image: Focus on classical image processing algorithms
+
+### Operations on images
+
+#### Point operations
+
+Operations that are applied to one pixel at a time
+
+Negative image:
+
+$$
+I_{neg}(x,y)=L-1-I(x,y)
+$$
+
+- $L$ is the number of intensity levels (e.g. 256 for an 8-bit image)
+
+Power law transformation:
+
+$$
+I_{out}(x,y)=cI(x,y)^{\gamma}
+$$
+
+- $c$ is a constant
+- $\gamma$ is the gamma value
+
+Contrast stretching:
+
+Use a function to stretch the range of pixel values
+
+$$
+I_{out}(x,y)=f(I(x,y))
+$$
+
+- $f$ is a function that stretches the range of pixel values
+
+Image histogram
+
+- Histogram of an image is a plot of the
+  frequency of each pixel value
+
+Limitations:
+
+- No spatial information
+- No information about the relationship between pixels
+
+#### Linear filtering in spatial domain
+
+Operations that are applied to a neighborhood at each position
+
+Used to:
+
+- Enhance image features
+  - Denoise, sharpen, resize
+- Extract information about image structure
+  - Edge detection, corner detection, blob detection
+- Detect image patterns
+  - Template matching
+- Convolutional Neural Networks
+
+Image filtering
+
+At each position, take the dot product of the kernel with the image neighborhood centered there
+
+$$
+h[m,n]=\sum_{k}\sum_{l}g[k,l]f[m+k,n+l]
+$$
+
+where $f$ is the image, $g$ is the kernel, and the sums run over the kernel's support.
+
+```python
+import numpy as np
+
+def filter2d(image, kernel):
+    """
+    Apply a 2D filter (cross-correlation) to a 2D image with explicit loops.
+    Assumes an odd-sized kernel. Do not use this in practice.
+    """
+    kh, kw = kernel.shape
+    # Zero-pad so the output has the same size as the input
+    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
+    out = np.zeros(image.shape, dtype=float)  # separate output, input stays untouched
+    for i in range(image.shape[0]):
+        for j in range(image.shape[1]):
+            out[i, j] = np.sum(kernel * padded[i:i + kh, j:j + kw])
+    return out
+```
+
+Computational cost: $k^2mn$ operations for a $k\times k$ kernel and an $m\times n$ image
+
+Do not use this in practice; use built-in functions instead.
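
As a rough illustration of the advice to avoid Python loops, the sketch below computes the same cross-correlation with NumPy array operations only. The function name `filter2d_vectorized` and the choice of zero padding are assumptions made for this example, not part of the lecture; library routines such as `scipy.ndimage.correlate` serve the same purpose.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def filter2d_vectorized(image, kernel):
    """Cross-correlate a 2D image with an odd-sized 2D kernel (zero padding)."""
    kh, kw = kernel.shape
    padded = np.pad(image.astype(float), ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    # windows[i, j] is the kh x kw neighborhood centered on pixel (i, j)
    windows = sliding_window_view(padded, (kh, kw))
    # Multiply every neighborhood by the kernel and sum, without Python loops
    return np.einsum("ijkl,kl->ij", windows, kernel)

image = np.arange(25, dtype=float).reshape(5, 5)
box = np.ones((3, 3)) / 9.0          # 3x3 box filter
smoothed = filter2d_vectorized(image, box)
```

The arithmetic cost is still $k^2mn$; the speedup comes from running the inner loops in compiled code rather than in the Python interpreter.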
+
+**Box filter**
+
+$$
+\frac{1}{9}\begin{bmatrix}
+1 & 1 & 1\\
+1 & 1 & 1\\
+1 & 1 & 1
+\end{bmatrix}
+$$
+
+Smooths the image
+
+**Identity filter**
+
+$$
+\begin{bmatrix}
+0 & 0 & 0\\
+0 & 1 & 0\\
+0 & 0 & 0
+\end{bmatrix}
+$$
+
+Does not change the image
+
+**Sharpening filter**
+
+$$
+\begin{bmatrix}
+0 & 0 & 0 \\
+0 & 2 & 0 \\
+0 & 0 & 0
+\end{bmatrix}-
+\frac{1}{9}\begin{bmatrix}
+1 & 1 & 1 \\
+1 & 1 & 1 \\
+1 & 1 & 1
+\end{bmatrix}
+$$
+
+Enhances the image edges (twice the original minus the box-blurred image)
+
+**Vertical edge detection**
+
+$$
+\begin{bmatrix}
+1 & 0 & -1 \\
+2 & 0 & -2 \\
+1 & 0 & -1
+\end{bmatrix}
+$$
+
+Detects vertical edges
+
+**Horizontal edge detection**
+
+$$
+\begin{bmatrix}
+1 & 2 & 1 \\
+0 & 0 & 0 \\
+-1 & -2 & -1
+\end{bmatrix}
+$$
+
+Detects horizontal edges
+
+Key properties:
+
+- Linear:
+  - `filter(I,f_1+f_2)=filter(I,f_1)+filter(I,f_2)`
+- Homogeneous (scaling):
+  - `filter(I,af)=a·filter(I,f)`
+- Shift invariant:
+  - `filter(shift(I),f)=shift(filter(I,f))`
+- Identity:
+  - `filter(I,f_0)=I`, where `f_0` is the identity filter
+
+For convolution (written `*`), additionally:
+
+- Commutative:
+  - `f_1*f_2=f_2*f_1`
+- Associative:
+  - `f_1*(f_2*f_3)=(f_1*f_2)*f_3`
+- Distributive:
+  - `f_1*(f_2+f_3)=f_1*f_2+f_1*f_3`
+
+Important filter:
+
+**Gaussian filter**
+
+$$
+G(x,y)=\frac{1}{2\pi\sigma^2}e^{-\frac{x^2+y^2}{2\sigma^2}}
+$$
+
+Smooths the image (Gaussian blur)
+
+Common mistake: choosing a kernel size that does not match $\sigma$ (for example, a $\sigma$ too large for the kernel, which truncates the Gaussian). Visualize the filter before applying it; the kernel edge should sit at about $3\sigma$ from the center.
+
+Properties of Gaussian filter:
+
+- Removes high frequency components
+- Convolution with itself is another Gaussian filter (with standard deviation $\sigma\sqrt{2}$)
+- Separable kernel:
+  - `G(x,y)=G(x)G(y)` (factorable into the product of two 1D Gaussian filters)
+
+##### Filter Separability
+
+- Separable filter:
+  - `f(x,y)=f(x)f(y)`
+
+Example:
+
+$$
+\begin{bmatrix}
+1 & 2 & 1 \\
+2 & 4 & 2 \\
+1 & 2 & 1
+\end{bmatrix}=
+\begin{bmatrix}
+1 \\
+2 \\
+1
+\end{bmatrix}\times
+\begin{bmatrix}
+1 & 2 & 1
+\end{bmatrix}
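+$$
+
+The Gaussian filter is separable:
+
+$$
+G(x,y)=\frac{1}{2\pi\sigma^2}e^{-\frac{x^2+y^2}{2\sigma^2}}=G(x)G(y),\qquad G(x)=\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{x^2}{2\sigma^2}}
+$$
+
+Filtering with the two 1D kernels in sequence instead of the full 2D kernel reduces the computational cost from $k^2mn$ to $2kmn$ operations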
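
The cost reduction can be checked numerically. The following is a minimal sketch, assuming zero padding and a kernel half-width of $\lceil 3\sigma\rceil$; the helper names `gaussian_1d`, `separable_filter`, and `direct_filter` are made up for this example. It filters rows and then columns with a 1D Gaussian and compares the result with direct 2D filtering by the outer-product kernel.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gaussian_1d(sigma):
    """Sampled 1D Gaussian with half-width ceil(3*sigma), normalized to sum to 1."""
    half = int(np.ceil(3 * sigma))
    x = np.arange(-half, half + 1)
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()

def separable_filter(image, g):
    """Filter every row, then every column, with the 1D kernel g (zero padding)."""
    rows = np.apply_along_axis(lambda r: np.convolve(r, g, mode="same"), 1, image)
    return np.apply_along_axis(lambda c: np.convolve(c, g, mode="same"), 0, rows)

def direct_filter(image, kernel):
    """Direct 2D filtering with zero padding, used here only for comparison."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    windows = sliding_window_view(padded, (kh, kw))
    return np.einsum("ijkl,kl->ij", windows, kernel)

rng = np.random.default_rng(0)
image = rng.random((16, 16))
g = gaussian_1d(1.0)                            # 7 taps for sigma = 1
fast = separable_filter(image, g)               # about 2k multiplications per pixel
slow = direct_filter(image, np.outer(g, g))     # about k^2 multiplications per pixel
print(np.allclose(fast, slow))
```

With $k=7$ this is 14 multiplications per pixel instead of 49, and the gap widens as $\sigma$ (and therefore $k$) grows.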