
CSE559A Lecture 3

Image formation

Degrees of Freedom


x=K[R|t]X

The projection matrix K[R|t] has 11 degrees of freedom: 5 intrinsic (\alpha, \beta, s, u_0, v_0 in K) and 6 extrinsic (3 for the rotation R, 3 for the translation t).

w\begin{bmatrix}
x\\
y\\
1
\end{bmatrix}
=
\begin{bmatrix}
    \alpha & s & u_0 \\
    0 & \beta & v_0 \\
    0 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
r_{11} & r_{12} & r_{13} &t_x\\
r_{21} & r_{22} & r_{23} &t_y\\
r_{31} & r_{32} & r_{33} &t_z\\
\end{bmatrix}
\begin{bmatrix}
x\\
y\\
z\\
1
\end{bmatrix}
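
A minimal sketch of this formula in NumPy; the values of K, R, and t below are made up purely for illustration.

import numpy as np

# Hypothetical calibration, chosen only to illustrate x = K[R|t]X
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                          # camera aligned with the world axes
t = np.array([[0.0], [0.0], [5.0]])    # world origin 5 units in front of the camera

X = np.array([1.0, 2.0, 10.0, 1.0])    # homogeneous world point
P = K @ np.hstack([R, t])              # 3x4 projection matrix K[R|t]
wx = P @ X                             # homogeneous image point (w*x, w*y, w)
x, y = wx[0] / wx[2], wx[1] / wx[2]    # divide by w to get image coordinates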

Impact of camera translation


p=K[R|t]\begin{bmatrix}
x\\
y\\
z\\
0
\end{bmatrix}=K[R]\begin{bmatrix}
x\\
y\\
z\\
\end{bmatrix}

The projection of a vanishing point (i.e., of a point at infinity) is invariant to camera translation.

Recover world coordinates from pixel coordinates


w\begin{bmatrix}
u\\
v\\
1
\end{bmatrix}=K[R|t]X

Key issue: the scale w (the projective depth) is unknown. Suppose w=1/s; then:


\begin{aligned}
    \begin{bmatrix}
        u\\
        v\\
        1
    \end{bmatrix}
    &=sK[R|t]X\\
    K^{-1}\begin{bmatrix}
        u\\
        v\\
        1
    \end{bmatrix}
    &=s[R|t]X\\
    R^{-1}K^{-1}\begin{bmatrix}
        u\\
        v\\
        1
    \end{bmatrix}&=s[I|R^{-1}t]X\\
    R^{-1}K^{-1}\begin{bmatrix}
        u\\
        v\\
        1
    \end{bmatrix}&=[I|R^{-1}t]sX\\
    R^{-1}K^{-1}\begin{bmatrix}
        u\\
        v\\
        1
    \end{bmatrix}&=sX+sR^{-1}t\\
    \frac{1}{s}R^{-1}K^{-1}\begin{bmatrix}
        u\\
        v\\
        1
    \end{bmatrix}-R^{-1}t&=X\\
\end{aligned}
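
A minimal sketch of the last line, assuming the scale s (and therefore the depth) is known along with K, R, and t:

import numpy as np

def backproject(u, v, s, K, R, t):
    # X = (1/s) R^{-1} K^{-1} [u, v, 1]^T - R^{-1} t
    uv1 = np.array([u, v, 1.0])
    R_inv = np.linalg.inv(R)
    K_inv = np.linalg.inv(K)
    return (1.0 / s) * R_inv @ (K_inv @ uv1) - R_inv @ t.ravel()

Without s, the pixel only constrains X to lie on a ray through the camera center.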

Projective Geometry

Orthographic Projection

Special case of perspective projection when f\to\infty

  • The distance to the center of projection is infinite
  • Also called parallel projection
  • Projection matrix is

w\begin{bmatrix}
u\\
v\\
1
\end{bmatrix}=
\begin{bmatrix}
f & 0 & 0 & 0\\
0 & f & 0 & 0\\
0 & 0 & 0 & s\\
\end{bmatrix}
\begin{bmatrix}
x\\
y\\
z\\
1
\end{bmatrix}
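
Reading off the last row, w=s for every point, so


u=\frac{f}{s}x,\qquad v=\frac{f}{s}y

which is independent of z: all points project along parallel rays with a constant magnification f/s.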

Continued in a later part of the course

Image processing foundations

Motivation for image processing

Representational Motivation:

  • We need more than raw pixel values

Computational Motivation:

  • Many image processing operations must be run at many locations in an image
  • Loops in Python are slow
  • High-level libraries reduce errors, developer time, and algorithm runtime
  • Two common libraries:
    • Torch+Torchvision: Focus on deep learning
    • scikit-image: Focus on classical image processing algorithms

Operations on images

Point operations

Operations that are applied to one pixel at a time

Negative image


I_{neg}(x,y)=L-1-I(x,y)
  • L is the number of intensity levels (e.g., 256 for an 8-bit image)

Power law transformation:


I_{out}(x,y)=cI(x,y)^{\gamma}
  • c is a positive constant
  • \gamma is the gamma value (for inputs in [0,1], \gamma<1 brightens mid-tones, \gamma>1 darkens them)

Contrast stretching

Use a function to stretch the range of pixel values


I_{out}(x,y)=f(I(x,y))
  • f is a function that stretches the range of pixel values
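
A minimal sketch of the point operations above, assuming an 8-bit grayscale image with L = 256 levels; normalizing to [0, 1] inside the gamma step is one common convention, not the only one.

import numpy as np

L = 256
I = np.random.randint(0, L, size=(240, 320)).astype(np.float64)  # stand-in image

I_neg = (L - 1) - I                                  # negative image
c, gamma = 1.0, 0.5
I_gamma = c * (I / (L - 1)) ** gamma                 # power-law (gamma) transformation on [0, 1]
I_stretch = (I - I.min()) / (I.max() - I.min())      # simple linear contrast stretch to [0, 1]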

Image histogram

  • The histogram of an image is a plot of the frequency of each pixel value

Limitations:

  • No spatial information
  • No information about the relationship between pixels
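
One way to compute a histogram (a sketch with NumPy; scikit-image's exposure module offers similar helpers):

import numpy as np

I = np.random.randint(0, 256, size=(240, 320))                 # stand-in 8-bit image
hist, bin_edges = np.histogram(I.ravel(), bins=256, range=(0, 256))
# hist[v] is the number of pixels with value v; plotting hist gives the image histogram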

Linear filtering in spatial domain

Operations that are applied to a neighborhood at each position

Used to:

  • Enhance image features
    • Denoise, sharpen, resize
  • Extract information about image structure
    • Edge detection, corner detection, blob detection
  • Detect image patterns
    • Template matching
  • Serve as the building blocks of Convolutional Neural Networks

Image filtering

Take the dot product of a kernel with the neighborhood around each pixel (cross-correlation):


h[m,n]=\sum_{k}\sum_{l}g[k,l]\,f[m+k,n+l]

where f is the image, g is the kernel, and k, l range over the kernel support.

import numpy as np

def filter2d(image, kernel):
    """
    Apply a 2D filter (cross-correlation) to an image with explicit loops.
    For illustration only; do not use this in practice.
    """
    r = kernel.shape[0] // 2                   # half-width (assumes an odd, square kernel)
    out = np.zeros_like(image, dtype=float)
    for i in range(r, image.shape[0] - r):     # skip the border to stay inside the image
        for j in range(r, image.shape[1] - r):
            # elementwise product of the kernel and the neighborhood, then sum
            out[i, j] = np.sum(kernel * image[i - r:i + r + 1, j - r:j + r + 1])
    return out

Computational cost: O(k^2mn), where k is the kernel width and m\times n are the image dimensions.

Do not use this in practice, use built-in functions instead.
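
For example (a sketch assuming SciPy is available; Torch's conv2d or scikit-image utilities are alternatives), scipy.ndimage implements the same sliding-window operation:

import numpy as np
from scipy import ndimage

image = np.random.rand(240, 320)
kernel = np.ones((3, 3)) / 9.0                            # 3x3 averaging (box) kernel

out = ndimage.correlate(image, kernel, mode='constant')   # cross-correlation, as in h[m,n] above
# ndimage.convolve flips the kernel first; for symmetric kernels the results are identical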

Box filter


\frac{1}{9}\begin{bmatrix}
1 & 1 & 1\\
1 & 1 & 1\\
1 & 1 & 1
\end{bmatrix}

Smooths the image

Identity filter


\begin{bmatrix}
0 & 0 & 0\\
0 & 1 & 0\\
0 & 0 & 0
\end{bmatrix}

Does not change the image

Sharpening filter


\begin{bmatrix}
0 & 0 & 0 \\
0 & 2 & 0 \\
0 & 0 & 0
\end{bmatrix}-
\frac{1}{9}\begin{bmatrix}
1 & 1 & 1 \\
1 & 1 & 1 \\
1 & 1 & 1
\end{bmatrix}

Enhances edges (the image plus the difference between the image and its local average)

Vertical edge detection


\begin{bmatrix}
1 & 0 & -1 \\
2 & 0 & -2 \\
1 & 0 & -1
\end{bmatrix}

Detects vertical edges

Horizontal edge detection


\begin{bmatrix}
1 & 2 & 1 \\
0 & 0 & 0 \\
-1 & -2 & -1
\end{bmatrix}

Detects horizontal edges
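
A sketch collecting the kernels above and applying one of them (again assuming SciPy):

import numpy as np
from scipy import ndimage

image = np.random.rand(240, 320)

box      = np.ones((3, 3)) / 9.0                                   # smoothing
identity = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], float)      # no change
sharpen  = 2 * identity - box                                      # edge enhancement
edge_v   = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], float)   # vertical edge detection
edge_h   = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]], float)   # horizontal edge detection

vertical_edges = ndimage.correlate(image, edge_v, mode='constant')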

Key properties:

  • Linear:
    • filter(I,f_1+f_2)=filter(I,f_1)+filter(I,f_2)
  • Scalars factor out:
    • filter(I,af)=a\cdot filter(I,f)
  • Shift invariant:
    • filter(shift(I),f)=shift(filter(I,f))
  • Commutative:
    • f_1*f_2=f_2*f_1
  • Associative:
    • f_1*(f_2*f_3)=(f_1*f_2)*f_3, so filtering with f_1 and then f_2 is the same as filtering once with f_1*f_2
  • Distributive over addition:
    • f_1*(f_2+f_3)=f_1*f_2+f_1*f_3
  • Identity:
    • filter(I,f_0)=I, where f_0 is the unit impulse (the identity filter above)
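
A quick numerical check of the linearity and scaling properties (a sketch; the kernels are random and purely illustrative):

import numpy as np
from scipy import ndimage

I  = np.random.rand(64, 64)
f1 = np.random.rand(3, 3)
f2 = np.random.rand(3, 3)

lhs = ndimage.convolve(I, f1 + f2, mode='constant')
rhs = ndimage.convolve(I, f1, mode='constant') + ndimage.convolve(I, f2, mode='constant')
print(np.allclose(lhs, rhs))                       # True: filtering is linear in the kernel
print(np.allclose(ndimage.convolve(I, 2 * f1, mode='constant'),
                  2 * ndimage.convolve(I, f1, mode='constant')))  # True: scalars factor out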

An important filter:

Gaussian filter


G(x,y)=\frac{1}{2\pi\sigma^2}e^{-\frac{x^2+y^2}{2\sigma^2}}

Smooths the image (Gaussian blur)

Common mistake: making the filter size mismatch \sigma (far too large or too small). Visualize the filter before applying it; a good rule of thumb is to put the kernel edge at about 3\sigma from the center.
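
A sketch of building a Gaussian kernel with that rule of thumb (the helper name is made up):

import numpy as np

def gaussian_kernel(sigma):
    r = int(np.ceil(3 * sigma))            # half-width of about 3*sigma
    xs = np.arange(-r, r + 1)
    X, Y = np.meshgrid(xs, xs)
    g = np.exp(-(X ** 2 + Y ** 2) / (2 * sigma ** 2))
    return g / g.sum()                     # normalize so the weights sum to 1

# gaussian_kernel(2.0) is a 13x13 kernel; plot it to check it tapers to ~0 at the edges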

Properties of Gaussian filter:

  • Removes high-frequency components (low-pass filter)
  • Convolving a Gaussian with itself gives another Gaussian (with \sigma'=\sqrt{2}\,\sigma)
  • Separable kernel:
    • G(x,y)=G(x)G(y) (factors into the product of two 1D Gaussian filters)

Filter Separability

  • Separable filter:
    • f(x,y)=f(x)f(y)

Example:


\begin{bmatrix}
1 & 2 & 1 \\
2 & 4 & 2 \\
1 & 2 & 1
\end{bmatrix}=
\begin{bmatrix}
1 \\
2 \\
1
\end{bmatrix}\times
\begin{bmatrix}
1 & 2 & 1
\end{bmatrix}

Gaussian filter is separable


G(x,y)=\frac{1}{2\pi\sigma^2}e^{-\frac{x^2+y^2}{2\sigma^2}}=G(x)G(y)

This reduces the computational cost of the filter from k^2mn to 2kmn
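
A sketch checking separability on the example kernel above, comparing two 1D passes against one 2D pass (SciPy assumed):

import numpy as np
from scipy import ndimage

image = np.random.rand(240, 320)
g1d = np.array([1.0, 2.0, 1.0])          # 1D factor of the example kernel above
g2d = np.outer(g1d, g1d)                 # the full 3x3 kernel

# Two 1D passes (~2k work per pixel) give the same result as one 2D pass (~k^2 work per pixel)
two_pass = ndimage.convolve1d(ndimage.convolve1d(image, g1d, axis=0, mode='constant'),
                              g1d, axis=1, mode='constant')
one_pass = ndimage.convolve(image, g2d, mode='constant')
print(np.allclose(two_pass, one_pass))   # True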