# CSE559A Lecture 20
## Local feature descriptors
- Detection: identify the interest points.
- Description: extract a feature vector (descriptor) from the region surrounding each interest point.
- Matching: determine correspondences between descriptors in two views.
### Image representation
Histogram of oriented gradients (HOG)
- Quantization
  - Grids: fast but applicable only with few dimensions
  - Clustering: slower but can quantize data in higher dimensions
- Matching
  - Histogram intersection or Euclidean distance may be faster
  - Chi-squared often works better
  - Earth mover's distance is good when nearby bins represent similar values
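
The histogram comparison measures listed above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming both histograms are already normalized to sum to 1; the function names and the `eps` guard are hypothetical choices for this example, not part of the lecture.

```python
import numpy as np

def histogram_intersection(h1, h2):
    """Similarity score: sum of bin-wise minima (higher means more similar)."""
    return float(np.minimum(h1, h2).sum())

def euclidean_distance(h1, h2):
    """Plain L2 distance between the two histograms."""
    return float(np.linalg.norm(h1 - h2))

def chi_squared_distance(h1, h2, eps=1e-12):
    """0.5 * sum((a - b)^2 / (a + b)); eps (an assumption) avoids 0/0 on empty bins."""
    return float(0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))

# Two toy 4-bin histograms, each summing to 1.
a = np.array([0.5, 0.3, 0.2, 0.0])
b = np.array([0.4, 0.3, 0.2, 0.1])

print(histogram_intersection(a, b))
print(euclidean_distance(a, b))
print(chi_squared_distance(a, b))
```

Note that intersection is a similarity while the other two are distances, so they are thresholded in opposite directions. Earth mover's distance is omitted here because it requires solving a transport problem rather than a bin-wise formula.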
#### SIFT vector formation
Computed on rotated and scaled version of window according to computed orientation & scale
- resample the window
Based on gradients weighted by a Gaussian whose standard deviation is half the window width (for smooth falloff)
4x4 array of gradient orientation histograms, weighted by gradient magnitude
8 orientations x 4x4 array = 128 dimensions
Motivation: some sensitivity to spatial layout, but not too much.
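
The descriptor layout described above (a 4x4 grid of 8-bin orientation histograms, Gaussian-weighted, giving 128 dimensions) can be sketched as follows. This is a simplified illustration, not a full SIFT implementation: it assumes the patch has already been rotated and rescaled, assigns each pixel to a single bin and cell (no interpolation between bins or cells), and finishes with plain L2 normalization. The function name `sift_like_descriptor` is hypothetical.

```python
import numpy as np

def sift_like_descriptor(patch, n_cells=4, n_bins=8):
    """Simplified SIFT-style descriptor for a square patch that is assumed
    to be already aligned to the keypoint's orientation and scale."""
    patch = np.asarray(patch, dtype=float)
    size = patch.shape[0]
    assert patch.shape == (size, size) and size % n_cells == 0

    # Image gradients: np.gradient returns (d/d_row, d/d_col).
    gy, gx = np.gradient(patch)
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx) % (2 * np.pi)

    # Gaussian weight centered on the patch, sigma = half the window width.
    coords = np.arange(size) - (size - 1) / 2.0
    yy, xx = np.meshgrid(coords, coords, indexing="ij")
    sigma = size / 2.0
    weighted_mag = magnitude * np.exp(-(xx**2 + yy**2) / (2 * sigma**2))

    # Hard-assign each pixel to one orientation bin.
    bins = np.floor(orientation / (2 * np.pi) * n_bins).astype(int) % n_bins

    # One magnitude-weighted orientation histogram per spatial cell.
    cell = size // n_cells
    descriptor = np.zeros((n_cells, n_cells, n_bins))
    for i in range(n_cells):
        for j in range(n_cells):
            rows = slice(i * cell, (i + 1) * cell)
            cols = slice(j * cell, (j + 1) * cell)
            descriptor[i, j] = np.bincount(
                bins[rows, cols].ravel(),
                weights=weighted_mag[rows, cols].ravel(),
                minlength=n_bins,
            )

    descriptor = descriptor.ravel()
    norm = np.linalg.norm(descriptor)
    return descriptor / norm if norm > 0 else descriptor

# Toy usage on a random 16x16 patch.
rng = np.random.default_rng(0)
desc = sift_like_descriptor(rng.random((16, 16)))
print(desc.shape)
```

Pooling gradients into coarse spatial cells is what gives "some sensitivity to spatial layout, but not too much": a gradient can shift within its cell without changing the descriptor.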
For matching:
- Extraordinarily robust detection and description technique
- Can handle changes in viewpoint
  - Up to about 60 degrees of out-of-plane rotation
- Can handle significant changes in illumination
  - Sometimes even day vs. night
- Fast and efficient: can run in real time
- Lots of code available
#### SURF
- Fast approximation of SIFT idea
- Efficient computation by 2D box filters & integral images
- Reported to be about 6 times faster than SIFT
- Reported to give equivalent quality for object identification
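
The reason box filters are cheap with an integral image is that any rectangular sum reduces to four table lookups, regardless of the rectangle's size. The sketch below shows only that building block, not SURF itself; the helper names `integral_image` and `box_sum` are hypothetical.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row and column prepended,
    so that ii[r, c] equals img[:r, :c].sum()."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] using four lookups in the integral image."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]

# Toy usage: compare against a direct sum over the same rectangle.
img = np.arange(20, dtype=float).reshape(4, 5)
ii = integral_image(img)
print(box_sum(ii, 1, 1, 3, 4), img[1:3, 1:4].sum())
```

A box-filter response is then a weighted combination of a few such rectangle sums, so its cost does not grow with the filter size.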
#### Shape context
![Shape context descriptor](https://notenextra.trance-0.com/CSE559A/Shape_context_descriptor.png)
#### Self-similarity Descriptor
![Self-similarity descriptor](https://notenextra.trance-0.com/CSE559A/Self-similarity_descriptor.png)
## Local feature matching
### Matching
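Simplest approach: pick the nearest neighbor and threshold on its absolute distance.
Problem: many photos contain a lot of self-similarity, so the nearest neighbor is often ambiguous.
Solution: nearest neighbor with a ratio test. Accept a match only when the ratio of the distance to the nearest neighbor over the distance to the second-nearest neighbor is low.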
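
A brute-force version of nearest-neighbor matching with the ratio test might look like the sketch below. The function name `ratio_test_matches` and the threshold `max_ratio=0.8` are assumptions chosen for illustration; the lecture does not specify a value. The sketch also assumes the second descriptor set has at least two entries.

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, max_ratio=0.8):
    """Return (i, j) pairs where desc_b[j] is the nearest neighbor of desc_a[i]
    and d(nearest) < max_ratio * d(second nearest)."""
    # Pairwise Euclidean distances, shape (len(desc_a), len(desc_b)).
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    order = np.argsort(dists, axis=1)
    matches = []
    for i in range(dists.shape[0]):
        best, second = order[i, 0], order[i, 1]
        if dists[i, best] < max_ratio * dists[i, second]:
            matches.append((i, int(best)))
    return matches

# Toy 2D descriptors: the first query has one clear neighbor, the second
# has two nearly identical candidates (self-similarity) and is rejected.
desc_a = np.array([[0.0, 0.0], [5.0, 5.0]])
desc_b = np.array([[0.1, 0.0], [10.0, 10.0], [5.0, 5.1], [5.1, 5.0]])
matches = ratio_test_matches(desc_a, desc_b)
print(matches)
```

The second query is dropped even though its nearest neighbor is very close in absolute terms, which is exactly the case an absolute-distance threshold handles badly.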
![Comparison of keypoint detectors](https://notenextra.trance-0.com/CSE559A/Comparison_of_keypoint_detectors.png)
## Deep Learning for Correspondence Estimation
![Deep learning for correspondence estimation](https://notenextra.trance-0.com/CSE559A/Deep_learning_for_correspondence_estimation.png)
## Optical Flow
### Field
Motion field: the projection of the 3D scene motion into the image
- Magnitude of vectors is determined by metric motion
- Only caused by motion

Optical flow: the apparent motion of brightness patterns in the image
- Magnitude of vectors is measured in pixels
- Can be caused by lighting changes
### Brightness constancy constraint, aperture problem
Machine Learning Approach
- Collect examples of inputs and outputs
- Design a prediction model suitable for the task
  - Invariances, equivariances; complexity; input and output shapes and semantics
- Specify loss functions and train model
- Limitations: Requires training the model; Requires a sufficiently complete training dataset; Must re-learn known facts; Higher computational complexity
Optimization Approach
- Define properties we expect to hold for a correct solution
- Translate properties into a cost function
- Derive an algorithm to solve for the cost function
- Limitations: Often requires making overly simple assumptions on properties; Some tasks can't be easily defined
Given frames at times $t-1$ and $t$, estimate the apparent motion field $u(x,y)$ and $v(x,y)$ between them
Brightness constancy constraint: the projection of the same scene point looks the same in every frame
$$
I(x,y,t-1) = I(x+u(x,y),y+v(x,y),t)
$$
Additional assumptions:
- Small motion: points do not move very far
- Spatial coherence: points move like their neighbors
Trick for solving:
Brightness constancy constraint:
$$
I(x,y,t-1) = I(x+u(x,y),y+v(x,y),t)
$$
Linearize the right-hand side with a first-order Taylor expansion around $(x,y)$, which is justified by the small-motion assumption:
$$
I(x,y,t-1) \approx I(x,y,t) + I_x u(x,y) + I_y v(x,y)
$$
$$
I_x u(x,y) + I_y v(x,y) + I(x,y,t) - I(x,y,t-1) = 0
$$
Hence, writing $I_t \approx I(x,y,t) - I(x,y,t-1)$ for the temporal derivative,
$$
I_x u(x,y) + I_y v(x,y) + I_t = 0
$$
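
The linearized constraint gives one equation per pixel in two unknowns, so it cannot be solved at a single pixel. One way to use the spatial coherence assumption is to assume a single $(u, v)$ for a whole patch and solve all the pixel equations in the least-squares sense. The sketch below does this for one patch; this pooling is in the spirit of the Lucas-Kanade method, which is named here as an illustration and is not taken from the notes above. The function name `estimate_patch_flow` and the synthetic test image are hypothetical.

```python
import numpy as np

def estimate_patch_flow(prev, curr):
    """Estimate one (u, v) for a whole patch from I_x u + I_y v + I_t = 0,
    solved by least squares over every pixel in the patch."""
    prev = np.asarray(prev, dtype=float)
    curr = np.asarray(curr, dtype=float)
    Iy, Ix = np.gradient(curr)        # spatial derivatives of the frame at time t
    It = curr - prev                  # I(x, y, t) - I(x, y, t-1)
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

# Synthetic example: a smooth pattern translated by a small, known amount.
ys, xs = np.mgrid[0:40, 0:40].astype(float)
true_u, true_v = 0.3, -0.2

def pattern(x, y):
    return np.sin(0.2 * x) + np.cos(0.15 * y)

prev = pattern(xs, ys)
curr = pattern(xs - true_u, ys - true_v)   # satisfies prev(x, y) = curr(x + u, y + v)
u, v = estimate_patch_flow(prev, curr)
print(u, v)
```

The estimate is only approximate because the Taylor expansion drops higher-order terms; it degrades as the displacement grows, which is why the small-motion assumption matters.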