# CSE559A Lecture 20

## Local feature descriptors

- Detection: identify the interest points
- Description: extract a vector feature descriptor surrounding each interest point
- Matching: determine correspondences between descriptors in two views

### Image representation

Histogram of oriented gradients (HOG)

- Quantization
  - Grids: fast, but applicable only with few dimensions
  - Clustering: slower, but can quantize data in higher dimensions
- Matching
  - Histogram intersection or Euclidean distance may be faster
  - Chi-squared often works better
  - Earth mover's distance is good when nearby bins represent similar values

#### SIFT vector formation

- Computed on a rotated and scaled version of the window, according to the computed orientation & scale (resample the window)
- Based on gradients weighted by a Gaussian of variance half the window (for smooth falloff)
- 4x4 array of gradient orientation histograms, weighted by gradient magnitude
- 8 orientations x 4x4 array = 128 dimensions
- Motivation: some sensitivity to spatial layout, but not too much

For matching:

- Extraordinarily robust detection and description technique
- Can handle changes in viewpoint
  - Up to about 60 degrees of out-of-plane rotation
- Can handle significant changes in illumination
  - Sometimes even day vs. night
- Fast and efficient: can run in real time
- Lots of code available

#### SURF

- Fast approximation of the SIFT idea
- Efficient computation via 2D box filters & integral images
- 6 times faster than SIFT
- Equivalent quality for object identification

#### Shape context

![Shape context descriptor](https://notenextra.trance-0.com/CSE559A/Shape_context_descriptor.png)

#### Self-similarity descriptor

![Self-similarity descriptor](https://notenextra.trance-0.com/CSE559A/Self-similarity_descriptor.png)

## Local feature matching

### Matching

Simplest approach: pick the nearest neighbor and threshold on absolute distance.

Problem: lots of self-similarity in many photos.

Solution: nearest neighbor with a low ratio test: accept a match only when the distance to the nearest neighbor is well below the distance to the second-nearest (see the sketch below).

![Comparison of keypoint detectors](https://notenextra.trance-0.com/CSE559A/Comparison_of_keypoint_detectors.png)
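As a concrete illustration of the matching strategies above, here is a minimal NumPy sketch of nearest-neighbor matching with the ratio test, with chi-squared distance available as an alternative for histogram-based descriptors. The function names, the 0.8 ratio threshold, and the random test descriptors are illustrative assumptions, not taken from the lecture.

```python
import numpy as np

def chi2_dist(h, H, eps=1e-10):
    """Chi-squared distance between histogram h and each row of H."""
    return 0.5 * np.sum((H - h) ** 2 / (H + h + eps), axis=1)

def match_descriptors(desc1, desc2, ratio=0.8, dist="euclidean"):
    """Nearest-neighbor matching with the ratio test.

    Accepts a match only when the nearest neighbor is clearly closer
    than the second-nearest, which suppresses the ambiguous matches
    produced by self-similar image structure.
    """
    matches = []
    for i, d in enumerate(desc1):
        if dist == "chi2":
            dists = chi2_dist(d, desc2)   # suited to histogram descriptors
        else:
            dists = np.linalg.norm(desc2 - d, axis=1)
        nn1, nn2 = np.argsort(dists)[:2]  # two closest candidates
        if dists[nn1] < ratio * dists[nn2]:
            matches.append((i, nn1))
    return matches

# Toy usage with random non-negative 128-D "SIFT-like" descriptors
rng = np.random.default_rng(0)
d1 = rng.random((50, 128))
d2 = rng.random((80, 128))
print(len(match_descriptors(d1, d2, dist="chi2")))
```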
## Deep Learning for Correspondence Estimation

![Deep learning for correspondence estimation](https://notenextra.trance-0.com/CSE559A/Deep_learning_for_correspondence_estimation.png)

## Optical Flow

### Field

Motion field: the projection of the 3D scene motion into the image

- Magnitude of vectors is determined by metric motion
- Only caused by motion

Optical flow: the apparent motion of brightness patterns in the image

- Magnitude of vectors is measured in pixels
- Can be caused by lighting changes, not just motion

### Brightness constancy constraint, aperture problem

Machine learning approach:

- Collect examples of inputs and outputs
- Design a prediction model suitable for the task
  - Invariances, equivariances; complexity; input and output shapes and semantics
- Specify loss functions and train the model
- Limitations: requires training the model; requires a sufficiently complete training dataset; must re-learn known facts; higher computational complexity

Optimization approach:

- Define properties we expect to hold for a correct solution
- Translate the properties into a cost function
- Derive an algorithm to minimize the cost function
- Limitations: often requires making overly simple assumptions about the properties; some tasks can't be easily defined

Given frames at times $t-1$ and $t$, estimate the apparent motion field $u(x,y)$ and $v(x,y)$ between them.

Brightness constancy constraint: the projection of the same point looks the same in every frame:

$$
I(x,y,t-1) = I(x+u(x,y),\, y+v(x,y),\, t)
$$

Additional assumptions:

- Small motion: points do not move very far
- Spatial coherence: points move like their neighbors

Trick for solving: linearize the right-hand side of the brightness constancy constraint with a first-order Taylor expansion:

$$
I(x+u(x,y),\, y+v(x,y),\, t) \approx I(x,y,t) + I_x u(x,y) + I_y v(x,y)
$$

Substituting into the constraint and rearranging:

$$
I_x u(x,y) + I_y v(x,y) + I(x,y,t) - I(x,y,t-1) = 0
$$

Hence, writing $I_t = I(x,y,t) - I(x,y,t-1)$ for the temporal derivative,

$$
I_x u(x,y) + I_y v(x,y) + I_t = 0
$$
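To see how this constraint can be used, here is a minimal NumPy sketch that stacks the linearized constraint over a small window and solves for $(u, v)$ by least squares, relying on the spatial-coherence assumption above (the Lucas-Kanade idea, which goes one step beyond the derivation here). The function name, window size, and test pattern are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def flow_in_window(I_prev, I_curr, y, x, half=7):
    """Estimate (u, v) around (y, x) from the linearized constraint.

    Each pixel in the window contributes one equation
    Ix*u + Iy*v + It = 0; stacking them and solving by least squares
    assumes the whole window moves together (spatial coherence).
    """
    Iy, Ix = np.gradient(I_curr)   # spatial derivatives (axis 0 = y, axis 1 = x)
    It = I_curr - I_prev           # temporal derivative I(t) - I(t-1)
    win = (slice(y - half, y + half + 1), slice(x - half, x + half + 1))
    A = np.stack([Ix[win].ravel(), Iy[win].ravel()], axis=1)
    b = -It[win].ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

# Toy usage: a smooth pattern shifted one pixel to the right, so the
# true flow is u = 1, v = 0 (synthetic data for illustration)
yy, xx = np.mgrid[0:64, 0:64]
I_prev = np.sin(xx / 6.0) + np.cos(yy / 8.0)
I_curr = np.roll(I_prev, 1, axis=1)
print(flow_in_window(I_prev, I_curr, 32, 32))   # approximately (1.0, 0.0)
```

Note that when the window contains gradients in only one direction, the stacked system is rank-deficient and only the flow component along the gradient can be recovered; this is the aperture problem named in the heading above.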