# CSE559A Lecture 20

## Local feature descriptors

- Detection: identify the interest points
- Description: extract a vector feature descriptor surrounding each interest point
- Matching: determine correspondences between descriptors in two views

### Image representation

Histogram of oriented gradients (HOG)

- Quantization
  - Grids: fast, but applicable only with few dimensions
  - Clustering: slower, but can quantize data in higher dimensions
- Matching
  - Histogram intersection or Euclidean distance may be faster
  - Chi-squared often works better
  - Earth mover's distance is good when nearby bins represent similar values

#### SIFT vector formation

- Computed on a rotated and scaled version of the window, according to the computed orientation & scale (resample the window)
- Based on gradients weighted by a Gaussian of variance half the window (for smooth falloff)
- 4x4 array of gradient orientation histograms, weighted by gradient magnitude
- 8 orientations x 4x4 array = 128 dimensions
- Motivation: some sensitivity to spatial layout, but not too much

For matching:

- Extraordinarily robust detection and description technique
- Can handle changes in viewpoint
  - Up to about 60 degrees of out-of-plane rotation
- Can handle significant changes in illumination
  - Sometimes even day vs. night
- Fast and efficient: can run in real time
- Lots of code available

#### SURF

- Fast approximation of the SIFT idea
- Efficient computation via 2D box filters & integral images
- 6 times faster than SIFT
- Equivalent quality for object identification

#### Shape context

![Shape context descriptor](https://notenextra.trance-0.com/CSE559A/Shape_context_descriptor.png)

#### Self-similarity descriptor

![Self-similarity descriptor](https://notenextra.trance-0.com/CSE559A/Self-similarity_descriptor.png)

## Local feature matching

### Matching

Simplest approach: pick the nearest neighbor and threshold on absolute distance.

Problem: lots of self-similarity in many photos.

Solution: nearest neighbor with a low ratio test: accept a match only when the distance to the nearest neighbor is well below the distance to the second-nearest (see the sketch below).

![Comparison of keypoint detectors](https://notenextra.trance-0.com/CSE559A/Comparison_of_keypoint_detectors.png)
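As a concrete illustration of the matching strategies above, here is a minimal NumPy sketch of nearest-neighbor matching with the ratio test, with chi-squared distance available as an alternative for histogram-based descriptors. The function names, the 0.8 ratio threshold, and the random test descriptors are illustrative assumptions, not taken from the lecture.

```python
import numpy as np

def chi2_dist(h, H, eps=1e-10):
    """Chi-squared distance between histogram h and each row of H."""
    return 0.5 * np.sum((H - h) ** 2 / (H + h + eps), axis=1)

def match_descriptors(desc1, desc2, ratio=0.8, dist="euclidean"):
    """Nearest-neighbor matching with the ratio test.

    Accepts a match only when the nearest neighbor is clearly closer
    than the second-nearest, which suppresses the ambiguous matches
    produced by self-similar image structure.
    """
    matches = []
    for i, d in enumerate(desc1):
        if dist == "chi2":
            dists = chi2_dist(d, desc2)   # suited to histogram descriptors
        else:
            dists = np.linalg.norm(desc2 - d, axis=1)
        nn1, nn2 = np.argsort(dists)[:2]  # two closest candidates
        if dists[nn1] < ratio * dists[nn2]:
            matches.append((i, nn1))
    return matches

# Toy usage with random non-negative 128-D "SIFT-like" descriptors
rng = np.random.default_rng(0)
d1 = rng.random((50, 128))
d2 = rng.random((80, 128))
print(len(match_descriptors(d1, d2, dist="chi2")))
```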
## Deep Learning for Correspondence Estimation

![Deep learning for correspondence estimation](https://notenextra.trance-0.com/CSE559A/Deep_learning_for_correspondence_estimation.png)

## Optical Flow

### Field

Motion field: the projection of the 3D scene motion into the image

- Magnitude of vectors is determined by metric motion
- Only caused by motion

Optical flow: the apparent motion of brightness patterns in the image

- Magnitude of vectors is measured in pixels
- Can be caused by lighting changes, not just motion

### Brightness constancy constraint, aperture problem

Machine learning approach:

- Collect examples of inputs and outputs
- Design a prediction model suitable for the task
  - Invariances, equivariances; complexity; input and output shapes and semantics
- Specify loss functions and train the model
- Limitations: requires training the model; requires a sufficiently complete training dataset; must re-learn known facts; higher computational complexity

Optimization approach:

- Define properties we expect to hold for a correct solution
- Translate the properties into a cost function
- Derive an algorithm to minimize the cost function
- Limitations: often requires making overly simple assumptions about the properties; some tasks can't be easily defined

Given frames at times $t-1$ and $t$, estimate the apparent motion field $u(x,y)$ and $v(x,y)$ between them.

Brightness constancy constraint: the projection of the same point looks the same in every frame:

$$
I(x,y,t-1) = I(x+u(x,y),\, y+v(x,y),\, t)
$$

Additional assumptions:

- Small motion: points do not move very far
- Spatial coherence: points move like their neighbors

Trick for solving: linearize the right-hand side of the brightness constancy constraint with a first-order Taylor expansion:

$$
I(x+u(x,y),\, y+v(x,y),\, t) \approx I(x,y,t) + I_x u(x,y) + I_y v(x,y)
$$

Substituting into the constraint and rearranging:

$$
I_x u(x,y) + I_y v(x,y) + I(x,y,t) - I(x,y,t-1) = 0
$$

Hence, writing $I_t = I(x,y,t) - I(x,y,t-1)$ for the temporal derivative,

$$
I_x u(x,y) + I_y v(x,y) + I_t = 0
$$
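To see how this constraint can be used, here is a minimal NumPy sketch that stacks the linearized constraint over a small window and solves for $(u, v)$ by least squares, relying on the spatial-coherence assumption above (the Lucas-Kanade idea, which goes one step beyond the derivation here). The function name, window size, and test pattern are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def flow_in_window(I_prev, I_curr, y, x, half=7):
    """Estimate (u, v) around (y, x) from the linearized constraint.

    Each pixel in the window contributes one equation
    Ix*u + Iy*v + It = 0; stacking them and solving by least squares
    assumes the whole window moves together (spatial coherence).
    """
    Iy, Ix = np.gradient(I_curr)   # spatial derivatives (axis 0 = y, axis 1 = x)
    It = I_curr - I_prev           # temporal derivative I(t) - I(t-1)
    win = (slice(y - half, y + half + 1), slice(x - half, x + half + 1))
    A = np.stack([Ix[win].ravel(), Iy[win].ravel()], axis=1)
    b = -It[win].ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

# Toy usage: a smooth pattern shifted one pixel to the right, so the
# true flow is u = 1, v = 0 (synthetic data for illustration)
yy, xx = np.mgrid[0:64, 0:64]
I_prev = np.sin(xx / 6.0) + np.cos(yy / 8.0)
I_curr = np.roll(I_prev, 1, axis=1)
print(flow_in_window(I_prev, I_curr, 32, 32))   # approximately (1.0, 0.0)
```

Note that when the window contains gradients in only one direction, the stacked system is rank-deficient and only the flow component along the gradient can be recovered; this is the aperture problem named in the heading above.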