# CSE559A Lecture 14

## Object Detection

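Evaluation metric: AP (Average Precision), the area under the precision-recall curve for one class. A detection counts as correct when its overlap (IoU) with a ground-truth box exceeds a threshold.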
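As a rough illustration of how AP can be computed, the sketch below uses the all-point interpolated variant. It assumes the detections are already sorted by descending confidence and already marked as true or false positives; the function name `average_precision` and its inputs are hypothetical, and benchmarks differ in the exact interpolation they use.

```python
def average_precision(is_true_positive, num_ground_truth):
    """All-point interpolated AP.

    is_true_positive: list of bools, one per detection, sorted by
                      descending confidence (assumed precomputed).
    num_ground_truth: number of ground-truth boxes for this class.
    """
    if num_ground_truth <= 0 or not is_true_positive:
        return 0.0

    precisions, recalls = [], []
    tp = 0
    for rank, hit in enumerate(is_true_positive, start=1):
        if hit:
            tp += 1
        precisions.append(tp / rank)
        recalls.append(tp / num_ground_truth)

    # Interpolate: make precision non-increasing from right to left.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum the rectangle areas under the interpolated curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Three ranked detections (hit, miss, hit) against two ground-truth boxes.
example_ap = average_precision([True, False, True], 2)
```

Averaging this value over all classes gives mAP.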

### Benchmarks

#### PASCAL VOC Challenge

20 object classes.

Detection accuracy on this benchmark rose sharply once CNN-based detectors were introduced.

#### COCO dataset

COCO stands for Common Objects in Context.

Semantic segmentation: every pixel is assigned a class label.

Instance segmentation: every pixel is assigned a class label and also grouped into individual object instances.
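To make the difference between the two label formats concrete, here is a toy sketch. It turns a semantic label map into an instance map by grouping 4-connected pixels of the same class. This is only an illustration of the output format, not how instance segmentation models work: connected components would merge touching objects of the same class. The name `instances_from_semantic` and the toy grid are assumptions.

```python
from collections import deque


def instances_from_semantic(semantic):
    """Group same-class, 4-connected pixels into instance ids.

    semantic: 2D list of class ids, with 0 meaning background.
    Returns (instance_map, number_of_instances); background stays 0.
    """
    h, w = len(semantic), len(semantic[0])
    instance = [[0] * w for _ in range(h)]
    next_id = 0
    for y in range(h):
        for x in range(w):
            if semantic[y][x] == 0 or instance[y][x] != 0:
                continue
            # Start a new instance and flood-fill its connected region.
            next_id += 1
            instance[y][x] = next_id
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and instance[ny][nx] == 0
                            and semantic[ny][nx] == semantic[cy][cx]):
                        instance[ny][nx] = next_id
                        queue.append((ny, nx))
    return instance, next_id


# Semantic map: two separate blobs of class 1 and one blob of class 2.
semantic_map = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 2, 2],
]
instance_map, count = instances_from_semantic(semantic_map)
```

In the semantic map both class-1 blobs share the label 1; in the instance map they receive different ids.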

### Object detection: outline

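The detectors below follow a two-stage pipeline:

- Proposal generation: find candidate regions that may contain an object.
- Object recognition: classify each candidate region.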

#### R-CNN

- Proposal generation: use selective search to generate region proposals (about 2000 per image).
- Feature extraction: run a CNN (AlexNet fine-tuned on PASCAL VOC) on each proposal.
- Classification: classify each proposal's features with linear SVMs.
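The R-CNN pipeline can be sketched as a loop over proposals. This is a minimal skeleton under stated assumptions: `propose_regions`, `extract_features`, and `score_classes` are hypothetical callables standing in for selective search, the CNN, and the per-class SVMs, and post-processing such as non-maximum suppression is left out.

```python
def rcnn_detect(image, propose_regions, extract_features, score_classes,
                threshold=0.0):
    """Run an R-CNN-style detector; all three callables are placeholders.

    propose_regions(image)       -> iterable of boxes
    extract_features(image, box) -> feature vector for one proposal
    score_classes(features)      -> dict mapping class label to score
    """
    detections = []
    for box in propose_regions(image):
        # One separate CNN forward pass per proposal.
        features = extract_features(image, box)
        for label, score in score_classes(features).items():
            if score > threshold:
                detections.append((box, label, score))
    # Highest-scoring detections first.
    detections.sort(key=lambda d: d[2], reverse=True)
    return detections
```

The per-proposal call to `extract_features` is where the cost comes from: the CNN runs once for every proposal in the image.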

Pros:

- Much more accurate than previous approaches
- Any deep architecture can immediately be "plugged in"

Cons:

- Not a single end-to-end trainable system
  - Fine-tune network with softmax classifier (log loss)
  - Train post-hoc linear SVMs (hinge loss)
  - Train post-hoc bounding box regressors (least squares)
- Training is slow: about 2000 CNN forward passes per image (one per proposal)
- Inference (detection) is slow for the same reason

#### Fast R-CNN

Proposal generation is still a separate step.

Run the CNN once on the whole image, then extract the features for each proposal from the shared feature map.

##### ROI pooling and ROI alignment

ROI pooling:

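- The proposal is projected onto the feature map and its coordinates are rounded to the feature-map grid.
- The projected region is divided into a fixed grid of bins, and each bin is max-pooled, so every proposal yields a fixed-size output.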
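A minimal sketch of ROI pooling on a single-channel feature map follows. It assumes the ROI is already given in integer feature-map cells with exclusive end coordinates; `roi_max_pool` is a hypothetical name, and the floor/ceil bin edges are one common choice rather than the only one.

```python
def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool one ROI into a fixed out_h x out_w grid.

    feature_map: 2D list of numbers (one channel).
    roi: (x0, y0, x1, y1) in integer feature-map cells, x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0, x1 - x0
    if roi_h < 1 or roi_w < 1:
        raise ValueError("ROI must cover at least one cell")

    pooled = []
    for i in range(out_h):
        # Bin edges: floor for the start, ceil for the end.
        ys = y0 + (i * roi_h) // out_h
        ye = y0 + ((i + 1) * roi_h + out_h - 1) // out_h
        row = []
        for j in range(out_w):
            xs = x0 + (j * roi_w) // out_w
            xe = x0 + ((j + 1) * roi_w + out_w - 1) // out_w
            row.append(max(feature_map[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled


fmap = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
pooled_full = roi_max_pool(fmap, (0, 0, 4, 4), 2, 2)
pooled_part = roi_max_pool(fmap, (1, 1, 4, 4), 2, 2)
```

Both ROIs produce a 2x2 output even though one covers 4x4 cells and the other 3x3. In the second case the bins overlap because 3 is not divisible by 2.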

ROI alignment:

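- The proposal is projected onto the feature map without rounding its coordinates.
- Each bin is sampled at fractional locations using bilinear interpolation, which avoids the misalignment introduced by rounding.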
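A minimal sketch of ROI alignment on a single-channel feature map follows. It makes simplifying assumptions: only one sample is taken per bin (at the bin center), and the value `feature_map[y][x]` is treated as located at the integer coordinate `(y, x)`. Real implementations often average several samples per bin and may use a different coordinate convention. The names `bilinear` and `roi_align` are hypothetical.

```python
import math


def bilinear(feature_map, y, x):
    """Bilinearly interpolate feature_map at a fractional (y, x)."""
    h, w = len(feature_map), len(feature_map[0])
    # Clamp to the valid range so border samples stay defined.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature_map[y0][x0] * (1 - dx) + feature_map[y0][x1] * dx
    bottom = feature_map[y1][x0] * (1 - dx) + feature_map[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feature_map, roi, out_h, out_w):
    """Sample one ROI into a fixed out_h x out_w grid without rounding.

    roi: (x0, y0, x1, y1) as floats in feature-map coordinates.
    """
    x0, y0, x1, y1 = roi
    bin_h = (y1 - y0) / out_h
    bin_w = (x1 - x0) / out_w
    return [
        [bilinear(feature_map, y0 + (i + 0.5) * bin_h, x0 + (j + 0.5) * bin_w)
         for j in range(out_w)]
        for i in range(out_h)
    ]


small_map = [
    [0.0, 1.0],
    [2.0, 3.0],
]
aligned = roi_align(small_map, (0.0, 0.0, 1.0, 1.0), 2, 2)
```

Unlike ROI pooling, the sample locations here (0.25 and 0.75) fall between grid cells, so the output varies smoothly as the proposal moves.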

Use bounding box regression to refine the proposal.
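The notes do not state how the refinement is parameterized. The sketch below assumes one common choice: boxes are given as (center x, center y, width, height), the center offsets are scaled by the proposal size, and the size changes are predicted in log space. The names `encode_box` and `decode_box` are hypothetical.

```python
import math


def encode_box(proposal, target):
    """Regression targets that map a proposal onto a ground-truth box.

    Both boxes are (cx, cy, w, h) with positive width and height.
    """
    px, py, pw, ph = proposal
    gx, gy, gw, gh = target
    return (
        (gx - px) / pw,       # center shift, relative to proposal width
        (gy - py) / ph,       # center shift, relative to proposal height
        math.log(gw / pw),    # log-scale change in width
        math.log(gh / ph),    # log-scale change in height
    )


def decode_box(proposal, deltas):
    """Apply predicted deltas to a proposal to get the refined box."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = deltas
    return (
        px + pw * tx,
        py + ph * ty,
        pw * math.exp(tw),
        ph * math.exp(th),
    )


proposal_box = (50.0, 40.0, 20.0, 10.0)
ground_truth = (54.0, 38.0, 30.0, 12.0)
targets = encode_box(proposal_box, ground_truth)
refined = decode_box(proposal_box, targets)
```

During training the regressor learns to predict `targets` from the proposal's features; at test time `decode_box` applies the predicted deltas to the proposal.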