# CSE559A Lecture 14 ## Object Detection AP (Average Precision) ### Benchmarks #### PASCAL VOC Challenge 20 Challenge classes. CNN increases the accuracy of object detection. #### COCO dataset Common objects in context. Semantic segmentation. Every pixel is classified to tags. Instance segmentation. Every pixel is classified and grouped into instances. ### Object detection: outline Proposal generation Object recognition #### R-CNN Proposal generation Use CNN to extract features from proposals. with SVM to classify proposals. Use selective search to generate proposals. Use AlexNet finetuned on PASCAL VOC to extract features. Pros: - Much more accurate than previous approaches - Andy deep architecture can immediately be "plugged in" Cons: - Not a single end-to-end trainable system - Fine-tune network with softmax classifier (log loss) - Train post-hoc linear SVMs (hinge loss) - Train post-hoc bounding box regressors (least squares) - Training is slow 2000CNN passes for each image - Inference (detection) was slow #### Fast R-CNN Proposal generation Use CNN to extract features from proposals. ##### ROI pooling and ROI alignment ROI pooling: - Pooling is applied to the feature map. - Pooling is applied to the proposal. ROI alignment: - Align the proposal to the feature map. - Align the proposal to the feature map. Use bounding box regression to refine the proposal.