
# CSE5519 Advances in Computer Vision (Topic A: 2025: Semantic Segmentation)

## Dual Semantic Guidance for Open Vocabulary Semantic Segmentation

[link to the paper](https://openaccess.thecvf.com/content/CVPR2025/papers/Wang_Dual_Semantic_Guidance_for_Open_Vocabulary_Semantic_Segmentation_CVPR_2025_paper.pdf)

## Novelty in Dual Semantic Guidance

The paper uses dual semantic guidance for open-vocabulary semantic segmentation. For each predicted mask, it applies CLIP, much as open-vocabulary object detection does, to align the mask with a text description.
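
To make the mask-to-text alignment idea concrete, here is a minimal sketch of one common way to do it: average the image features inside a mask, then pick the text embedding with the highest cosine similarity. This is not the paper's implementation. The function names (`mask_pool`, `classify_mask`), the toy 2x2 feature map, and the two-dimensional "text embeddings" are all assumptions made for illustration; a real system would take both from a CLIP-like encoder.

```python
import math


def mask_pool(features, mask):
    """Average the per-pixel feature vectors that fall inside a binary mask."""
    dim = len(features[0][0])
    total = [0.0] * dim
    count = 0
    for feat_row, mask_row in zip(features, mask):
        for feat, inside in zip(feat_row, mask_row):
            if inside:
                count += 1
                for k in range(dim):
                    total[k] += feat[k]
    if count == 0:
        raise ValueError("mask is empty")
    return [t / count for t in total]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def classify_mask(features, mask, text_embeddings):
    """Return the best-matching label and all similarity scores for one mask."""
    pooled = mask_pool(features, mask)
    scores = {label: cosine(pooled, emb) for label, emb in text_embeddings.items()}
    return max(scores, key=scores.get), scores


# Hypothetical toy data: a 2x2 feature map with 2-dimensional features.
features = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[0.0, 1.0], [0.1, 0.9]],
]
top_mask = [[1, 1], [0, 0]]
bottom_mask = [[0, 0], [1, 1]]

# Hypothetical stand-ins for text embeddings of two class names.
text_embeddings = {"cat": [1.0, 0.0], "grass": [0.0, 1.0]}

label_top, _ = classify_mask(features, top_mask, text_embeddings)
label_bottom, _ = classify_mask(features, bottom_mask, text_embeddings)
print(label_top, label_bottom)  # prints "cat grass"
```

Because the class names enter only through `text_embeddings`, adding a new category at inference time means adding one more text embedding, which is what makes this style of alignment open-vocabulary.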

> [!TIP]
>
> This paper proposes a generalizable semantic segmentation model that uses a CLIP-like image-text encoder to refine its mask predictions.
>
> However, I wonder how well this model generalizes to segmenting the different faces of a geometric shape and to drawing clear boundaries between different objects and the background. In most cases, CLIP may not need complete image information to recognize an object and can make a decision based on a partial view. If a novel object combines features of two objects and might fall outside CLIP's codebook, will the CLIP alignment still work?