# CSE5519 Advances in Computer Vision (Topic A: 2025: Semantic Segmentation)
## Dual Semantic Guidance for Open Vocabulary Semantic Segmentation
[link to the paper](https://openaccess.thecvf.com/content/CVPR2025/papers/Wang_Dual_Semantic_Guidance_for_Open_Vocabulary_Semantic_Segmentation_CVPR_2025_paper.pdf)
## Novelty in Dual Semantic Guidance
The paper uses dual semantic guidance for open-vocabulary semantic segmentation. For each predicted mask, a CLIP-like model is applied, much as in object detection, to align the mask with a text description.
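
To make the mask-to-text alignment idea concrete, here is a minimal sketch of one common way to do it: average the per-pixel features inside a mask, then pick the text embedding with the highest cosine similarity. This is not the paper's architecture. The function names (`masked_average`, `cosine`, `label_mask`) are hypothetical, and the 2-D toy vectors only stand in for real CLIP image and text embeddings.

```python
import math


def masked_average(features, mask):
    """Average the per-pixel feature vectors where the mask is 1 (mask pooling)."""
    dim = len(features[0][0])
    total = [0.0] * dim
    count = 0
    for feat_row, mask_row in zip(features, mask):
        for vec, keep in zip(feat_row, mask_row):
            if keep:
                count += 1
                for i, value in enumerate(vec):
                    total[i] += value
    if count == 0:
        raise ValueError("mask selects no pixels")
    return [t / count for t in total]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def label_mask(features, mask, text_embeddings):
    """Return the best-matching class name and all similarity scores for one mask."""
    region = masked_average(features, mask)
    scores = {name: cosine(region, emb) for name, emb in text_embeddings.items()}
    return max(scores, key=scores.get), scores


# Toy 2x2 "image": each pixel carries a 2-D feature (assumed, not real CLIP output).
features = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[0.0, 1.0], [0.1, 0.9]],
]
top_mask = [[1, 1], [0, 0]]
bottom_mask = [[0, 0], [1, 1]]

# Hypothetical text embeddings for two class names.
text_embeddings = {"cat": [1.0, 0.0], "grass": [0.0, 1.0]}

top_label, top_scores = label_mask(features, top_mask, text_embeddings)
bottom_label, bottom_scores = label_mask(features, bottom_mask, text_embeddings)
print(top_label, bottom_label)
```

Because the class list is just a dictionary of text embeddings, new class names can be added at inference time without retraining, which is the property that makes this style of alignment "open vocabulary".
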
> [!TIP]
>
> This paper proposes a generalizable semantic segmentation model that uses a CLIP-like image-text encoder to refine the mask predictions.
>
> However, I wonder how well this model generalizes to segmenting the different faces of a geometric object and to drawing a clear boundary between objects and the background. In most cases, CLIP may not need complete image information to recognize an object and can make a decision based on a partial view. If a novel object combines features of two objects and may fall outside CLIP's codebook, will the CLIP alignment still work?