Update CSE5519_A1.md

Zheyuan Wu
2025-09-12 10:58:42 -05:00
parent 533ea36b37
commit dbfe60c936


@@ -39,4 +39,17 @@ Use Multi-level feature aggregation to get the final segmentation map.
>
> This paper shows the remarkable success of transformers in semantic segmentation. The authors split large images into small patches, apply a linear projection to get the patch embeddings, and then use a transformer encoder to get the final segmentation map.
>
> I'm really interested in the linear projection function $f$. How does it preserve the spatial information across the patches? What happens if square frames overlap the image? How does the transformer encoder handle the occlusion problem, or is that out of the scope of the paper?
### New takes from lecture
#### DeepLabv3+
Atrous convolutions (large receptive field)
Depthwise separable convolutions (a depthwise convolution followed by a 1x1 pointwise convolution)
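
To make the two DeepLabv3+ notes concrete, here is a minimal pure-Python sketch, not the paper's implementation. It assumes a 1-D signal, stride 1, no padding, and no bias terms; the function names `atrous_conv1d`, `effective_kernel_size`, `conv_params`, and `separable_conv_params` are hypothetical helpers made up for these notes.

```python
def atrous_conv1d(signal, kernel, rate):
    """1-D atrous (dilated) convolution in cross-correlation form.

    Assumes stride 1 and no padding. `rate` is the spacing between
    kernel taps; rate=1 gives an ordinary convolution.
    """
    k = len(kernel)
    span = (k - 1) * rate + 1  # number of input samples one output sees
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(kernel[j] * signal[start + j * rate] for j in range(k)))
    return out


def effective_kernel_size(k, rate):
    """Receptive field of a single k-tap atrous kernel with the given rate."""
    return k + (k - 1) * (rate - 1)


def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise k x k conv plus a 1 x 1 pointwise conv (bias ignored)."""
    return k * k * c_in + c_in * c_out


# Same 3 weights, but with rate=2 each output covers 5 input samples.
print(atrous_conv1d([1, 2, 3, 4, 5, 6, 7], [1, 1, 1], rate=2))
print(effective_kernel_size(3, 2))
print(conv_params(3, 256, 256), separable_conv_params(3, 256, 256))
```

The point of the sketch: raising `rate` widens the receptive field while the number of weights stays at `k`, and the separable form needs far fewer weights than a standard convolution with the same kernel size and channel counts.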
#### SETR
Learned positional embeddings.
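
This also relates to the question above about how spatial information survives the linear projection $f$. A rough sketch under stated assumptions: a single-channel image stored as a list of rows, non-overlapping square patches, and hand-written placeholder values where a real model would have trained weights. The helpers `split_into_patches`, `linear`, and `embed_patches` are hypothetical names for illustration, not SETR's code.

```python
def split_into_patches(image, p):
    """Split an H x W image (list of rows) into flattened p x p patches.

    Patches are non-overlapping and returned in row-major order.
    """
    h, w = len(image), len(image[0])
    assert h % p == 0 and w % p == 0, "image size must be divisible by p"
    patches = []
    for top in range(0, h, p):
        for left in range(0, w, p):
            patches.append(
                [image[top + i][left + j] for i in range(p) for j in range(p)]
            )
    return patches


def linear(x, weight):
    """Project a length-d_in vector with a d_in x d_out weight matrix."""
    d_out = len(weight[0])
    return [sum(x[i] * weight[i][o] for i in range(len(x))) for o in range(d_out)]


def embed_patches(image, p, weight, pos_embed):
    """Token i = f(patch i) + positional embedding i.

    `pos_embed` holds one vector per patch position; in a real model
    these would be learned parameters, here they are placeholders.
    """
    tokens = []
    for idx, patch in enumerate(split_into_patches(image, p)):
        projected = linear(patch, weight)
        tokens.append([e + q for e, q in zip(projected, pos_embed[idx])])
    return tokens


# A constant image: every patch is identical, so f alone cannot tell them apart.
image = [[1.0] * 4 for _ in range(4)]
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]     # placeholder f
pos_embed = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]  # placeholder positions
print(embed_patches(image, 2, weight, pos_embed))
```

In this toy setup the projection is applied to each patch independently, so identical patches map to identical vectors; the tokens differ only because a position-specific vector is added to each one.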