From dbfe60c936845057b6147aefc93162826607c1a0 Mon Sep 17 00:00:00 2001
From: Zheyuan Wu <60459821+Trance-0@users.noreply.github.com>
Date: Fri, 12 Sep 2025 10:58:42 -0500
Subject: [PATCH] Update CSE5519_A1.md

---
 content/CSE5519/CSE5519_A1.md | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/content/CSE5519/CSE5519_A1.md b/content/CSE5519/CSE5519_A1.md
index 01b8240..19e70a0 100644
--- a/content/CSE5519/CSE5519_A1.md
+++ b/content/CSE5519/CSE5519_A1.md
@@ -39,4 +39,17 @@ Use Multi-level feature aggregation to get the final segmentation map.
 >
 > This paper shows a remarkable success of transformer in semantic segmentation. The authors use linear projection split large images to mini patches to get the patch embeddings and then use a transformer encoder to get the final segmentation map.
 >
-> I'm really interested in the linear projection function $f$. How does it work to preserve the spatial information across the patches? What will happen if we have square frames overlapping the image? how doe the transformer encoder work to solve the occlusion problem or it is out of scope of the paper?
\ No newline at end of file
+> I'm really interested in the linear projection function $f$. How does it preserve spatial information across the patches? What happens if square frames overlap the image? How does the transformer encoder handle occlusion, or is that out of scope for this paper?
+
+### New takeaways from lecture
+
+#### DeepLabv3+
+
+Atrous (dilated) convolutions: enlarge the receptive field without adding parameters.
+
+Separable convolutions: a depthwise convolution followed by a pointwise (1x1) convolution, which reduces computation.
+
+
+#### SETR
+
+Learned positional embeddings.
\ No newline at end of file
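The two DeepLabv3+ notes added in the patch lend themselves to a quick numerical sketch. The snippet below is illustrative only (the helper `dilated_conv1d` and all channel/kernel sizes are my own choices, not from the lecture or paper): a 1-D dilated convolution shows how the same 3-tap kernel covers a wider receptive field as the dilation grows, and a parameter count compares a standard convolution against its depthwise-separable factorization.

```python
import numpy as np

def dilated_conv1d(x, w, dilation=1):
    """Valid 1-D convolution with dilation: the k taps are spaced `dilation`
    apart, so a kernel of size k covers a receptive field of dilation*(k-1)+1
    input samples while still using only k parameters."""
    k = len(w)
    span = dilation * (k - 1) + 1
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        out[i] = sum(w[j] * x[i + j * dilation] for j in range(k))
    return out

x = np.arange(10, dtype=float)      # toy 1-D "image"
w = np.array([1.0, 1.0, 1.0])       # 3-tap averaging-style kernel

# dilation=1 is an ordinary convolution (receptive field 3);
# dilation=2 widens the receptive field to 5 with the same 3 weights.
print(dilated_conv1d(x, w, dilation=1))
print(dilated_conv1d(x, w, dilation=2))

# Depthwise-separable vs. standard convolution parameter count:
#   standard:  C_in * C_out * k * k
#   separable: C_in * k * k  (depthwise)  +  C_in * C_out  (pointwise 1x1)
c_in, c_out, k = 64, 128, 3          # hypothetical layer sizes
standard = c_in * c_out * k * k
separable = c_in * k * k + c_in * c_out
print(standard, separable)           # separable uses far fewer parameters
```

With these sizes the separable factorization needs 8768 parameters versus 73728 for the standard convolution, which is the efficiency argument behind using it in DeepLabv3+'s decoder.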