# CSE559A Lecture 16

## Dense image labelling

### Semantic segmentation

Use one-hot encoding to represent the class of each pixel.
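
A minimal numpy sketch of per-pixel one-hot encoding (a hypothetical 4-class, 2x2 label map, not the lecture's data):

```python
import numpy as np

# Hypothetical H x W map of integer class ids (4 classes).
num_classes = 4
labels = np.array([[0, 1],
                   [2, 3]])

# Index rows of the identity matrix to get an H x W x C one-hot volume,
# then move channels first: C x H x W, one binary plane per class.
one_hot = np.eye(num_classes)[labels]
one_hot = one_hot.transpose(2, 0, 1)

print(one_hot[1])   # plane for class 1: a 1 exactly where labels == 1
```

Each pixel has exactly one channel set to 1, so the channel-wise sum is 1 everywhere.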

### General Network Design

Design a network with only convolutional layers so that it makes predictions for all pixels at once.

Can the network operate at full image resolution?

Practical solution: first downsample, then upsample.

### Outline

- Upgrading a classification network to segmentation
- Operations for dense prediction
  - Transposed convolutions, unpooling
- Architectures for dense prediction
  - DeconvNet, U-Net, and U-Net variants
- Instance segmentation
  - Mask R-CNN
- Other dense prediction problems

### Fully Convolutional Networks

The idea is to "upgrade" a classification network into a dense prediction network:

1. Convert "fully connected" layers to 1x1 convolutions
2. Make the input image larger
3. Upsample the output
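
Step 1 works because a fully connected layer applied at every spatial location is exactly a 1x1 convolution. A numpy sketch of the equivalence (all sizes here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
C_in, C_out, H, W = 8, 3, 5, 5              # hypothetical channel/spatial sizes
W_fc = rng.standard_normal((C_out, C_in))   # weights of the old FC classifier
b_fc = rng.standard_normal(C_out)
feat = rng.standard_normal((C_in, H, W))    # encoder feature map

# 1x1 convolution: apply the FC weight matrix at every spatial location.
scores = np.einsum('oc,chw->ohw', W_fc, feat) + b_fc[:, None, None]

# Same result as running the FC layer on one pixel's feature vector.
pixel = W_fc @ feat[:, 2, 3] + b_fc
assert np.allclose(scores[:, 2, 3], pixel)

print(scores.shape)   # (3, 5, 5): per-pixel class scores
```

Because the weights are shared across locations, the converted network accepts inputs larger than the original training resolution and emits a coarse score map instead of a single score vector.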

Start with an existing classification CNN (an "encoder"), then use bilinear interpolation and transposed convolutions to recover full resolution.

### Operations for dense prediction

#### Transposed Convolutions

Use the filter to "paint" in the output: place a copy of the filter at each output location, multiply it by the corresponding input value, and sum where the copies overlap.

We can increase the resolution of the output by using a larger stride in the convolution.

- For stride 2, dilate the input by inserting rows and columns of zeros between adjacent entries, then convolve with the flipped filter
- Sometimes called convolution with fractional input stride 1/2
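
The "painting" view can be written directly; a 1D numpy sketch with a hypothetical 3-tap filter and stride 2:

```python
import numpy as np

def transposed_conv1d(x, w, stride=2):
    """Transposed convolution by 'painting': place a copy of the filter
    at each output position (spaced by the stride), scale it by the
    input value, and sum where the copies overlap."""
    out = np.zeros(stride * (len(x) - 1) + len(w))
    for i, v in enumerate(x):
        out[i * stride : i * stride + len(w)] += v * w
    return out

x = np.array([1.0, 2.0, 3.0])   # low-resolution input
w = np.array([1.0, 2.0, 1.0])   # hypothetical 3-tap filter

y = transposed_conv1d(x, w, stride=2)
print(y)    # [1. 2. 3. 4. 5. 6. 3.]  -- stride 2 roughly doubles the length
```

With this triangular filter the overlaps fill in interpolated values between the painted copies, which is why transposed convolution with a fixed triangular filter reproduces bilinear-style upsampling.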

#### Unpooling

Max unpooling:

- Place each input value at the location that held the maximum during the corresponding max pooling step; all other output locations are set to zero
- The locations of the maxima ("switches") must be remembered from the pooling layer
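
A minimal sketch with a hypothetical 2x2 pooling window: pool while recording each argmax, then unpool by scattering values back to the recorded locations:

```python
import numpy as np

def max_pool_with_switches(x, k=2):
    """2x2 max pooling that also records where each maximum came from."""
    H, W = x.shape
    pooled = np.zeros((H // k, W // k))
    switches = np.zeros((H // k, W // k), dtype=int)  # flat index in window
    for i in range(H // k):
        for j in range(W // k):
            window = x[i*k:(i+1)*k, j*k:(j+1)*k]
            switches[i, j] = window.argmax()
            pooled[i, j] = window.max()
    return pooled, switches

def max_unpool(pooled, switches, k=2):
    """Scatter each value to its recorded location; zeros everywhere else."""
    H, W = pooled.shape
    out = np.zeros((H * k, W * k))
    for i in range(H):
        for j in range(W):
            di, dj = divmod(switches[i, j], k)
            out[i*k + di, j*k + dj] = pooled[i, j]
    return out

x = np.array([[1., 4., 2., 1.],
              [3., 2., 0., 5.],
              [6., 0., 1., 1.],
              [2., 1., 3., 0.]])
p, s = max_pool_with_switches(x)
u = max_unpool(p, s)    # each max returns to its original position
```

The unpooled map is mostly zeros, with each pooled value restored to the exact pixel it came from; this is how the decoder recovers location information lost by pooling.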

Nearest neighbor unpooling:

- Copy each input value to all locations in the corresponding output region (no remembered locations needed)
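
Nearest neighbor unpooling is just a repeat along both spatial axes; a one-line numpy sketch:

```python
import numpy as np

x = np.array([[1, 2],
              [3, 4]])

# Copy each value into its whole 2x2 output region.
up = x.repeat(2, axis=0).repeat(2, axis=1)
print(up)
# [[1 1 2 2]
#  [1 1 2 2]
#  [3 3 4 4]
#  [3 3 4 4]]
```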

### Architectures for dense prediction

#### DeconvNet

![deconvnet](./CSE559A_L16_img1.png)

_How is information about location encoded in the network?_

#### U-Net

![unet](./CSE559A_L16_img2.png)

- Like FCN, fuse upsampled higher-level feature maps with higher-resolution, lower-level feature maps (like residual connections)
- Unlike FCN, fuse by concatenation, predict at the end
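
The concatenation fusion can be sketched in numpy (hypothetical channel counts; a real U-Net would follow the concatenation with more convolutions):

```python
import numpy as np

rng = np.random.default_rng(0)
decoder = rng.standard_normal((64, 8, 8))    # upsampled decoder features
encoder = rng.standard_normal((64, 16, 16))  # skip connection from the encoder

# Upsample the decoder features 2x (nearest neighbor) to match the skip.
decoder_up = decoder.repeat(2, axis=1).repeat(2, axis=2)

# U-Net-style fusion: concatenate along the channel axis.
# (FCN instead adds the two maps elementwise, keeping 64 channels.)
fused = np.concatenate([decoder_up, encoder], axis=0)
print(fused.shape)   # (128, 16, 16)
```

Concatenation doubles the channel count but lets the following convolutions learn how to weight the two sources, rather than committing to a fixed sum.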
#### Extended U-Net Architecture

Many U-Net variants replace the encoder with other backbone architectures.



##### Encoder/Decoder vs. U-Net



### Instance Segmentation

#### Mask R-CNN

Mask R-CNN = Faster R-CNN + FCN on Region of Interest

### Extend to keypoint prediction?

- Use a similar architecture to Mask R-CNN

_Continue on Tuesday_

### Other tasks

#### Panoptic Feature Pyramid Network



#### Depth and normal estimation


D. Eigen and R. Fergus, Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture, ICCV 2015

#### Colorization

R. Zhang, P. Isola, and A. Efros, Colorful Image Colorization, ECCV 2016