# CSE5519 Advances in Computer Vision (Topic C: 2022: Neural Rendering)
Core idea: use a learned function (an MLP mapping 3D position and view direction to color and density) to approximate the scene.
## Block-NeRF: Scalable Large Scene Neural View Synthesis
[link to the paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Tancik_Block-NeRF_Scalable_Large_Scene_Neural_View_Synthesis_CVPR_2022_paper.pdf)
Extends NeRF with appearance embeddings and learned pose refinement.
Masks out moving objects.
Trains an individual NeRF at each street intersection.
### Novelty in Block-NeRF
#### Block size and placement
Place one Block-NeRF at each street intersection, covering the intersection itself and 75% of the way to the next intersection, so that adjacent blocks overlap by 50%.
#### Appearance embeddings
Attach a learned appearance embedding to each training image, capturing its lighting and weather conditions, to improve the generalization of the model across environmental changes.
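A minimal sketch of how a learned per-image appearance code can condition a NeRF-style MLP (layer sizes and encoding dimensions are my own assumptions; Block-NeRF itself builds on mip-NeRF):

```python
import torch
import torch.nn as nn

class AppearanceConditionedNeRF(nn.Module):
    """Toy NeRF head conditioned on a learned per-image appearance code."""

    def __init__(self, num_images, pos_dim=63, dir_dim=27, app_dim=32, hidden=256):
        super().__init__()
        # One learnable appearance code per training image (as in NeRF-W);
        # lighting/weather variation is absorbed without explicit labels.
        self.appearance = nn.Embedding(num_images, app_dim)
        self.trunk = nn.Sequential(
            nn.Linear(pos_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sigma_head = nn.Linear(hidden, 1)  # density depends on geometry only
        # Color depends on view direction AND the appearance code, so lighting
        # and weather can vary without moving the geometry.
        self.color_head = nn.Sequential(
            nn.Linear(hidden + dir_dim + app_dim, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),
        )

    def forward(self, pos_enc, dir_enc, image_ids):
        h = self.trunk(pos_enc)                 # (B, hidden)
        sigma = torch.relu(self.sigma_head(h))  # (B, 1)
        app = self.appearance(image_ids)        # (B, app_dim)
        rgb = self.color_head(torch.cat([h, dir_enc, app], dim=-1))
        return rgb, sigma
```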
#### Learned pose refinement and exposure input
Add learnable parameters that refine each camera's input pose, and feed the camera exposure as an extra input, to counter pose noise and the varying camera settings across the training data.
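A hedged sketch of per-image pose refinement with learnable axis-angle rotation and translation offsets (this parameterization is my assumption, not necessarily the paper's exact one); the exposure value would analogously be positionally encoded and fed to the color branch:

```python
import torch
import torch.nn as nn

def skew(v):
    """Map a batch of 3-vectors (B, 3) to skew-symmetric matrices (B, 3, 3)."""
    zero = torch.zeros_like(v[..., 0])
    return torch.stack([
        torch.stack([zero, -v[..., 2], v[..., 1]], dim=-1),
        torch.stack([v[..., 2], zero, -v[..., 0]], dim=-1),
        torch.stack([-v[..., 1], v[..., 0], zero], dim=-1),
    ], dim=-2)

class PoseRefinement(nn.Module):
    """Learnable SE(3) correction per training image, optimized jointly
    with the NeRF so noisy input poses can drift toward consistency."""

    def __init__(self, num_images):
        super().__init__()
        self.rot = nn.Parameter(torch.zeros(num_images, 3))    # axis-angle
        self.trans = nn.Parameter(torch.zeros(num_images, 3))  # translation

    def forward(self, c2w, image_ids):
        # c2w: (B, 3, 4) camera-to-world matrices from the (noisy) dataset.
        R_delta = torch.matrix_exp(skew(self.rot[image_ids]))  # (B, 3, 3)
        R = R_delta @ c2w[:, :, :3]
        t = c2w[:, :, 3] + self.trans[image_ids]
        return torch.cat([R, t.unsqueeze(-1)], dim=-1)         # refined (B, 3, 4)
```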
#### Moving object masking
Mask out moving objects, which violate the static-scene assumption.
#### Visibility prediction
An additional small MLP approximates the visibility (expected transmittance) of a sampled point, so that blocks with a poor view of the target region can be discarded at render time.
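A rough sketch of the idea, assuming a small auxiliary MLP regressed against the transmittance the main model already computes along its training rays (sizes and encodings are assumptions):

```python
import torch
import torch.nn as nn

# Small auxiliary MLP: (encoded point, encoded direction) -> visibility in [0, 1].
visibility_mlp = nn.Sequential(
    nn.Linear(63 + 27, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 1), nn.Sigmoid(),
)

def visibility_loss(pos_enc, dir_enc, transmittance):
    """Supervise against the transmittance T = exp(-sum(sigma_i * delta_i))
    already produced while rendering training rays; at inference this gives
    a cheap answer to whether a given block can actually see a point."""
    pred = visibility_mlp(torch.cat([pos_enc, dir_enc], dim=-1)).squeeze(-1)
    return nn.functional.mse_loss(pred, transmittance.detach())
```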
#### Block-NeRF compositing
When rendering from multiple overlapping Block-NeRFs, composite their outputs using inverse-distance weighting between the camera and each block's center.
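A minimal sketch of inverse-distance compositing between overlapping blocks; the exponent `p` and the radius filter are my assumptions in the spirit of the paper:

```python
import numpy as np

def composite_blocks(cam_pos, block_centers, block_rgbs, radius, p=4):
    """Blend per-block renderings of the same view with weights proportional
    to ||cam - center_i||^(-p), using only blocks within `radius` of the
    camera (assumes at least one block is in range)."""
    d = np.linalg.norm(block_centers - cam_pos, axis=-1)  # (num_blocks,)
    w = np.where(d < radius, (d + 1e-8) ** (-p), 0.0)     # drop far-away blocks
    w = w / w.sum()                                       # normalize weights
    # block_rgbs: (num_blocks, H, W, 3) images rendered by each Block-NeRF.
    return np.tensordot(w, block_rgbs, axes=1)            # weighted average
```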
### Appearance matching
An additional optimization step: freeze everything except the target block's appearance embedding, then optimize that embedding so the target's renderings match those of the neighboring frozen blocks in their region of overlap, yielding consistent appearance across block boundaries.
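A sketch of that matching step as I understand it, where `render_target` and `render_frozen` are hypothetical rendering closures and only the target's appearance code receives gradients:

```python
import torch

def match_appearance(app_code, render_target, render_frozen, overlap_poses,
                     steps=100, lr=1e-2):
    """Optimize ONLY `app_code` (a leaf tensor with requires_grad=True) so the
    target block's renders agree with a frozen neighbor where they overlap."""
    opt = torch.optim.Adam([app_code], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = 0.0
        for pose in overlap_poses:
            pred = render_target(pose, app_code)  # target block, trainable code
            with torch.no_grad():                 # neighboring block stays frozen
                ref = render_frozen(pose)
            loss = loss + torch.nn.functional.mse_loss(pred, ref)
        loss.backward()
        opt.step()
    return app_code
```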
> [!TIP]
>
> This paper shows a new way to scale up NeRF by composing multiple NeRFs to approximate a scene, effectively producing high-quality, consistent view synthesis. The appearance embedding shows how additional per-image information improves the generalization of the model, and I'm impressed by the rendering results across different lighting and weather conditions.
>
> However, as far as I know, there are many models that use unsupervised methods to produce this extra information. I wonder how the training could integrate ViT or other pre-trained unsupervised models to predict the camera parameters and lighting conditions, so that the model could be trained on more diverse data instead of the fully labeled data from Waymo vehicles.

# CSE5519 Advances in Computer Vision (Topic F: 2022: Representation Learning)
## Masked Autoencoders Are Scalable Vision Learners
[link to the paper](https://openaccess.thecvf.com/content/CVPR2022/papers/He_Masked_Autoencoders_Are_Scalable_Vision_Learners_CVPR_2022_paper.pdf)
### Novelty in MAE
#### Masked Autoencoders
Masked autoencoders are autoencoders that mask out part of the input and are trained to reconstruct the original data. For the best performance, 75% of the input patches are masked (the random masking is sketched below).
A masked autoencoder with a single-block decoder can perform strongly after fine-tuning.
This method speeds up pre-training by a factor of 3-4×, since the encoder only processes the visible 25% of patches.
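A sketch of the per-sample random masking via the shuffle-by-noise trick (this mirrors my understanding of the released implementation; shapes are assumptions):

```python
import torch

def random_masking(x, mask_ratio=0.75):
    """Randomly keep 25% of embedded patches per sample.

    x: (N, L, D) sequence of embedded patches.
    Returns the visible patches, a binary mask in original patch order
    (0 = kept, 1 = masked), and indices to restore the original order.
    """
    N, L, D = x.shape
    len_keep = int(L * (1 - mask_ratio))

    noise = torch.rand(N, L, device=x.device)       # uniform noise per patch
    ids_shuffle = torch.argsort(noise, dim=1)       # a random permutation
    ids_restore = torch.argsort(ids_shuffle, dim=1)

    ids_keep = ids_shuffle[:, :len_keep]            # indices of visible patches
    x_visible = torch.gather(x, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))

    mask = torch.ones(N, L, device=x.device)        # mark everything masked...
    mask[:, :len_keep] = 0                          # ...except the kept slots,
    mask = torch.gather(mask, 1, ids_restore)       # then unshuffle the mask
    return x_visible, mask, ids_restore
```

The encoder runs only on `x_visible`, which is where the speedup comes from; the decoder later inserts mask tokens and uses `ids_restore` to put patches back in their original order.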
> [!TIP]
>
> This paper shows a new way to train a vision model using masked autoencoders. The authors mask out 75% of the input patches and train the model to reconstruct the original image, based on the insight that image data is highly redundant compared to text when using a transformer architecture.
>
> Currently, the sampling method is simple uniform random sampling, with surprisingly strong results. I wonder whether a better sampling method, for example weighting each patch by its information entropy, would yield even better results.
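One hypothetical way to realize that entropy idea (entirely speculative, not from the paper): bias which patches stay visible by the entropy of each patch's intensity histogram, so flat, low-information patches are masked more often:

```python
import torch

def entropy_weighted_masking(x, patch_pixels, mask_ratio=0.75, bins=32):
    """Speculative MAE variant: keep high-entropy patches visible more often.

    x: (N, L, D) embedded patches; patch_pixels: (N, L, P) raw values in [0, 1].
    """
    N, L, D = x.shape
    len_keep = int(L * (1 - mask_ratio))

    # Shannon entropy of each patch's intensity histogram (histc is unbatched,
    # hence the explicit loops; fine for a sketch).
    hist = torch.stack([
        torch.stack([torch.histc(patch_pixels[n, l], bins=bins, min=0.0, max=1.0)
                     for l in range(L)])
        for n in range(N)])                               # (N, L, bins)
    prob = hist / hist.sum(-1, keepdim=True).clamp(min=1e-8)
    entropy = -(prob * (prob + 1e-8).log()).sum(-1)       # (N, L)

    # Rank by entropy plus a little noise, so masking stays stochastic.
    scores = entropy + 0.1 * torch.rand(N, L)
    ids_keep = torch.argsort(scores, dim=1, descending=True)[:, :len_keep]
    x_visible = torch.gather(x, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    return x_visible, ids_keep
```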