From 2df7d6983eacf1caa34c8419bcbcc4f023e7c436 Mon Sep 17 00:00:00 2001
From: Zheyuan Wu <60459821+Trance-0@users.noreply.github.com>
Date: Thu, 25 Sep 2025 00:18:46 -0500
Subject: [PATCH] updates

---
 content/CSE5519/CSE5519_C2.md | 47 +++++++++++++++++++++++++++++++++++
 content/CSE5519/CSE5519_F2.md | 19 ++++++++++++++
 2 files changed, 66 insertions(+)

diff --git a/content/CSE5519/CSE5519_C2.md b/content/CSE5519/CSE5519_C2.md
index 296b4fa..c71391f 100644
--- a/content/CSE5519/CSE5519_C2.md
+++ b/content/CSE5519/CSE5519_C2.md
@@ -1,2 +1,49 @@
 # CSE5519 Advances in Computer Vision (Topic C: 2022: Neural Rendering)
 
+Use a function to approximate the scene.
+
+## Block-NeRF: Scalable Large Scene Neural View Synthesis
+
+[link to the paper](https://openaccess.thecvf.com/content/CVPR2022/papers/Tancik_Block-NeRF_Scalable_Large_Scene_Neural_View_Synthesis_CVPR_2022_paper.pdf)
+
+Extend NeRF with appearance embeddings and learned pose refinement.
+
+Mask out moving objects.
+
+Train an individual NeRF at each street intersection.
+
+### Novelty in Block-NeRF
+
+#### Block size and placement
+
+Place one Block-NeRF at each street intersection, covering the intersection itself and 75% of each connecting road until it reaches the next intersection.
+
+#### Appearance embeddings
+
+Attach an appearance embedding to each image, capturing its lighting and weather conditions, to improve the generalization of the model.
+
+#### Learned pose refinement and exposure input
+
+Add learnable offsets that refine each camera's input pose, and feed the camera exposure as an extra input, to counter pose noise and the varying camera settings in the training data.
+
+#### Moving object masking
+
+Mask out moving objects, which violate the static-scene assumption.
+
+#### Visibility prediction
+
+An additional MLP approximates the visibility of a sampled point.
+
+#### Block-NeRF compositing
+
+Use inverse distance weighting to blend renders from multiple NeRFs (a small sketch appears at the end of these notes).
+
+### Appearance matching
+
+An additional training procedure that freezes every model except the target and optimizes the target NeRF so that it matches the frozen neighboring models in the region where they overlap.
+
+> [!TIP]
+>
+> This paper shows a new way to scale up NeRF by using multiple NeRFs to approximate the scene, effectively producing high-quality, consistent view synthesis. The appearance embeddings show how additional conditioning information improves the generalization of the model, and I'm impressed by the rendering results across different lighting and weather conditions.
+>
+> However, as far as I know, there are plenty of models that produce this extra information with unsupervised methods. I wonder how the training could integrate ViT or other pre-trained unsupervised models that predict camera parameters and lighting conditions, so that the model could be trained on more diverse data instead of the fully labeled data from Waymo vehicles.
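+
+### Compositing sketch
+
+A minimal sketch of the inverse-distance-weighted compositing step described above, assuming each visible Block-NeRF has already rendered the target view. The function name `composite_block_nerfs`, its arguments, and the exponent `power` are hypothetical illustrations, not the paper's implementation.
+
+```python
+import numpy as np
+
+# Hypothetical sketch of inverse distance weighting, not the paper's code.
+def composite_block_nerfs(renders, block_origins, camera_origin, power=2.0):
+    """Blend frames rendered by several Block-NeRFs into one image.
+
+    renders: list of (H, W, 3) arrays, one per visible Block-NeRF.
+    block_origins: list of (3,) arrays giving each block's center.
+    camera_origin: (3,) array with the camera position for this view.
+    Weights follow inverse distance weighting: w_i proportional to d_i**(-power).
+    """
+    distances = np.array([np.linalg.norm(camera_origin - o) for o in block_origins])
+    weights = 1.0 / np.maximum(distances, 1e-6) ** power
+    weights = weights / weights.sum()  # normalize so blended colors stay in range
+
+    blended = np.zeros_like(renders[0])
+    for w, frame in zip(weights, renders):
+        blended = blended + w * frame
+    return blended
+
+# Two blocks rendering the same 4x4 view; the camera is much closer to
+# block 0, so block 0 dominates the blended image.
+renders = [np.full((4, 4, 3), 0.2), np.full((4, 4, 3), 0.8)]
+origins = [np.zeros(3), np.array([100.0, 0.0, 0.0])]
+blended = composite_block_nerfs(renders, origins, camera_origin=np.array([10.0, 0.0, 0.0]))
+```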
diff --git a/content/CSE5519/CSE5519_F2.md b/content/CSE5519/CSE5519_F2.md
index 6696b14..58dfafb 100644
--- a/content/CSE5519/CSE5519_F2.md
+++ b/content/CSE5519/CSE5519_F2.md
@@ -1,2 +1,21 @@
 # CSE5519 Advances in Computer Vision (Topic F: 2022: Representation Learning)
 
+## Masked Autoencoders Are Scalable Vision Learners
+
+[link to the paper](https://openaccess.thecvf.com/content/CVPR2022/papers/He_Masked_Autoencoders_Are_Scalable_Vision_Learners_CVPR_2022_paper.pdf)
+
+### Novelty in MAE
+
+#### Masked Autoencoders
+
+Masked autoencoders are autoencoders that mask out part of the input and are trained to reconstruct the original data. For best performance, they mask out 75% of the input patches (a small sketch of this masking step appears at the end of these notes).
+
+A masked autoencoder with a single-block decoder can perform strongly after fine-tuning.
+
+This method speeds up training by a factor of 3-4.
+
+> [!TIP]
+>
+> This paper shows a new way to train a vision model using masked autoencoders. The authors mask out 75% of the input patches and train the model to reconstruct the original data, based on the insight that image data is highly redundant compared to text when using a transformer architecture.
+>
+> Currently, the sampling method is simple uniform random masking, yet the results are surprising. I wonder whether a better sampling method, for example weighting the sampling by the information entropy of each patch, would yield better results.
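+
+### Masking sketch
+
+A minimal sketch of the random patch masking described above, assuming the image has already been split into flattened patches. The function name `random_masking` and its arguments are hypothetical illustrations, not the authors' implementation.
+
+```python
+import numpy as np
+
+# Hypothetical sketch of MAE-style random masking, not the authors' code.
+def random_masking(patches, mask_ratio=0.75, rng=None):
+    """Randomly hide a fraction of patches.
+
+    patches: (num_patches, dim) array of flattened image patches.
+    Returns the visible patches, the indices that were kept, and a binary
+    mask (1 = masked, 0 = visible) used to restrict the reconstruction loss.
+    """
+    rng = np.random.default_rng() if rng is None else rng
+    num_patches = patches.shape[0]
+    num_keep = int(num_patches * (1 - mask_ratio))
+
+    # Shuffle the patch indices and keep the first num_keep of them.
+    ids_shuffle = rng.permutation(num_patches)
+    ids_keep = np.sort(ids_shuffle[:num_keep])
+
+    mask = np.ones(num_patches, dtype=np.int64)
+    mask[ids_keep] = 0  # the encoder only processes these visible patches
+
+    return patches[ids_keep], ids_keep, mask
+
+# A 224x224 image with 16x16 patches gives 196 patches; a 75% mask ratio
+# leaves only 49 patches for the encoder to process.
+patches = np.random.rand(196, 16 * 16 * 3)
+visible, ids_keep, mask = random_masking(patches)
+print(visible.shape)  # (49, 768)
+```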