This commit is contained in:
Zheyuan Wu
2025-11-03 01:30:59 -06:00
parent f13b49aa92
commit a9d84cb2bb
3 changed files with 20 additions and 3 deletions

@@ -1,2 +1,19 @@
# CSE5519 Advances in Computer Vision (Topic E: 2024: Deep Learning for Geometric Computer Vision)
## DUSt3R: Geometric 3D Vision Made Easy
[link to paper](https://arxiv.org/pdf/2312.14132)
### Novelty in DUSt3R
Uses a pointmap to represent the 3D scene; together with the camera intrinsics, the pointmap determines the scene geometry.
Direct RGB-to-3D regression, without an explicit intermediate pose or depth-estimation pipeline.
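To make the pointmap idea concrete, here is a minimal numpy sketch (not the authors' code) showing that a pointmap stores a 3D point per pixel and that, under an assumed pinhole model, the focal length is implicitly recoverable from the pointmap alone:

```python
import numpy as np

# A pointmap stores a 3D point X[v, u] = (x, y, z) for every pixel (u, v).
# Under a pinhole camera:  u - cx = f * x / z,  v - cy = f * y / z,
# so the intrinsics are implicit in the pointmap itself.
H, W, f, cx, cy = 4, 6, 100.0, 3.0, 2.0
u, v = np.meshgrid(np.arange(W, dtype=float), np.arange(H, dtype=float))
z = np.full((H, W), 2.0)  # toy constant-depth scene
pointmap = np.stack([(u - cx) * z / f, (v - cy) * z / f, z], axis=-1)

# Recover the focal length from the pointmap alone
# (least squares on u - cx = f * x / z).
x_over_z = (pointmap[..., 0] / pointmap[..., 2]).ravel()
rhs = (u - cx).ravel()
f_est = float(x_over_z @ rhs / (x_over_z @ x_over_z))
print(f_est)  # ≈ 100.0
```

The toy depth map and intrinsics here are made up for illustration; the point is only that no separate intrinsics input is needed once the pointmap is known.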
Uses a ViT to encode each image, then two Transformer decoders (with information sharing between them via cross-attention) to decode two representations of the same scene, $F_1$ and $F_2$. The network directly regresses a pointmap and a confidence map from RGB.
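The regression is trained with a confidence-weighted pointmap loss: per-pixel 3D error scaled by the predicted confidence $C$, plus a $-\alpha \log C$ term so the network cannot simply predict low confidence everywhere. A minimal numpy sketch (not the authors' implementation; `alpha` and the toy tensors are placeholders):

```python
import numpy as np

def confidence_loss(pred_pts, gt_pts, conf, alpha=0.2):
    """Confidence-weighted pointmap regression loss (sketch).

    pred_pts, gt_pts: (H, W, 3) pointmaps; conf: (H, W) positive confidences.
    Each pixel's Euclidean 3D error is weighted by its confidence, and the
    -alpha * log(conf) term penalizes uniformly low confidence.
    """
    err = np.linalg.norm(pred_pts - gt_pts, axis=-1)  # per-pixel 3D error
    return float(np.mean(conf * err - alpha * np.log(conf)))

rng = np.random.default_rng(0)
gt = rng.normal(size=(4, 4, 3))
pred = gt + 0.1 * rng.normal(size=(4, 4, 3))
conf = np.ones((4, 4))  # with C = 1 the log term vanishes
print(confidence_loss(pred, gt, conf))
```

With all confidences at 1 the loss reduces to the mean 3D error; raising confidence on low-error pixels lowers the loss, which is what lets the network flag unreliable regions (sky, specular surfaces) on its own.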
>[!TIP]
>
> Compared with previous works, this paper directly regresses the pointmap and confidence map from RGB, yielding a simpler and more accurate 3D reconstruction pipeline.
>
> However, I'm not sure how information is shared between the two branches of the Transformer decoder. And for a multi-view input, if two image pairs have no overlapping region at all, how can the model reconstruct a single consistent 3D scene?