NoteNextra-origin/content/CSE5519/CSE5519_E5.md
Trance-0 0597afb511 updates?
2025-11-14 11:15:12 -06:00

# CSE5519 Advances in Computer Vision (Topic E: 2025: Deep Learning for Geometric Computer Vision)
## VGGT: Visual Geometry Grounded Transformer
[link to paper](https://arxiv.org/pdf/2503.11651)
### Novelty in VGGT
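VGGT encodes the input images with alternating attention: its transformer layers alternate between frame-wise self-attention (within each image) and global self-attention (across all images).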
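To make the idea of alternating attention concrete, here is a minimal NumPy sketch, not VGGT's actual implementation. It assumes a single attention head with random weights and residual connections only, and it omits multi-head projections, normalization, and MLP layers. The names `self_attention`, `alternating_attention`, and `init_params` are hypothetical and chosen for illustration.

```python
import numpy as np


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x has shape (batch, seq, dim); attention is computed independently
    for each batch element, over the seq axis.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(x.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def init_params(num_blocks, dim, seed=0):
    """Random (frame-wise, global) projection weights for each block."""
    rng = np.random.default_rng(seed)

    def make():
        return tuple(
            rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(3)
        )

    return [(make(), make()) for _ in range(num_blocks)]


def alternating_attention(tokens, params):
    """tokens has shape (frames, tokens_per_frame, dim)."""
    f, n, d = tokens.shape
    x = tokens
    for frame_w, global_w in params:
        # Frame-wise step: each frame is its own batch element,
        # so tokens only attend to tokens from the same image.
        x = x + self_attention(x, *frame_w)
        # Global step: flatten all frames into one sequence,
        # so every token attends to tokens from every image.
        flat = x.reshape(1, f * n, d)
        flat = flat + self_attention(flat, *global_w)
        x = flat.reshape(f, n, d)
    return x


rng = np.random.default_rng(1)
tokens = rng.normal(size=(3, 4, 8))  # 3 frames, 4 tokens each, dim 8
params = init_params(num_blocks=2, dim=8)
out = alternating_attention(tokens, params)
print(out.shape)  # (3, 4, 8)
```

The only difference between the two steps is how the token array is reshaped before attention: keeping the frame axis as the batch axis restricts attention to one image, while flattening it lets information flow across views.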
> [!TIP]
>
> VGGT is a feed-forward neural network that uses alternating attention to directly infer the key 3D attributes of a scene (camera parameters, depth maps, point maps, and 3D point tracks) from one or more views, and it is robust to some non-rigid deformations.
>
> I wonder how this model adapts to different lighting conditions for the same scene, how non-Lambertian reflectance is captured, and how the framework could be extended to recover the true color of objects and evaluate their surface properties.