# CSE5519 Advances in Computer Vision (Topic G: 2025: Correspondence Estimation and Structure from Motion)

## MegaSaM: Accurate, Fast, and Robust Structure and Motion from Casual Dynamic Videos

[link to paper](https://arxiv.org/pdf/2412.04463)

- vanilla DROID-SLAM
- mono-depth initialization
- object movement map prediction
- two-stage training scheme
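
As a rough illustration of why a movement map can help, the sketch below down-weights reprojection residuals at pixels that are likely to belong to moving objects. This is a minimal toy example, not the paper's implementation: the function name `weighted_reprojection_cost`, the `static_prob` convention (1 means static, 0 means moving), and the product-of-weights form are all assumptions made for illustration.

```python
def weighted_reprojection_cost(residuals, flow_confidence, static_prob):
    """Sum of squared 2D residuals, down-weighted where a pixel is likely moving.

    residuals:       list of (rx, ry) reprojection errors, one per pixel
    flow_confidence: per-pixel confidence of the predicted flow, in [0, 1]
    static_prob:     ASSUMED convention: 1.0 = static pixel, 0.0 = moving pixel
    """
    total = 0.0
    for (rx, ry), conf, p_static in zip(residuals, flow_confidence, static_prob):
        # Hypothetical combined weight: moving pixels contribute little or nothing.
        weight = conf * p_static
        total += weight * (rx * rx + ry * ry)
    return total


# Two pixels: one on the static background, one on a moving object.
residuals = [(1.0, 0.0), (3.0, 4.0)]
flow_confidence = [1.0, 1.0]

# With the movement map, the moving pixel's large residual is ignored.
cost_with_map = weighted_reprojection_cost(residuals, flow_confidence, [1.0, 0.0])

# Without it (every pixel treated as static), the moving pixel dominates the cost.
cost_without_map = weighted_reprojection_cost(residuals, flow_confidence, [1.0, 1.0])
```

In this toy setup the moving pixel dominates the cost when every pixel is treated as static, which is the failure mode a movement map is meant to suppress during camera pose estimation.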

> [!TIP]
>
> How does the two-stage training scheme improve the robustness of the model? To me, this paper reads like an integration of GeoNet (separate pose and depth estimation) with full regression.