partial updates

Trance-0
2025-11-18 13:25:21 -06:00
parent 946d0b605f
commit 9416bd4956
10 changed files with 1218 additions and 136 deletions

@@ -1,2 +1,14 @@
# CSE5519 Advances in Computer Vision (Topic G: 2025: Correspondence Estimation and Structure from Motion)
## MegaSaM: Accurate, Fast, and Robust Structure and Motion from Casual Dynamic Videos
[link to paper](https://arxiv.org/pdf/2412.04463)
- Starts from vanilla DROID-SLAM
- Monocular depth initialization
- Object movement map prediction
- Two-stage training scheme
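The object movement map is meant to keep dynamic pixels from corrupting camera estimation. The toy sketch below shows the general idea under simplifying assumptions. Instead of full bundle adjustment, it solves a weighted least-squares problem for a single 2D translation. The weighting rule `confidence * (1 - movement probability)` and all function names are hypothetical and are not taken from the paper.

```python
import numpy as np


def combined_weights(flow_confidence, movement_prob):
    """Hypothetical per-correspondence weight: high only if the flow is
    confident AND the pixel is predicted to be static."""
    return flow_confidence * (1.0 - movement_prob)


def weighted_translation(p, q, w):
    """Closed-form weighted least squares for a pure 2D translation t that
    minimizes sum_i w_i * ||q_i - (p_i + t)||^2:
    t = sum_i w_i (q_i - p_i) / sum_i w_i."""
    w = w[:, None]
    return (w * (q - p)).sum(axis=0) / w.sum()


# Toy scene: 10 correspondences shifted by a camera-induced translation t_cam.
rng = np.random.default_rng(0)
p = rng.uniform(0.0, 100.0, size=(10, 2))
t_cam = np.array([2.0, -1.0])
q = p + t_cam
q[:3] += np.array([15.0, 5.0])  # first 3 points lie on an independently moving object

conf = np.ones(10)               # assume the flow is equally confident everywhere
movement = np.zeros(10)
movement[:3] = 0.99              # hypothetical predicted movement probabilities

t_naive = weighted_translation(p, q, conf)                               # biased by the moving object
t_masked = weighted_translation(p, q, combined_weights(conf, movement))  # close to t_cam
```

Here `t_naive` is pulled toward the moving object. `t_masked` almost recovers `t_cam` because the dynamic points carry almost no weight.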
> [!TIP]
>
> How does the two-stage training scheme improve the robustness of the model? To me, this paper seems to be essentially an integration of GeoNet (separated pose and depth) with full regression.

@@ -1,2 +1,16 @@
# CSE5519 Advances in Computer Vision (Topic I: 2025: Embodied Computer Vision and Robotics)
## Navigation World Models
[link to paper](https://arxiv.org/pdf/2412.03572)
### Novelty in NWM
- Conditional Diffusion Transformer
- Uses time and action to condition the diffusion process
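One common way to condition a diffusion transformer on scalar and vector signals is DiT-style adaptive layer norm (adaLN). In this approach, the condition embeddings are summed and mapped to a per-channel scale and shift applied after normalization. The NumPy sketch below assumes that time and action enter NWM's CDiT roughly this way. The class name, the linear action embedding, and the random weights are illustrative assumptions, not the paper's code.

```python
import numpy as np


def sinusoidal_embedding(x, dim):
    """Standard sinusoidal embedding of scalars x (shape (B,)) into (B, dim); dim must be even."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    args = np.atleast_1d(x)[:, None] * freqs[None, :]
    return np.concatenate([np.sin(args), np.cos(args)], axis=-1)


class AdaLNConditioner:
    """Hypothetical adaLN block: (action, time shift, diffusion step) -> scale/shift on tokens."""

    def __init__(self, action_dim, cond_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.cond_dim = cond_dim
        self.W_action = rng.normal(0.0, 0.02, (action_dim, cond_dim))  # linear action embedding
        self.W_mod = rng.normal(0.0, 0.02, (cond_dim, 2 * hidden_dim))  # condition -> (scale, shift)

    def __call__(self, tokens, action, time_shift, diffusion_step):
        # Sum the embeddings of all conditioning signals into one vector per sample: (B, cond_dim).
        c = (action @ self.W_action
             + sinusoidal_embedding(time_shift, self.cond_dim)
             + sinusoidal_embedding(diffusion_step, self.cond_dim))
        scale, shift = np.split(c @ self.W_mod, 2, axis=-1)  # each (B, hidden_dim)
        # Layer norm over the channel axis of tokens with shape (B, N, hidden_dim).
        mu = tokens.mean(axis=-1, keepdims=True)
        var = tokens.var(axis=-1, keepdims=True)
        normed = (tokens - mu) / np.sqrt(var + 1e-6)
        # Modulate every token of a sample with that sample's condition.
        return normed * (1.0 + scale[:, None, :]) + shift[:, None, :]


B, N, H = 2, 4, 8
tokens = np.random.default_rng(1).normal(size=(B, N, H))
cond = AdaLNConditioner(action_dim=3, cond_dim=16, hidden_dim=H)
steps, shifts = np.array([10, 10]), np.array([4.0, 4.0])
out_a = cond(tokens, np.array([[1.0, 0.0, 0.5]] * B), shifts, steps)
out_b = cond(tokens, np.array([[-1.0, 0.3, 0.0]] * B), shifts, steps)
```

The same image tokens are modulated differently under the two actions. This modulation is the mechanism by which the action, and in the same way the time shift, can steer what the denoiser predicts.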
> [!TIP]
>
> This paper provides a new way to train navigation world models. Through conditional diffusion, the model can generate an imagined trajectory in an unknown environment and perform navigation tasks.
>
> However, the model frequently collapses on out-of-distribution data, which results in poor navigation performance. I wonder how we could further condition on the novelty of the environment and integrate exploration strategies to train the model online and fix this collapse issue. What might be the challenges of doing so in the Conditional Diffusion Transformer?
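Navigating with imagined trajectories amounts to planning through a learned model: sample candidate action sequences, roll each one out in imagination, and keep the ones whose imagined outcome best matches the goal. The sketch below uses a generic cross-entropy method (CEM) planner, which is one common choice for this. A trivial 2D point model (`toy_world_model`, hypothetical) stands in for the diffusion model, and a Euclidean distance to a goal state replaces any image-based scoring.

```python
import numpy as np


def toy_world_model(state, action):
    """Hypothetical stand-in for the learned world model: a 2D point moved by the action."""
    return state + action


def imagine(state, actions):
    """Roll out an action sequence entirely inside the (toy) world model."""
    for a in actions:
        state = toy_world_model(state, a)
    return state


def cem_plan(start, goal, horizon=5, samples=200, elites=20, iters=10, seed=0):
    """Cross-entropy method over action sequences of shape (horizon, 2)."""
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, 2))
    std = np.ones((horizon, 2))
    for _ in range(iters):
        # Sample candidate action sequences around the current distribution.
        cands = mean + std * rng.standard_normal((samples, horizon, 2))
        # Score each candidate by how close its imagined final state is to the goal.
        costs = np.array([np.linalg.norm(imagine(start, c) - goal) for c in cands])
        # Refit the sampling distribution to the best candidates.
        elite = cands[np.argsort(costs)[:elites]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3
    return mean


start, goal = np.array([0.0, 0.0]), np.array([3.0, -2.0])
plan = cem_plan(start, goal)
final = imagine(start, plan)  # imagined end state of the planned trajectory
```

With a real world model, a collapsed or out-of-distribution rollout corrupts the cost of every candidate. This is why the collapse issue above directly degrades planning.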