2025-10-22 11:10:53 -05:00


CSE5519 Advances in Computer Vision (Topic I: 2023 - 2024: Embodied Computer Vision and Robotics)

link to the paper

Novelty in RT-2

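VLA: vision-language-action models, which express robot actions as text tokens so that a vision-language model can output them directly.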

Tip

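This paper shows a new way to transfer web-scale knowledge to robotic control. The key is a vision-language-action model: a vision-language model pretrained on web data is fine-tuned on robot trajectories together with its web data, with actions written as text tokens, so the knowledge it learned from the web carries over to robot control.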
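To make "actions as text tokens" concrete, here is a minimal sketch of how a continuous action vector could be turned into a token string and back. It assumes uniform discretization into 256 bins per dimension and actions normalized to [-1, 1]; the bounds, the leading terminate flag, and the function names `tokenize_action`, `detokenize_action`, and `action_to_text` are hypothetical illustrations, not the paper's code.

```python
import numpy as np

# Assumption: 256 uniform bins per action dimension, actions normalized to [-1, 1].
NUM_BINS = 256


def tokenize_action(action, low=-1.0, high=1.0, num_bins=NUM_BINS):
    """Map each continuous action dimension to an integer bin in [0, num_bins - 1]."""
    action = np.clip(np.asarray(action, dtype=np.float64), low, high)
    # Rescale to [0, 1], then to bin indices; the upper edge goes into the last bin.
    scaled = (action - low) / (high - low)
    bins = np.minimum((scaled * num_bins).astype(int), num_bins - 1)
    return bins.tolist()


def detokenize_action(bins, low=-1.0, high=1.0, num_bins=NUM_BINS):
    """Map bin indices back to the center of each bin (lossy inverse)."""
    bins = np.asarray(bins, dtype=np.float64)
    return (low + (bins + 0.5) * (high - low) / num_bins).tolist()


def action_to_text(terminate, action):
    """Serialize a hypothetical terminate flag plus binned action as a token string."""
    tokens = [int(terminate)] + tokenize_action(action)
    return " ".join(str(t) for t in tokens)


if __name__ == "__main__":
    # Hypothetical 7-D arm action: 3 position deltas, 3 rotation deltas, 1 gripper value.
    action = [0.0, 0.5, -1.0, 1.0, 0.1, -0.1, 0.9]
    text = action_to_text(False, action)
    print(text)
    print(detokenize_action(tokenize_action(action)))
```

Because the model only ever emits integers, the same language-model output head can produce both ordinary text answers and robot commands; the price is a quantization error of at most half a bin width per dimension.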

I am considering how this framework could be extended to two-handed (bimanual) robotic control. In the setting studied here, each action is carried out by a single arm, but many real-world tasks require two hands. Could this framework be extended to coordinate two arms?