# CSE5519 Advances in Computer Vision (Topic I: 2023 - 2024: Embodied Computer Vision and Robotics)
## RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
[link to the paper](https://arxiv.org/abs/2307.15818)
### Novelty in RT-2
The paper introduces VLA (vision-language-action) models: robot actions are represented as text tokens, so a pretrained vision-language model can be fine-tuned to emit actions directly, and knowledge learned from web-scale vision-language data transfers to control.
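As a rough illustration of the action-as-tokens idea (a minimal sketch of my own; the bin count of 256 matches the paper, but function names and the [-1, 1] range here are illustrative assumptions, not the authors' code):

```python
# Hypothetical sketch of RT-2-style action tokenization: each continuous
# action dimension is discretized into 256 bins and written out as an
# integer "word" that a language model can emit like ordinary text.

def discretize(value, low=-1.0, high=1.0, bins=256):
    """Map a continuous value in [low, high] to an integer bin in [0, bins-1]."""
    value = min(max(value, low), high)      # clamp to the valid range
    frac = (value - low) / (high - low)     # normalize to [0, 1]
    return min(int(frac * bins), bins - 1)  # bucket index

def action_to_token_string(action):
    """Render an action vector (e.g. delta-pose + gripper) as a token string."""
    return " ".join(str(discretize(a)) for a in action)

# A 7-DoF arm action: 3 translation, 3 rotation, 1 gripper command.
tokens = action_to_token_string([0.0, 0.5, -1.0, 0.25, -0.25, 1.0, 0.0])
print(tokens)  # -> "128 192 0 160 96 255 128"
```

Because the action ends up as a plain string, the same vocabulary and decoding loop used for web-text pretraining can be reused unchanged at control time.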
> [!TIP]
>
> This paper shows a new way to transfer web knowledge to robotic control: a single vision-language-action model is trained so that knowledge acquired from web-scale vision-language data carries over to predicting robot actions.
>
> I'm considering how this framework could be migrated to bimanual (two-hand) robotic control. In the general case the action is executed by a single arm, but many real-world tasks require two hands. I wonder if this framework could be extended to two-hand robotic control?
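One naive way the extension could work (a speculative sketch of my own, not anything proposed in the paper): concatenate the two arms' action vectors into one longer sequence before discretization, so the model simply emits twice as many action tokens per timestep.

```python
# Speculative bimanual extension of the action tokenization: tokenize a
# two-arm action as [left | right] in one flat token sequence. All names
# and the [-1, 1] range are illustrative assumptions.

def discretize(value, low=-1.0, high=1.0, bins=256):
    """Map a continuous value in [low, high] to an integer bin in [0, bins-1]."""
    value = min(max(value, low), high)
    return min(int((value - low) / (high - low) * bins), bins - 1)

def bimanual_action_to_tokens(left_action, right_action):
    """Concatenate both arms' actions and discretize each dimension."""
    return [discretize(a) for a in list(left_action) + list(right_action)]

# Two 7-DoF arms -> 14 action tokens per timestep.
tokens = bimanual_action_to_tokens([0.0] * 7, [1.0] * 7)
print(len(tokens))  # -> 14
```

The open question is whether flat concatenation is enough, since bimanual tasks often need tight coordination between the arms rather than two independent action streams.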