NoteNextra-origin/content/CSE5519/CSE5519_I2.md
Zheyuan Wu 5ce0c8773b updates
2025-10-01 23:24:12 -05:00

# CSE5519 Advances in Computer Vision (Topic I: 2022: Embodied Computer Vision and Robotics)
## DayDreamer: World Models for Physical Robot Learning
[link to paper](https://arxiv.org/pdf/2206.14176)
DayDreamer is a framework for learning robot behaviors directly in the real world.
### Novelty in the integration of world model learning with reinforcement learning
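The paper leverages the Dreamer algorithm for fast robot learning in the real world.
Dreamer has two neural network components: a world model trained on experience drawn from the replay buffer, and an actor-critic trained on rollouts imagined by the world model.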
#### Encoder
The encoder fuses all sensory modalities into discrete codes. The decoder reconstructs the inputs from the codes, providing a rich learning signal and enabling human inspection of model predictions.
A **recurrent state-space model** is trained to predict the future code given actions, without observing the intermediate inputs.
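
To make "predicting future codes without observing the intermediate inputs" concrete, here is a minimal sketch of an open-loop latent rollout. It is not the paper's implementation: the class `ToyRecurrentModel`, the sizes, and the random weights are all hypothetical stand-ins for learned networks, and the next code is picked greedily, whereas a real recurrent state-space model would sample from the predicted distribution.

```python
import math
import random

NUM_CODES = 4    # hypothetical size of the discrete code vocabulary
STATE_DIM = 3    # hypothetical size of the recurrent state
ACTION_DIM = 2   # hypothetical action dimension


def make_matrix(rows, cols, rng):
    """Small random weight matrix standing in for learned parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


class ToyRecurrentModel:
    """Hypothetical toy model: recurrent state plus a prior over the next code."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        in_dim = STATE_DIM + NUM_CODES + ACTION_DIM
        self.w_state = make_matrix(STATE_DIM, in_dim, rng)
        self.w_prior = make_matrix(NUM_CODES, STATE_DIM, rng)

    def step(self, state, code, action):
        """Advance the recurrent state, then predict a distribution over the next code."""
        inputs = state + one_hot(code, NUM_CODES) + action
        next_state = [math.tanh(v) for v in matvec(self.w_state, inputs)]
        probs = softmax(matvec(self.w_prior, next_state))
        return next_state, probs

    def imagine(self, state, code, actions):
        """Roll forward on predicted codes only; no observation is consumed."""
        codes = []
        for action in actions:
            state, probs = self.step(state, code, action)
            # Greedy pick keeps the sketch deterministic (assumption, see lead-in).
            code = max(range(NUM_CODES), key=lambda i: probs[i])
            codes.append(code)
        return codes


model = ToyRecurrentModel(seed=0)
actions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
predicted_codes = model.imagine([0.0] * STATE_DIM, 0, actions)
```

The point of the sketch is that `imagine` takes only a starting state, a starting code, and a sequence of actions. Each predicted code is fed back in as the next input, so the rollout never needs a real sensor reading.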
#### World model learning
The world model enables massively parallel policy optimization from imagined rollouts in the compact latent space using a large batch size, without having to reconstruct sensory inputs. Dreamer trains a _policy network_ and a _value network_ from the imagined rollouts and a learned reward function.
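
As an illustration of how a value network can be trained on imagined rollouts, the sketch below computes λ-returns, the kind of value target used in the Dreamer line of work. These notes do not spell out the target, so treat the formula, the function name `lambda_returns`, the default `discount` and `lam` values, and the indexing convention (`rewards[t]` paired with a bootstrap from `values[t + 1]`) as assumptions rather than the paper's exact recipe.

```python
def lambda_returns(rewards, values, discount=0.99, lam=0.95):
    """Compute lambda-returns over one imagined rollout.

    rewards: predicted rewards for steps 0..H-1 (length H).
    values:  value estimates for steps 0..H (length H + 1);
             the last entry bootstraps the return at the horizon.
    """
    horizon = len(rewards)
    if len(values) != horizon + 1:
        raise ValueError("values must have exactly one more entry than rewards")

    returns = [0.0] * horizon
    next_return = values[-1]  # bootstrap from the value at the horizon
    for t in reversed(range(horizon)):
        # Blend the one-step bootstrap with the longer-horizon return.
        blended = (1.0 - lam) * values[t + 1] + lam * next_return
        next_return = rewards[t] + discount * blended
        returns[t] = next_return
    return returns


# Imagined rewards and value estimates for a three-step rollout (made-up numbers).
targets = lambda_returns([1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 5.0], discount=1.0, lam=1.0)
```

With `lam=1.0` the target reduces to the full discounted return plus the bootstrap at the horizon, and with `lam=0.0` it reduces to the one-step target `rewards[t] + discount * values[t + 1]`. Because every input here comes from the world model, many such rollouts can be evaluated in a batch without touching the robot.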
> [!TIP]
>
> This paper uses online reinforcement learning with a replay buffer to train without supervision directly in a real environment.
>
> The key limitation is that learning on a physical robot requires long wall-clock training time, whereas a simulator can generate large batches of data concurrently. Would it be more efficient, in terms of training time and repair costs, to train some parts of the model in a simulator first and then fine-tune on real-world data? The paper includes a few comparisons between simulator training and the real-world model, but I wonder what the other side of the story looks like: how well does purely simulator-based training perform?