# CSE510 Deep Reinforcement Learning (Lecture 1)

## Artificial general intelligence

- Multimodal perception
- Persistent memory + retrieval
- World modeling + planning
- Tool use with verification
- Interactive learning loops (RLHF/RLAIF)
- Uncertainty estimation & oversight

LLMs may not be the ultimate solution for AGI, but they may be part of the solution.

## Long-Horizon Agency

Decision-making/control and multi-agent collaboration.

## Course logistics

Announcements and discussion on Canvas.

Weekly recitations: Thursday 4:00-5:00 PM in McKelvey Hall 1030. Office hours: Wednesday 11 AM-12 PM in McKelvey Hall 2010D, or by appointment.

### Prerequisites

- Proficiency in Python programming.
- **Programming experience with deep learning.**
- Research experience (not required, but highly recommended).
- Mathematics: Linear Algebra (MA 429, MA 439, or ESE 318), Calculus III (MA 233), and Probability & Statistics.

### Textbooks

Not required, but recommended:

- Sutton & Barto, Reinforcement Learning: An Introduction (2nd ed., available online).
- Russell & Norvig, Artificial Intelligence: A Modern Approach (4th ed.).
- OpenAI, Spinning Up in Deep RL (online tutorial).

### Final Project

A research-level project of your choice, e.g.:

- Improving an existing approach
- Tackling an unsolved task/benchmark
- Creating a new task/problem that has not yet been addressed with RL

Projects can be done in teams of 1-2 students and must be harder than a homework assignment. The core goal is to understand the pipeline of RL research; the result need not improve on existing methods.

#### Milestones

- Proposal (max 2 pages)
- Progress report with brief survey (max 4 pages)
- Presentation/poster session
- Final report (7-10 pages, NeurIPS style)

## What is RL?

### Goal for the course

How do we build intelligent agents that **learn to act** and achieve specific goals in **dynamic environments**?

Acting to achieve goals is a key part of intelligence.

> The brain exists to produce adaptable and complex movements. (Daniel Wolpert)

## What RL does

RL is a general-purpose framework for decision making and behavioral learning:

- RL is for an agent with the capacity to act.
- Each action influences the agent's future observations.
- Success is measured by a scalar reward signal.
- Goal: find a policy that maximizes the expected total reward (formalized at the end of these notes).

Exploration: add randomness to your action selection. If the result was better than expected, do more of the same in the future (see the bandit sketch at the end of these notes).

### Deep reinforcement learning

DL is a general-purpose framework for representation learning:

- Given an objective,
- learn the representation required to achieve it,
- directly from raw inputs,
- using minimal domain knowledge.

Deep learning enables RL algorithms to solve complex problems end to end (see the policy-network sketch at the end of these notes).

### Machine learning paradigms

- Supervised learning: learning from examples
- Self-supervised learning: learning structure in data
- Reinforcement learning: learning from experience

Example using LLMs:

- Self-supervised learning: pretraining
- SFT: supervised fine-tuning (post-training)
- RLHF: reinforcement learning from human feedback (fine-tuning)

RL is also used in post-training to improve reasoning capabilities.

_RL generates data beyond the original training data._

All of these paradigms are "supervised" by a loss function.

### Differences between RL and other paradigms

- **Exploration**: the agent does not start with data known to be good.
- **Non-stationarity**: the environment is dynamic, and the agent's actions influence the environment.
- **Credit assignment**: the agent must learn which of its actions led to a (possibly delayed) reward.
- **Limited samples**: actions take time to execute in the real world, which may limit the amount of experience available.
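
## Sketches

To make the goal under "What RL does" concrete: the standard formalization (following the recommended Sutton & Barto text; the notation below is an assumption, it was not written out in the lecture) is to find a policy that maximizes the expected discounted return

$$
J(\pi) = \mathbb{E}_{\tau \sim \pi}\left[\sum_{t=0}^{T} \gamma^{t}\, r_t\right],
$$

where $\pi$ is the policy, $\tau = (s_0, a_0, r_0, s_1, a_1, r_1, \dots)$ is a trajectory produced by acting with $\pi$, $r_t$ is the scalar reward at step $t$, and $\gamma \in [0, 1]$ is a discount factor.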
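The exploration recipe above ("add randomness to your action selection; if the result was better than expected, do more of the same") can be written out directly for a multi-armed bandit. A minimal sketch, not course code; the arm probabilities and hyperparameters are invented for illustration:

```python
import random

true_means = [0.2, 0.5, 0.8]   # hidden success probability of each arm (invented)
q = [0.0, 0.0, 0.0]            # the agent's running value estimates
epsilon, alpha = 0.1, 0.05     # exploration rate, learning rate

for step in range(5000):
    if random.random() < epsilon:
        a = random.randrange(3)                # explore: random action
    else:
        a = max(range(3), key=lambda i: q[i])  # exploit: current best estimate
    r = 1.0 if random.random() < true_means[a] else 0.0
    # "Better than expected" (r > q[a]) raises the estimate, so the greedy
    # branch picks this action more often in the future.
    q[a] += alpha * (r - q[a])

print(q)  # estimates should approach [0.2, 0.5, 0.8]
```

This is the simplest instance of the RL loop: act, observe a scalar reward, and shift behavior toward what worked.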
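For the deep RL bullets (representations learned directly from raw inputs), the policy itself can be a neural network mapping an observation to action probabilities, trained end to end against the reward objective. A sketch assuming PyTorch; the dimensions and layer sizes are illustrative, not from the lecture:

```python
import torch
import torch.nn as nn

class Policy(nn.Module):
    """Maps a raw observation vector to a distribution over actions."""
    def __init__(self, obs_dim: int = 4, n_actions: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64),
            nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Softmax turns raw scores into action probabilities.
        return torch.softmax(self.net(obs), dim=-1)

policy = Policy()
obs = torch.randn(1, 4)                # placeholder "raw input"
probs = policy(obs)                    # shape (1, 2): probability of each action
action = torch.multinomial(probs, 1)   # sampling doubles as exploration noise
```

No hand-designed features appear here; for pixels or text one would swap in a convolutional or transformer encoder, but the end-to-end principle is the same.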