update

2025-08-26 11:20:57 -05:00
parent 7031792b80
commit 4c964e13f9
2 changed files with 228 additions and 1 deletions
--- a/content/CSE510/CSE510_L1.md
+++ b/content/CSE510/CSE510_L1.md
@@ -1,2 +1,131 @@
 # CSE510 Deep Reinforcement Learning (Lecture 1)

+## Artificial general intelligence
+
+- Multimodal perception
+- Persistent memory + retrieval
+- World modeling + planning
+- Tool use with verification
+- Interactive learning loops (RLHF/RLAIF)
+- Uncertainty estimation & oversight
+
+LLMs may not be the ultimate solution for AGI, but they may be part of the solution.
+
+## Long-Horizon Agency
+
+Decision-Making/Control and Multi-Agent collaboration
+
+## Course logistics
+
+Announcement and discussion on Canvas
+
+Weekly recitations
+
+Thursday 4:00PM-5:00PM in McKelvey Hall 1030
+
+or night office hours (11am-12pm Wed in McKelvey Hall 2010D)
+
+or by appointment
+
+### Prerequisites
+
+- Proficiency in Python programming.
+- **Programming experience with deep learning**.
+- Research Experience (Not required, but highly recommended)
+- Mathematics: Linear Algebra (MA 429 or MA 439 or ESE 318), Calculus III (MA 233), Probability & Statistics.
+
+### Textbook
+
+Not required, but recommended:
+
+- Sutton & Barto, Reinforcement Learning: An Introduction (2nd ed., online).
+- Russell & Norvig, Artificial Intelligence: A Modern Approach (4th ed.).
+- OpenAI Spinning Up in Deep RL tutorial.
+
+### Final Project
+
+Research-level project of your choice
+
+- Improving an existing approach
+- Tackling an unsolved task/benchmark
+- Creating a new task/problem that hasn't been addressed by RL
+
+Can be done in a team of 1-2 students
+
+Must be harder than homework.
+
+The core goal is to understand the pipeline of RL research; the project need not always improve on existing methods.
+
+#### Milestones
+
+- Proposal (max 2 pages)
+- Progress report with brief survey (max 4 pages)
+- Presentation/Poster session
+- Final report (7-10 pages, NeurIPS style)
+
+## What is RL?
+
+### Goal for course
+
+How to build intelligent agents that **learn to act** and achieve specific goals in **dynamic environments**?
+
+Acting to achieve goals is a key part of intelligence.
+
+> The brain exists to produce adaptable and complex movements. (Daniel Wolpert)
+
+## What RL does
+
+A general-purpose framework for decision making/behavioral learning
+
+- RL is for an agent with the capacity to act
+- Each action influences the agent's future observation
+- Success is measured by a scalar reward signal
+- Goal: find a policy that maximizes the expected total reward.
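+
+As a sketch of the goal above, assuming a discounted finite-horizon setting (the symbols $\pi$, $r_t$, $\gamma$, and $T$ are notation introduced here, not taken from the lecture), the objective can be written as:
+
+$$
+J(\pi) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{T} \gamma^{t} r_{t}\right], \qquad \pi^{*} = \arg\max_{\pi} J(\pi)
+$$
+
+Here $r_t$ is the scalar reward at step $t$ and $\gamma \in [0, 1]$ is an assumed discount factor; setting $\gamma = 1$ recovers the plain expected total reward.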
+
+Exploration: Add randomness to your action selection
+
+If the result was better than expected, do more of the same in the future.
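+
+One common way to formalize this idea, given here as an illustrative assumption rather than the lecture's own formula, is a policy-gradient update with a baseline $b$ standing in for the "expected" result (the symbols $\theta$, $\alpha$, $G$, and $b$ are notation introduced here):
+
+$$
+\theta \leftarrow \theta + \alpha \, (G - b) \, \nabla_{\theta} \log \pi_{\theta}(a \mid s)
+$$
+
+When the observed return $G$ exceeds $b$, the update raises the probability of taking action $a$ in state $s$; when it falls short, that probability is lowered.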
+
+### Deep reinforcement learning
+
+DL is a general-purpose framework for representation learning.
+
+- Given an objective
+- Learn the representation required to achieve the objective
+- Directly from raw inputs
+- Using minimal domain knowledge
+
+Deep learning enables RL algorithms to solve complex problems in an end-to-end manner.
+
+### Machine learning paradigms
+
+Supervised learning: learning from examples
+
+Self-supervised learning: learning structures in data
+
+Reinforcement learning: learning from experiences
+
+Example using LLMs:
+
+Self-supervised: pretraining
+
+SFT: supervised fine-tuning (post-training)
+
+RL is also used in post-training for improving reasoning capabilities.
+
+RLHF: reinforcement learning from human feedback (fine-tuning)
+
+_RL generates data beyond the original training data._
+
+All of these paradigms are "supervised" by a loss function.
+
+### Differences for RL from other paradigms
+
+**Exploration**: the agent does not have prior data known to be good.
+
+**Non-stationarity**: the environment is dynamic and the agent's actions influence the environment.
+
+**Credit assignment**: rewards are often delayed, so the agent needs to learn which of its actions deserve credit.
+
+**Limited samples**: actions take time to execute in the real world, which may limit the amount of experience.
+