# CSE510 Deep Reinforcement Learning (Lecture 1)

## Artificial general intelligence

- Multimodal perception
- Persistent memory + retrieval
- World modeling + planning
- Tool use with verification
- Interactive learning loops (RLHF/RLAIF)
- Uncertainty estimation & oversight

LLMs may not be the ultimate solution for AGI, but they may be part of the solution.

## Long-Horizon Agency

Decision-making/control and multi-agent collaboration.

## Course logistics

Announcements and discussion are on Canvas.

Weekly recitations: Thursdays 4:00-5:00 PM in McKelvey Hall 1030.

Questions can also be brought to instructor office hours (Wednesdays 11:00 AM-12:00 PM in McKelvey Hall 2010D) or addressed by appointment.

### Prerequisites

- Proficiency in Python programming.
- **Programming experience with deep learning.**
- Research experience (not required, but highly recommended).
- Mathematics: Linear Algebra (MA 429, MA 439, or ESE 318), Calculus III (MA 233), Probability & Statistics.

### Textbook

Not required, but recommended:

- Sutton & Barto, Reinforcement Learning: An Introduction (2nd ed., online).
- Russell & Norvig, Artificial Intelligence: A Modern Approach (4th ed.).
- OpenAI Spinning Up in Deep RL tutorial.

### Final Project

A research-level project of your choice:

- Improving an existing approach
- Tackling an unsolved task/benchmark
- Creating a new task/problem that has not yet been addressed with RL

Can be done in a team of 1-2 students.

Must be harder than homework.

The core goal is to understand the pipeline of RL research; the result does not always have to improve on existing methods.

#### Milestones

- Proposal (max 2 pages)
- Progress report with brief survey (max 4 pages)
- Presentation/Poster session
- Final report (7-10 pages, NeurIPS style)

## What is RL?

### Goal for the course

How can we build intelligent agents that **learn to act** and achieve specific goals in **dynamic environments**?

Acting to achieve goals is a key part of intelligence.

> The brain exists to produce adaptable and complex movements. (Daniel Wolpert)

## What RL does

A general-purpose framework for decision-making and behavioral learning:

- RL is for an agent with the capacity to act
- Each action influences the agent's future observations
- Success is measured by a scalar reward signal
- Goal: find a policy that maximizes expected total reward (see the sketch below)
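
To make these bullets concrete, here is a minimal sketch of the agent-environment loop; the `gymnasium` package and its `CartPole-v1` task are assumptions made purely for illustration, not anything prescribed by the lecture.

```python
# Minimal agent-environment loop: act, observe, accumulate the scalar reward.
# Assumes `gymnasium` is installed; CartPole-v1 is just an example task.
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)

total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()  # random policy: pure exploration
    obs, reward, terminated, truncated, info = env.step(action)  # actions change future observations
    total_reward += reward  # success is measured by the summed scalar reward
    done = terminated or truncated

print(f"Episode return: {total_reward}")
env.close()
```

A learning algorithm would replace the random `sample()` call with a trainable policy and update it to maximize the expected value of `total_reward`.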

Exploration: add randomness to your action selection.

If the result was better than expected, do more of the same in the future.
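
One minimal way to combine these two ideas is an ε-greedy rule with an incremental value update, sketched below for a hypothetical k-armed bandit (a single-state problem); the `pull_arm` reward function and the constants are illustrative assumptions, not part of the lecture.

```python
# Toy k-armed bandit sketch: epsilon-greedy exploration plus
# "do more of what worked better than expected" via an incremental value update.
import random

def run_bandit(pull_arm, k, steps=1000, epsilon=0.1, alpha=0.1):
    """pull_arm(a) -> scalar reward for action a; returns learned value estimates."""
    q = [0.0] * k  # current estimate of each action's value
    for _ in range(steps):
        if random.random() < epsilon:          # exploration: occasionally act at random
            a = random.randrange(k)
        else:                                  # exploitation: pick the best-looking action
            a = max(range(k), key=lambda i: q[i])
        r = pull_arm(a)
        q[a] += alpha * (r - q[a])             # better than expected -> estimate (and preference) rises
    return q

# Example usage with a made-up reward function (arm 2 is best on average).
values = run_bandit(lambda a: random.gauss([0.0, 0.5, 1.0][a], 1.0), k=3)
print(values)
```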

### Deep reinforcement learning

DL is a general-purpose framework for representation learning:

- Given an objective
- Learn the representations required to achieve the objective
- Directly from raw inputs
- Using minimal domain knowledge

Deep learning enables RL algorithms to solve complex problems in an end-to-end manner.
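
As a rough picture of what "learning representations directly from raw inputs" looks like in code, the sketch below defines a small network that maps a raw observation vector straight to one value per action; PyTorch, the layer sizes, and the 4-dimensional observation are assumptions made only for illustration.

```python
# A small value network: raw observation in, one estimated value per action out.
# Assumes PyTorch is available; all dimensions are illustrative.
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, obs_dim: int, num_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden),      # learned features replace hand-crafted ones
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_actions),  # one action-value estimate per action
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

q = QNetwork(obs_dim=4, num_actions=2)
greedy_action = q(torch.zeros(1, 4)).argmax(dim=-1)  # acting = picking the highest-value output
print(greedy_action)
```

Training such a network end-to-end against the RL objective is what replaces the hand-designed features of classical pipelines.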

### Machine learning paradigms

Supervised learning: learning from examples.

Self-supervised learning: learning structure in data.

Reinforcement learning: learning from experience.

Example using LLMs:

- Self-supervised: pretraining
- SFT: supervised fine-tuning (post-training)
- RL is also used in post-training to improve reasoning capabilities
- RLHF: reinforcement learning from human feedback (fine-tuning)

_RL generates data beyond the original training data._

All of these paradigms are "supervised" by a loss function.

### How RL differs from other paradigms

**Exploration**: the agent does not start with data known to be good.

**Non-stationarity**: the environment is dynamic, and the agent's actions influence the environment.

**Credit assignment**: the agent needs to learn to assign credit to its actions (rewards may be delayed).
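
To make the delayed-reward point concrete, the standard formalization (in the notation of Sutton & Barto, the recommended text) scores the action taken at time $t$ by the discounted sum of every reward that arrives after it:

$$
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}, \qquad 0 \le \gamma \le 1,
$$

so a single delayed reward spreads credit back over many earlier actions, and the "expected total reward" objective from above is $\max_\pi \mathbb{E}_\pi[G_t]$.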

**Limited samples**: actions take time to execute in the real world, which may limit the amount of experience available.

# CSE510 Deep Reinforcement Learning

CSE 5100

**Class meeting times and location:** Tue/Thu 10:00-11:20 AM (412A-01) in EADS Room 216

**Fall 2025**

## Instructor Information

**Chongjie Zhang**

Office: McKelvey Hall 2010D

Email: chongjie@wustl.edu

### Instructor's Office Hours:

Wednesdays 11:00 AM-12:00 PM in McKelvey Hall 2010D, or you may email me to make an appointment.

### TAs:

- Jianing Ye: jianing.y@wustl.edu
- Kefei Duan: d.kefei@wustl.edu
- Xiu Yuan: xiu@wustl.edu

**Office Hours:** Thursdays 4:00-5:00 PM in McKelvey Hall 1030 (tentative), or you may email the TAs to make an appointment.

## Course Description

Deep Reinforcement Learning (RL) is a cutting-edge field at the intersection of artificial intelligence and decision-making. This course provides an in-depth exploration of the fundamental principles, algorithms, and applications of deep reinforcement learning. We start from the Markov Decision Process (MDP) framework and cover basic RL algorithms (value-based, policy-based, actor-critic, and model-based methods), then move to advanced topics including offline RL and multi-agent RL. By combining deep learning with reinforcement learning, students will gain the skills to build intelligent systems that learn from experience and make near-optimal decisions in complex environments.

The course caters to graduate and advanced undergraduate students. Student performance evaluation will revolve around written and programming assignments and the course project.

By the end of this course, students should be able to:

- Formalize sequential decision problems with MDPs and derive Bellman equations.
- Understand and analyze core RL algorithms (DP, MC, TD).
- Build, train, and debug deep value-based methods (e.g., DQN and key extensions).
- Implement and compare policy-gradient and actor-critic algorithms.
- Explain and apply exploration strategies and stabilization techniques in deep RL.
- Grasp model-based RL pipelines.
- Explain assumptions, risks, and evaluation pitfalls in offline RL; implement a baseline offline RL method.
- Formulate multi-agent RL problems; implement and evaluate a CTDE or value-decomposition method.
- Execute an end-to-end DRL project: problem selection, environment design, algorithm selection, experimental protocol, ablations, and reproducibility.

## Prerequisites

If you are unsure about any of these, please speak to the instructor.

- Proficiency in Python programming.
- Programming experience with deep learning.
- Research experience (not required, but highly recommended).
- Mathematics: Linear Algebra (MA 429, MA 439, or ESE 318), Calculus III (MA 233), Probability & Statistics.

One of the following:

- a) CSE 412A: Intro to A.I., or
- b) a Machine Learning course (CSE 417T or ESE 417).

## Textbook

**Primary text** (optional but recommended): Sutton & Barto, Reinforcement Learning: An Introduction (2nd ed., online). We will not cover all of the chapters and will, from time to time, cover topics not contained in the book.

**Additional references:** Russell & Norvig, Artificial Intelligence: A Modern Approach (4th ed.); OpenAI Spinning Up in Deep RL tutorial.

## Homeworks

There will be a total of three homework assignments distributed throughout the semester. Each assignment will be accessible on Canvas, allowing you approximately two weeks to finish and submit it before the designated deadline.

Late work will not be accepted. If you have a documented medical or emergency reason, contact the TAs as soon as possible.

**Collaboration:** Discussion of ideas is encouraged, but your write-up and code must be your own. Acknowledge any collaborators and external resources.

**Academic Integrity:** Do not copy from peers or online sources. Violations will be referred per university policy.

## Final Project

A research-level project of your choice that demonstrates mastery of DRL concepts and empirical methodology. Possible directions include: (a) improving an existing approach, (b) tackling an unsolved task/benchmark, (c) reproducing and extending a recent paper, or (d) creating a new task/problem relevant to RL.

**Team size:** 1-2 students by default (contact instructor/TAs for approval if proposing a larger team).

### Milestones:

- **Proposal:** ≤ 2 pages outlining problem, related work, methodology, evaluation plan, and risks.
- **Progress report with short survey:** ≤ 4 pages with preliminary results or diagnostics.
- **Presentation/Poster session:** brief talk or poster demo.
- **Final report:** 7-10 pages (NeurIPS format) with clear experiments, ablations, and reproducibility details.

## Evaluation

**Homework / Problem Sets (3): 45%**

Each problem set combines written questions (derivations/short answers) and programming components (implementations and experiments).

**Final Course Project: 50% total**

- Proposal (max 2 pages): 5%
- Progress report with brief survey (max 4 pages): 10%
- Presentation/Poster session: 10%
- Final report (7-10 pages, NeurIPS style): 25%

**Participation: 5%**

Contributions in class and on the course discussion forum, especially in the project presentation sessions.

**Course evaluations** (mid-semester and final): extra credit of up to 2%.

## Grading Scale

The intended grading scale is as follows. The instructor reserves the right to adjust the grading scale.

- A's (A-, A, A+): >= 90%
- B's (B-, B, B+): >= 80%
- C's (C-, C, C+): >= 70%
- D's (D-, D, D+): >= 60%
- F: < 60%