Trance-0
2025-08-26 11:20:57 -05:00
parent 7031792b80
commit 4c964e13f9
2 changed files with 228 additions and 1 deletions


@@ -1,2 +1,131 @@
# CSE510 Deep Reinforcement Learning (Lecture 1)
## Artificial general intelligence
- Multimodal perception
- Persistent memory + retrieval
- World modeling + planning
- Tool use with verification
- Interactive learning loops (RLHF/RLAIF)
- Uncertainty estimation & oversight
LLMs may not be the ultimate solution for AGI, but they may be part of the solution.
## Long-Horizon Agency
Decision-Making/Control and Multi-Agent collaboration
## Course logistics
Announcement and discussion on Canvas
Weekly recitations
Thursday 4:00-5:00 PM in McKelvey Hall 1030
or instructor office hours (11 am-12 pm Wed in McKelvey Hall 2010D)
or by appointment
### Prerequisites
- Proficiency in Python programming.
- **Programming experience with deep learning**.
- Research Experience (Not required, but highly recommended)
- Mathematics: Linear Algebra (MA 429 or MA 439 or ESE 318), Calculus III (MA 233), Probability & Statistics.
### Textbook
Not required, but recommended:
- Sutton & Barto, Reinforcement Learning: An Introduction (2nd ed., online).
- Russell & Norvig, Artificial Intelligence: A Modern Approach (4th ed.).
- OpenAI Spinning Up in Deep RL tutorial.
### Final Project
Research-level project of your choice
- Improving an existing approach
- Tackling an unsolved task/benchmark
- Creating a new task/problem that hasn't been addressed by RL
Can be done in a team of 1-2 students
Must be harder than homework.
The core goal is to understand the pipeline of RL research; the result need not always be an improvement over existing methods.
#### Milestones
- Proposal (max 2 pages)
- Progress report with brief survey (max 4 pages)
- Presentation/Poster session
- Final report (7-10 pages, NeurIPS style)
## What is RL?
### Goal for course
How can we build intelligent agents that **learn to act** and achieve specific goals in **dynamic environments**?
Acting to achieve goals is a key part of intelligence.
> The brain exists to produce adaptable and complex movements. (Daniel Wolpert)
## What RL does
A general-purpose framework for decision-making/behavioral learning
- RL is for an agent with the capacity to act
- Each action influences the agent's future observation
- Success is measured by a scalar reward signal
- Goal: find a policy that maximizes expected total reward.
Exploration: Add randomness to your action selection
If the result was better than expected, do more of the same in the future.
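A minimal sketch of this loop, assuming a Gymnasium-style `reset`/`step` environment and a simple tabular Q-learning update (the environment name and hyperparameters are illustrative, not from the lecture):

```python
import random
from collections import defaultdict

import gymnasium as gym  # assumed available; any env with reset/step works

env = gym.make("FrozenLake-v1")          # illustrative environment choice
q_values = defaultdict(float)            # Q(s, a) estimates, all start at 0
alpha, gamma, epsilon = 0.1, 0.99, 0.1   # step size, discount, exploration rate

for episode in range(500):
    state, _ = env.reset()
    done = False
    while not done:
        # Exploration: with probability epsilon take a random action,
        # otherwise act greedily with respect to current estimates.
        if random.random() < epsilon:
            action = env.action_space.sample()
        else:
            action = max(range(env.action_space.n),
                         key=lambda a: q_values[(state, a)])

        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated

        # "If the result was better than expected, do more of the same":
        # nudge Q(s, a) toward the reward plus the best estimated future value.
        best_next = max(q_values[(next_state, a)]
                        for a in range(env.action_space.n))
        q_values[(state, action)] += alpha * (
            reward + gamma * best_next - q_values[(state, action)])
        state = next_state
```

The epsilon term is the added randomness; the update rule reinforces actions whose outcome exceeded the current estimate.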
### Deep reinforcement learning
DL is a general-purpose framework for representation learning.
- Given an objective
- Learn representation that is required to achieve objective
- Directly from raw inputs
- Using minimal domain knowledge
Deep learning enables RL algorithms to solve complex problems in an end-to-end manner.
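As a rough illustration of "directly from raw inputs": a deep RL agent replaces a lookup table with a network that maps raw observations to action values. A minimal sketch assuming PyTorch, with hypothetical observation and action sizes:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a raw observation vector to one value estimate per action."""
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

# Hypothetical sizes: a 4-dimensional observation and 2 discrete actions.
q_net = QNetwork(obs_dim=4, n_actions=2)
obs = torch.randn(1, 4)                    # one raw observation
action = q_net(obs).argmax(dim=-1).item()  # greedy action from learned features
```

The representation (the hidden layers) is learned end-to-end from the same reward-driven objective, with no hand-designed features.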
### Machine learning paradigms
Supervised learning: learning from examples
Self-supervised learning: learning structures in data
Reinforcement learning: learning from experiences
Example using LLMs:
Self-supervised: pretraining
SFT: supervised fine-tuning (post-training)
RL is also used in post-training for improving reasoning capabilities.
RLHF: reinforcement learning from human feedback (fine-tuning)
_RL generates data beyond the original training data._
All of these paradigms are "supervised" by a loss function.
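To make the shared "supervised by a loss" view concrete, here is a hedged sketch contrasting a supervised cross-entropy loss with a REINFORCE-style policy-gradient loss, where the "labels" are actions the agent sampled itself, weighted by the returns they produced (PyTorch assumed; all tensors are placeholder data):

```python
import torch
import torch.nn.functional as F

# Supervised learning: labels come from a fixed dataset.
logits = torch.randn(8, 10, requires_grad=True)   # model outputs: 8 examples, 10 classes
labels = torch.randint(0, 10, (8,))               # ground-truth labels
supervised_loss = F.cross_entropy(logits, labels)

# Reinforcement learning (REINFORCE-style): the "label" is an action the agent
# itself sampled, weighted by the return that action eventually produced,
# so the training signal comes from data the agent generated.
action_logits = torch.randn(8, 4, requires_grad=True)   # policy outputs: 4 actions
dist = torch.distributions.Categorical(logits=action_logits)
actions = dist.sample()                                  # the agent's own choices
returns = torch.randn(8)                                 # placeholder returns from rollouts
rl_loss = -(dist.log_prob(actions) * returns).mean()     # reinforce high-return actions

supervised_loss.backward()
rl_loss.backward()
```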
### Differences for RL from other paradigms
**Exploration**: the agent does not have prior data known to be good.
**Non-stationarity**: the environment is dynamic and the agent's actions influence the environment.
**Credit assignment**: the agent must learn which of its earlier actions led to a (possibly delayed) reward.
**Limited samples**: actions take time to execute in the real world, which may limit the amount of available experience.
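For the credit-assignment point, one standard device is the discounted return, which propagates a delayed reward back to the earlier steps of a trajectory. A small illustrative computation (the rewards and discount are made up):

```python
def discounted_returns(rewards, gamma=0.9):
    """G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for each step."""
    returns, g = [], 0.0
    for r in reversed(rewards):       # accumulate from the end of the episode
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

# Reward arrives only at the final step, yet earlier actions still get credit:
print(discounted_returns([0.0, 0.0, 0.0, 1.0]))
# approximately [0.729, 0.81, 0.9, 1.0]
```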


@@ -1 +1,99 @@
# CSE510 Deep Reinforcement Learning
CSE 5100
**Class Meeting Times and Location:** Tue/Thu 10:00-11:20 am (412A-01) in EADS Room 216
**Fall 2025**
## Instructor Information
**Chongjie Zhang**
Office: McKelvey Hall 2010D
Email: chongjie@wustl.edu
### Instructor's Office Hours:
Chongjie Zhang's Office Hours: Wednesdays 11:00 am-12:00 pm in McKelvey Hall 2010D, or you may email me to make an appointment.
### TAs:
- Jianing Ye: jianing.y@wustl.edu
- Kefei Duan: d.kefei@wustl.edu
- Xiu Yuan: xiu@wustl.edu
**Office Hours:** Thursday 4:00-5:00 pm in McKelvey Hall 1030 (tentative), or you may email the TAs to make an appointment.
## Course Description
Deep Reinforcement Learning (RL) is a cutting-edge field at the intersection of artificial intelligence and decision-making. This course provides an in-depth exploration of the fundamental principles, algorithms, and applications of deep reinforcement learning. We start from the Markov Decision Process (MDP) framework and cover basic RL algorithms (value-based, policy-based, actor-critic, and model-based methods), then move to advanced topics including offline RL and multi-agent RL. By combining deep learning with reinforcement learning, students will gain the skills to build intelligent systems that learn from experience and make near-optimal decisions in complex environments.
The course caters to graduate and advanced undergraduate students. Student performance evaluation will revolve around written and programming assignments and the course project.
By the end of this course, students should be able to:
- Formalize sequential decision problems with MDPs and derive Bellman equations.
- Understand and analyze core RL algorithms (DP, MC, TD).
- Build, train, and debug deep value-based methods (e.g., DQN and key extensions).
- Implement and compare policy-gradient and actor-critic algorithms.
- Explain and apply exploration strategies and stabilization techniques in deep RL.
- Grasp model-based RL pipelines.
- Explain assumptions, risks, and evaluation pitfalls in offline RL; implement a baseline offline RL method.
- Formulate multi-agent RL problems; implement and evaluate a CTDE or value-decomposition method.
- Execute an end-to-end DRL project: problem selection, environment design, algorithm selection, experimental protocol, ablations, and reproducibility.
## Prerequisites
If you are unsure about any of these, please speak to the instructor.
- Proficiency in Python programming.
- Programming experience with deep learning.
- Research Experience (Not required, but highly recommended)
- Mathematics: Linear Algebra (MA 429 or MA 439 or ESE 318), Calculus III (MA 233), Probability & Statistics.
One of the following:
- a) CSE 412A: Intro to A.I., or
- b) a Machine Learning course (CSE 417T or ESE 417).
## Textbook
**Primary text** (optional but recommended): Sutton & Barto, Reinforcement Learning: An Introduction (2nd ed., online). We will not cover all of the chapters and, from time to time, cover topics not contained in the book.
**Additional references:** Russell & Norvig, Artificial Intelligence: A Modern Approach (4th ed.); OpenAI Spinning Up in Deep RL tutorial.
## Homeworks
There will be a total of three homework assignments distributed throughout the semester. Each assignment will be accessible on Canvas, allowing you approximately two weeks to finish and submit it before the designated deadline.
Late work will not be accepted. If you have a documented medical or emergency reason, contact the TAs as soon as possible.
**Collaboration:** Discussion of ideas is encouraged, but your writeup and code must be your own. Acknowledge any collaborators and external resources.
**Academic Integrity:** Do not copy from peers or online sources. Violations will be referred per university policy.
## Final Project
A research-level project of your choice that demonstrates mastery of DRL concepts and empirical methodology. Possible directions include: (a) improving an existing approach, (b) tackling an unsolved task/benchmark, (c) reproducing and extending a recent paper, or (d) creating a new task/problem relevant to RL.
**Team size:** 1-2 students by default (contact instructor/TAs for approval if proposing a larger team).
### Milestones:
- **Proposal:** ≤ 2 pages outlining problem, related work, methodology, evaluation plan, and risks.
- **Progress report with short survey:** ≤ 4 pages with preliminary results or diagnostics.
- **Presentation/Poster session:** brief talk or poster demo.
- **Final report:** 7-10 pages (NeurIPS format) with clear experiments, ablations, and reproducibility details.
## Evaluation
**Homework / Problem Sets (3) — 45%**
Each problem set combines written questions (derivations/short answers) and programming components (implementations and experiments).
**Final Course Project — 50% total**
- Proposal (max 2 pages): 5%
- Progress report with brief survey (max 4 pages): 10%
- Presentation/Poster session: 10%
- Final report (7-10 pages, NeurIPS style): 25%
(These percentages are of the overall course grade and sum to the project's 50%.)
**Participation — 5%**
Contributions in class and on the course discussion forum, especially in the project presentation sessions.
**Course evaluations** (mid-semester and final): extra credit of up to 2%
## Grading Scale
The intended grading scale is as follows. The instructor reserves the right to adjust the grading scale.
- A's (A-,A,A+): >= 90%
- B's (B-,B,B+): >= 80%
- C's (C-,C,C+): >= 70%
- D's (D-,D,D+): >= 60%
- F: < 60%