
# CSE510 Deep Reinforcement Learning

CSE 5100

**Class meeting times and location:** Tue/Thu, 10:00-11:20 am (412A-01) in EADS Room 216

**Fall 2025**

## Instructor Information

**Chongjie Zhang**

- Office: McKelvey Hall 2010D
- Email: chongjie@wustl.edu

### Instructor's Office Hours:

Chongjie Zhang's office hours: Wednesdays, 11:00 am-12:00 pm in McKelvey Hall 2010D, or email me to make an appointment.

### TAs:

- Jianing Ye: jianing.y@wustl.edu
- Kefei Duan: d.kefei@wustl.edu
- Xiu Yuan: xiu@wustl.edu

**Office Hours:** Thursdays, 4:00-5:00 pm in McKelvey Hall 1030 (tentative), or email the TAs to make an appointment.

## Course Description

Deep Reinforcement Learning (RL) is a cutting-edge field at the intersection of artificial intelligence and decision-making. This course provides an in-depth exploration of the fundamental principles, algorithms, and applications of deep reinforcement learning. We start from the Markov Decision Process (MDP) framework and cover basic RL algorithms—value-based, policy-based, actor–critic, and model-based methods—then move to advanced topics including offline RL and multi-agent RL. By combining deep learning with reinforcement learning, students will gain the skills to build intelligent systems that learn from experience and make near-optimal decisions in complex environments.

The course is intended for graduate and advanced undergraduate students. Students will be evaluated on written and programming assignments and on the course project.

By the end of this course, students should be able to:

- Formalize sequential decision problems with MDPs and derive Bellman equations.
- Understand and analyze core RL algorithms: dynamic programming (DP), Monte Carlo (MC), and temporal-difference (TD) methods.
- Build, train, and debug deep value-based methods (e.g., DQN and key extensions).
- Implement and compare policy-gradient and actor–critic algorithms.
- Explain and apply exploration strategies and stabilization techniques in deep RL.
- Understand model-based RL pipelines, which learn a model of the environment and use it for planning or policy learning.
- Explain assumptions, risks, and evaluation pitfalls in offline RL; implement a baseline offline RL method.
- Formulate multi-agent RL problems; implement and evaluate a centralized training with decentralized execution (CTDE) or value-decomposition method.
- Execute an end-to-end DRL project: problem selection, environment design, algorithm selection, experimental protocol, ablations, and reproducibility.

## Prerequisites

If you are unsure about any of these, please speak to the instructor.

- Proficiency in Python programming.
- Programming experience with deep learning.
- Research experience (not required, but highly recommended).
- Mathematics: Linear Algebra (MA 429 or MA 439 or ESE 318), Calculus III (MA 233), Probability & Statistics.

One of the following:
- a) CSE 412A: Intro to A.I., or
- b) a Machine Learning course (CSE 417T or ESE 417).

## Textbook

**Primary text** (optional but recommended): Sutton & Barto, Reinforcement Learning: An Introduction (2nd ed., online). We will not cover every chapter, and from time to time we will cover topics not contained in the book.

**Additional references:** Russell & Norvig, Artificial Intelligence: A Modern Approach (4th ed.); OpenAI Spinning Up in Deep RL tutorial.

## Homeworks

There will be three homework assignments spread across the semester. Each assignment will be posted on Canvas, and you will have approximately two weeks to complete and submit it before its deadline.

Late work will not be accepted. If you have a documented medical or emergency reason, contact the TAs as soon as possible.

**Collaboration:** Discussion of ideas is encouraged, but your write‑up and code must be your own. Acknowledge any collaborators and external resources.

**Academic Integrity:** Do not copy from peers or online sources. Violations will be referred per university policy.

## Final Project

A research‑level project of your choice that demonstrates mastery of DRL concepts and empirical methodology. Possible directions include: (a) improving an existing approach, (b) tackling an unsolved task/benchmark, (c) reproducing and extending a recent paper, or (d) creating a new task/problem relevant to RL.

**Team size:** 1–2 students by default (contact instructor/TAs for approval if proposing a larger team).

### Milestones:

- **Proposal:** ≤ 2 pages outlining problem, related work, methodology, evaluation plan, and risks.
- **Progress report with short survey:** ≤ 4 pages with preliminary results or diagnostics.
- **Presentation/Poster session:** brief talk or poster demo.
- **Final report:** 7–10 pages (NeurIPS format) with clear experiments, ablations, and reproducibility details.

## Evaluation

**Homework / Problem Sets (3): 45%**
Each problem set combines written questions (derivations/short answers) and programming components (implementations and experiments).

**Final Course Project: 50% total**

- Proposal (max 2 pages): 5% of the course grade
- Progress report with brief survey (max 4 pages): 10% of the course grade
- Presentation/Poster session: 10% of the course grade
- Final report (7–10 pages, NeurIPS style): 25% of the course grade

**Participation: 5%**
Contributions in class and on the course discussion forum, especially in the project presentation sessions.

**Course evaluations** (mid-semester and final course evaluations): extra credit up to 2%

## Grading Scale

The intended grading scale is as follows. The instructor reserves the right to adjust the grading scale.

- A's (A-, A, A+): >= 90%
- B's (B-, B, B+): >= 80%
- C's (C-, C, C+): >= 70%
- D's (D-, D, D+): >= 60%
- F: < 60%
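To show how the evaluation weights and the intended grading scale combine, here is a minimal Python sketch. It is not an official grade calculator, and the function names and dictionary keys are hypothetical. It assumes that every component is scored on a 0-100 scale, that the three homeworks count equally within the 45%, that the project component percentages are shares of the overall course grade (they sum to the 50% project total), and that course-evaluation extra credit is added as up to 2 percentage points on top of the weighted total. The cutoffs are the intended ones above, which the instructor may adjust. Plus/minus boundaries are not specified, so the sketch only reports the letter band.

```python
# Weights as fractions of the overall course grade, from the Evaluation section.
WEIGHTS = {
    "homework": 0.45,         # three problem sets combined (assumed equally weighted)
    "proposal": 0.05,
    "progress_report": 0.10,
    "presentation": 0.10,
    "final_report": 0.25,
    "participation": 0.05,
}

# Intended lower bounds (in percent) for each letter band; subject to instructor adjustment.
CUTOFFS = [(90.0, "A"), (80.0, "B"), (70.0, "C"), (60.0, "D")]


def course_percentage(scores: dict[str, float], extra_credit: float = 0.0) -> float:
    """Weighted course percentage from component scores on a 0-100 scale.

    Assumption: extra credit is added in percentage points and clipped to [0, 2].
    """
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # weights must cover 100% of the grade
    base = sum(weight * scores[name] for name, weight in WEIGHTS.items())
    return base + min(max(extra_credit, 0.0), 2.0)


def letter_band(percentage: float) -> str:
    """Return the letter band (A, B, C, D, or F) for a course percentage."""
    for cutoff, band in CUTOFFS:
        if percentage >= cutoff:
            return band
    return "F"


# Usage with hypothetical scores.
homework_scores = [88.0, 92.0, 85.0]
scores = {
    "homework": sum(homework_scores) / len(homework_scores),  # equal-weight average
    "proposal": 95.0,
    "progress_report": 85.0,
    "presentation": 90.0,
    "final_report": 88.0,
    "participation": 100.0,
}
pct = course_percentage(scores, extra_credit=2.0)
print(f"{pct:.2f}% -> {letter_band(pct)}")
```

With these made-up scores the weighted total is 89%, so the 2 points of extra credit move it from the B band into the A band. This is one reason to complete the course evaluations.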