diff --git a/content/CSE510/CSE510_L1.md b/content/CSE510/CSE510_L1.md
index 6b49a0e..bf958e9 100644
--- a/content/CSE510/CSE510_L1.md
+++ b/content/CSE510/CSE510_L1.md
@@ -1,2 +1,131 @@
 # CSE510 Deep Reinforcement Learning (Lecture 1)
+## Artificial general intelligence
+
+- Multimodal perception
+- Persistent memory + retrieval
+- World modeling + planning
+- Tool use with verification
+- Interactive learning loops (RLHF/RLAIF)
+- Uncertainty estimation & oversight
+
+LLMs may not be the ultimate solution for AGI, but they may be part of the solution.
+
+## Long-Horizon Agency
+
+Decision-making/control and multi-agent collaboration
+
+## Course logistics
+
+Announcements and discussion on Canvas
+
+Weekly recitations
+
+Thursday 4:00 PM - 5:00 PM in McKelvey Hall 1030
+
+or instructor office hours (11 am - 12 pm Wednesday in McKelvey Hall 2010D)
+
+or by appointment
+
+### Prerequisites
+
+- Proficiency in Python programming.
+- **Programming experience with deep learning**.
+- Research experience (not required, but highly recommended).
+- Mathematics: Linear Algebra (MA 429 or MA 439 or ESE 318), Calculus III (MA 233), Probability & Statistics.
+
+### Textbook
+
+Not required, but recommended:
+
+- Sutton & Barto, Reinforcement Learning: An Introduction (2nd ed., online).
+- Russell & Norvig, Artificial Intelligence: A Modern Approach (4th ed.).
+- OpenAI Spinning Up in Deep RL tutorial.
+
+### Final Project
+
+Research-level project of your choice:
+
+- Improving an existing approach
+- Tackling an unsolved task/benchmark
+- Creating a new task/problem that hasn't been addressed by RL
+
+Can be done in a team of 1-2 students.
+
+Must be harder than homework.
+
+The core goal is to understand the pipeline of RL research; the outcome need not be an improvement over existing methods.
+
+#### Milestones
+
+- Proposal (max 2 pages)
+- Progress report with brief survey (max 4 pages)
+- Presentation/Poster session
+- Final report (7-10 pages, NeurIPS style)
+
+## What is RL?
+
+### Goal for course
+
+How do we build intelligent agents that **learn to act** and achieve specific goals in a **dynamic environment**?
+
+Acting to achieve goals is a key part of intelligence.
+
+> The brain exists to produce adaptable and complex movements. (Daniel Wolpert)
+
+## What RL does
+
+A general-purpose framework for decision-making/behavioral learning:
+
+- RL is for an agent with the capacity to act.
+- Each action influences the agent's future observations.
+- Success is measured by a scalar reward signal.
+- Goal: find a policy that maximizes expected total reward.
+
+Exploration: add randomness to your action selection.
+
+If the result was better than expected, do more of the same in the future (a minimal code sketch of this idea appears later in these notes).
+
+### Deep reinforcement learning
+
+Deep learning (DL) is a general-purpose framework for representation learning:
+
+- Given an objective
+- Learn the representation required to achieve that objective
+- Directly from raw inputs
+- Using minimal domain knowledge
+
+Deep learning enables RL algorithms to solve complex problems in an end-to-end manner.
+
+### Machine learning paradigms
+
+Supervised learning: learning from examples
+
+Self-supervised learning: learning structures in data
+
+Reinforcement learning: learning from experiences
+
+Example using LLMs:
+
+Self-supervised: pretraining
+
+SFT: supervised fine-tuning (post-training)
+
+RL is also used in post-training to improve reasoning capabilities.
+
+RLHF: reinforcement learning from human feedback (fine-tuning)
+
+_RL generates data beyond the original training data._
+
+All of these paradigms are "supervised" by a loss function.
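+
+A minimal sketch of this contrast (not from the lecture; the environment, function names, and numbers below are made up for illustration): a supervised update on a fixed labeled dataset versus an RL-style update on a toy two-armed bandit, where the agent generates its own data by acting, explores with epsilon-greedy action selection, and does more of whatever turned out better than expected.
+
+```python
+import random
+
+# Supervised learning: the data (x, y) is fixed in advance and the label
+# directly supervises each update.
+def supervised_step(weight, example, lr=0.1):
+    x, y = example
+    pred = weight * x
+    grad = 2 * (pred - y) * x          # gradient of the squared error (pred - y)**2
+    return weight - lr * grad
+
+def pull_arm(action):
+    # Hypothetical environment: arm 1 pays more on average than arm 0.
+    return random.gauss(1.0 if action == 1 else 0.0, 1.0)
+
+# Reinforcement learning: the agent generates its own data by acting.
+def rl_step(values, epsilon=0.1, lr=0.1):
+    # Exploration: with small probability, try a random action.
+    if random.random() < epsilon:
+        action = random.randrange(len(values))
+    else:
+        action = max(range(len(values)), key=lambda a: values[a])
+    reward = pull_arm(action)          # scalar reward signal from the environment
+    # If the result was better than expected, do more of the same in the future.
+    values[action] += lr * (reward - values[action])
+    return values
+
+# Supervised: the dataset (x=2, y=4) never changes; weight should approach 2.0.
+weight = 0.0
+for _ in range(100):
+    weight = supervised_step(weight, (2.0, 4.0))
+
+# RL: the agent's own interactions are the data; values[1] should end up
+# noticeably larger than values[0], so the greedy choice converges to arm 1.
+values = [0.0, 0.0]
+for _ in range(1000):
+    values = rl_step(values)
+```
+
+The incremental update `values[action] += lr * (reward - values[action])` is one simple way to "do more of what worked better than expected"; the course develops more principled versions of this idea (value functions, policy gradients, and so on).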
+
+### Differences for RL from other paradigms
+
+**Exploration**: the agent does not start with prior data known to be good.
+
+**Non-stationarity**: the environment is dynamic, and the agent's actions influence the environment.
+
+**Credit assignment**: the agent needs to learn to assign credit to its actions when rewards are delayed.
+
+**Limited samples**: actions take time to execute in the real world, which may limit the amount of experience available.
+
diff --git a/content/CSE510/index.md b/content/CSE510/index.md
index 122104f..4c349a0 100644
--- a/content/CSE510/index.md
+++ b/content/CSE510/index.md
@@ -1 +1,99 @@
-# CSE510 Deep Reinforcement Learning
\ No newline at end of file
+# CSE510 Deep Reinforcement Learning
+
+CSE 5100
+
+**Class meeting times and location:** Tue/Thu 10:00-11:20 am (412A-01) in EADS Room 216
+
+**Fall 2025**
+
+## Instructor Information
+**Chongjie Zhang**
+Office: McKelvey Hall 2010D
+Email: chongjie@wustl.edu
+
+### Instructor's Office Hours
+Wednesdays 11:00 am - 12:00 pm in McKelvey Hall 2010D, or email me to make an appointment.
+
+### TAs
+- Jianing Ye: jianing.y@wustl.edu
+- Kefei Duan: d.kefei@wustl.edu
+- Xiu Yuan: xiu@wustl.edu
+
+**Office Hours:** Thursdays 4:00-5:00 pm in McKelvey Hall 1030 (tentative), or email the TAs to make an appointment.
+
+## Course Description
+Deep Reinforcement Learning (RL) is a cutting-edge field at the intersection of artificial intelligence and decision-making. This course provides an in-depth exploration of the fundamental principles, algorithms, and applications of deep reinforcement learning. We start from the Markov Decision Process (MDP) framework and cover basic RL algorithms (value-based, policy-based, actor-critic, and model-based methods), then move to advanced topics including offline RL and multi-agent RL. By combining deep learning with reinforcement learning, students will gain the skills to build intelligent systems that learn from experience and make near-optimal decisions in complex environments.
+
+The course caters to graduate and advanced undergraduate students. Student performance will be evaluated through written and programming assignments and the course project.
+
+By the end of this course, students should be able to:
+
+- Formalize sequential decision problems with MDPs and derive Bellman equations.
+- Understand and analyze core RL algorithms (dynamic programming, Monte Carlo, and temporal-difference methods).
+- Build, train, and debug deep value-based methods (e.g., DQN and key extensions).
+- Implement and compare policy-gradient and actor-critic algorithms.
+- Explain and apply exploration strategies and stabilization techniques in deep RL.
+- Grasp model-based RL pipelines.
+- Explain assumptions, risks, and evaluation pitfalls in offline RL; implement a baseline offline RL method.
+- Formulate multi-agent RL problems; implement and evaluate a CTDE (centralized training with decentralized execution) or value-decomposition method.
+- Execute an end-to-end DRL project: problem selection, environment design, algorithm selection, experimental protocol, ablations, and reproducibility.
+
+## Prerequisites
+If you are unsure about any of these, please speak to the instructor.
+
+- Proficiency in Python programming.
+- Programming experience with deep learning.
+- Research experience (not required, but highly recommended).
+- Mathematics: Linear Algebra (MA 429 or MA 439 or ESE 318), Calculus III (MA 233), Probability & Statistics.
+
+One of the following:
+- a) CSE 412A: Intro to A.I., or
+- b) a Machine Learning course (CSE 417T or ESE 417).
+
+## Textbook
+**Primary text** (optional but recommended): Sutton & Barto, Reinforcement Learning: An Introduction (2nd ed., online). We will not cover all of the chapters and will, from time to time, cover topics not contained in the book.
+
+**Additional references:** Russell & Norvig, Artificial Intelligence: A Modern Approach (4th ed.); OpenAI Spinning Up in Deep RL tutorial.
+
+## Homeworks
+There will be three homework assignments distributed throughout the semester. Each assignment will be posted on Canvas, giving you approximately two weeks to complete and submit it before the designated deadline.
+
+Late work will not be accepted. If you have a documented medical or emergency reason, contact the TAs as soon as possible.
+
+**Collaboration:** Discussion of ideas is encouraged, but your write-up and code must be your own. Acknowledge any collaborators and external resources.
+
+**Academic Integrity:** Do not copy from peers or online sources. Violations will be referred per university policy.
+
+## Final Project
+A research-level project of your choice that demonstrates mastery of DRL concepts and empirical methodology. Possible directions include: (a) improving an existing approach, (b) tackling an unsolved task/benchmark, (c) reproducing and extending a recent paper, or (d) creating a new task/problem relevant to RL.
+
+**Team size:** 1-2 students by default (contact the instructor/TAs for approval if proposing a larger team).
+
+### Milestones
+- **Proposal:** ≤ 2 pages outlining the problem, related work, methodology, evaluation plan, and risks.
+- **Progress report with short survey:** ≤ 4 pages with preliminary results or diagnostics.
+- **Presentation/Poster session:** brief talk or poster demo.
+- **Final report:** 7-10 pages (NeurIPS format) with clear experiments, ablations, and reproducibility details.
+
+## Evaluation
+**Homework / Problem Sets (3): 45%**
+Each problem set combines written questions (derivations/short answers) and programming components (implementations and experiments).
+
+**Final Course Project: 50% total**, broken down as follows (percentages of the course grade):
+- Proposal (max 2 pages): 5%
+- Progress report with brief survey (max 4 pages): 10%
+- Presentation/Poster session: 10%
+- Final report (7-10 pages, NeurIPS style): 25%
+
+**Participation: 5%**
+Contributions in class and on the course discussion forum, especially in the project presentation sessions.
+
+**Course evaluations** (mid-semester and final): extra credit up to 2%
+
+## Grading Scale
+The intended grading scale is as follows; the instructor reserves the right to adjust it.
+- A's (A-, A, A+): >= 90%
+- B's (B-, B, B+): >= 80%
+- C's (C-, C, C+): >= 70%
+- D's (D-, D, D+): >= 60%
+- F: < 60%