Reinforcement Learning (RL), Fall 2024
CSCI 4160/6963, ECSE 4965/6965
Personnel
- Instructor: Rado Ivanov (ivanor@rpi.edu), Lally 309
- TA: Thomas Waite (waitet)
- Mentor: Anthony Shaw (shawa9)
- Office hours:
- Rado: T 1-2pm, W 2-3pm, Th 11am-noon (Lally 309)
- Thomas: W 3-4pm (MRC 345)
- Anthony: by appointment
Class Time and Location
- Class: TF 10am-noon (Sage 3101)
Course Description
This is an introductory course on the theory and practice of reinforcement learning (RL). We will start with an introduction to linear algebra and probability and will quickly go over standard supervised learning techniques, such as linear regression. After that, we will derive the RL framework, starting from Markov chains and Markov reward processes and building up to Markov decision processes. We will then cover classic RL approaches such as dynamic programming, Monte Carlo methods, Q-learning and policy gradients. In the last part of the course, we will cover deep learning and deep RL.
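To give a flavor of where the course ends up, here is a minimal tabular Q-learning sketch on a made-up two-state problem. Everything in it (the toy environment, the hyperparameters, the variable names) is illustrative only and not part of the course material:

```python
import random

# Toy 2-state environment (illustrative only): states 0 and 1,
# actions 0 ("stay") and 1 ("move"). Moving out of state 1 pays
# reward 1; every other transition pays 0.
def step(state, action):
    if action == 1:
        return 1 - state, (1.0 if state == 1 else 0.0)
    return state, 0.0

alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount, exploration rate
Q = [[0.0, 0.0], [0.0, 0.0]]           # Q[state][action], initialized to zero

state = 0
for _ in range(10_000):
    # epsilon-greedy action selection
    if random.random() < epsilon:
        action = random.randrange(2)
    else:
        action = max(range(2), key=lambda a: Q[state][a])
    next_state, reward = step(state, action)
    # Q-learning update: nudge Q(s, a) toward r + gamma * max_a' Q(s', a')
    Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
    state = next_state

print(Q)  # the learned values favor "move", the reward-collecting policy
```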
Textbook
There is no required book for the course, since no single book covers all the course material. All of the necessary material will be included in the lecture slides, and I will suggest additional reading along with each lecture. We will follow some of the material in the following books, most of which are available for free online:
- Hastie, Trevor, et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Vol. 2. New York: Springer, 2009. (available online)
- James, Gareth, et al. An Introduction to Statistical Learning. Vol. 112. New York: Springer, 2013. (available online)
- Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. (available online)
- Sutton, Richard S., and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 2018. (available online)
- Puterman, Martin L. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, 2014.
Grading
Students will be graded on 10 homework assignments. Each assignment is worth 10% of the final grade.
Late submission rule: You may use up to 2 extensions over the semester, and you don't need to explain why you submitted late. Each extension gives you 3 extra days, midnight to midnight; for example, an assignment due Thursday at midnight may be submitted until Sunday at midnight. You cannot use both extensions on the same assignment. If your submission is more than 3 days late (or even 1 minute late once you have used up your extensions), you will receive a score of 0.
Useful Resources
Announcements
It is the student's responsibility to be aware of and understand all announcements made in the lectures.
Assignments
Please check the tentative schedule below for homework deadlines. Unless otherwise noted, each assignment will be due at midnight on Thursday. I will post each assignment after the previous one is completed, and an email announcement will be sent each time.
All assignments will be posted on, and must be submitted through, LMS. You are expected to work alone on all assignments unless specifically noted otherwise; please check the syllabus for a clarification of what constitutes academic dishonesty.
Lectures
- Aug. 30 -- Course Overview, Machine Learning Intro
- Sep. 6 -- Linear Algebra Intro, Probability Intro
- Sep. 10 -- Supervised Learning Overview, Linear Regression
- Sep. 13 -- Logistic Regression
- Sep. 17 -- Decision Trees
- Sep. 20 -- Reinforcement Learning and Control Intro
- Sep. 24 -- Multi-Armed Bandits Intro
- Sep. 27 -- Bayesian Bandits
- Oct. 1 -- Finite State Automata and Markov Chains
- Oct. 4 -- Markov Reward Processes
- Oct. 8 -- Markov Decision Processes
- Oct. 11 -- No class
- Oct. 15 -- Dynamic Programming
- Oct. 18 -- Monte Carlo Methods
- Oct. 22 -- Temporal Difference Learning
- Oct. 25 -- Q-Learning
- Oct. 29 -- Fully-Connected Neural Networks
- Nov. 1 -- Optimization
- Nov. 5 -- Convolutional Neural Networks, Image Classification
- Nov. 8 -- Regularization
- Nov. 12 -- Generalization
- Nov. 15 -- Q-Learning with Function Approximation
- Nov. 19 -- Policy Gradient Theorem, REINFORCE