Course 1: Reinforcement Learning – Tabular Solution Methods
Course Curriculum
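Multi-armed bandits
Multi-armed bandits are a class of reinforcement learning problems in which an agent repeatedly chooses one action (or "arm") from a fixed set of options. Each arm pays out according to a reward distribution that is unknown to the agent, so the agent must balance exploring arms to learn their value against exploiting the arm that currently looks best.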
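As a rough illustration of the problem described above, here is a minimal sketch of an epsilon-greedy agent on a k-armed bandit. It is not taken from the course material: the function name `run_epsilon_greedy`, the Gaussian reward model with unit standard deviation, and all parameter values are assumptions chosen for the example.

```python
import random


def run_epsilon_greedy(true_means, steps=2000, epsilon=0.1, seed=0):
    """Play a k-armed bandit with epsilon-greedy action selection.

    Assumption for this sketch: each arm pays a Gaussian reward with the
    given mean and a standard deviation of 1.0.
    """
    rng = random.Random(seed)
    k = len(true_means)
    estimates = [0.0] * k  # Q(a): running estimate of each arm's mean reward
    counts = [0] * k       # N(a): how many times each arm has been pulled
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: pick a random arm
        else:
            # exploit: pick the arm with the highest current estimate
            arm = max(range(k), key=lambda a: estimates[a])

        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental sample-average update: Q <- Q + (R - Q) / N
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


# Hypothetical 3-armed problem; the agent never sees these means directly.
estimates, counts, total_reward = run_epsilon_greedy([0.1, 0.5, 1.5])
print("estimated means:", [round(q, 2) for q in estimates])
print("pull counts:", counts)
```

With a small `epsilon` the agent spends most pulls on whichever arm it currently rates highest while still sampling the others occasionally, which is the exploration/exploitation trade-off the topics in this module examine.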
- A k-armed Bandit Problem
- Action-value Methods
- The 10-armed Testbed
- Incremental Implementation
- Tracking a Nonstationary Problem
- Optimistic Initial Values
- Upper-Confidence-Bound Action Selection
- Thompson Sampling
- Softmax Algorithm
- Gradient Bandit Algorithms
- Associative Search (Contextual Bandits)
Familiarity with the Markov property
- Basics of the Markov property
- Example – Next word prediction
- Understanding Hidden Markov models
- Example – Python code on HMM
- Using the hmmlearn library
- Example: Parts of speech prediction
Markov Decision Process (MDP)
- Agent–environment interface
- States and rewards
- Example – Markov decision process
- Expected Return
- Example: Expected Return
- Episodic and continuing tasks
- Policies
- Value functions and state-action value functions
- Example 1: Value functions
- Example 2: Value functions
- Example 3: Value functions
- Example 4: Value functions
- Bellman Expectation Equations
- Example 1: Bellman Expectation Equations
- Example 2: Bellman Expectation Equations
- Bellman Optimality Equations
- Example 1: Bellman Optimality Equations
- Example 2: Bellman Optimality Equations
Dynamic Programming
- Policy Evaluation method – overview
- Example 1 – PE
- Example 2 – PE
- PE – strengths and weaknesses
- Value Iteration – overview
- Demo: VI
- Key points on VI
- Policy Improvement and Policy Iteration
- Summary of DP methods
Monte Carlo Methods
- MC prediction
- MC Estimation of Action Values
- Monte Carlo Control
- MC Exploring Starts
- Game of blackjack – overview
TD learning
- TD prediction
- Advantages of TD prediction methods
- Optimality of TD(0)
- SARSA: On-policy TD control
- Q-learning: Off-policy TD control
- Expected SARSA
- Maximization Bias and Double Q-learning
Price: ₹40,000
Level: Expert
Duration: 24 hours
Last Updated: April 15, 2024
Material Includes
- self-learning videos
- PDFs
- Python notebooks
- Assignments
- Quiz