Course 1: Reinforcement Learning – Tabular Solution Methods

Categories Reinforcement Learning

Student Ratings & Reviews

No Review Yet

Course Curriculum

Multi-armed bandits
Multi-armed bandits are a class of reinforcement learning problems in which an agent must repeatedly decide which action (or "arm") to choose from a set of options, each associated with an unknown reward.

A k-armed Bandit Problem

Action-value Methods

The 10-armed Testbed

Incremental Implementation

Tracking a Nonstationary Problem

Optimistic Initial Values

Upper-Confidence-Bound Action Selection

Thompson sampling

Softmax Algorithm

Gradient Bandit Algorithms

Associative Search (Contextual Bandits)
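
As a rough illustration of the bandit setting described at the top of this section, the sketch below plays a k-armed bandit with epsilon-greedy action selection and incremental sample-average value estimates. The function name `run_epsilon_greedy`, the Gaussian reward model with unit variance, and the default parameters are assumptions made for this example, not material taken from the course.

```python
import random


def run_epsilon_greedy(true_means, steps=2000, epsilon=0.1, seed=0):
    """Play a k-armed bandit with epsilon-greedy selection (illustrative sketch).

    Assumption: pulling arm a yields a reward drawn from a Gaussian with
    mean true_means[a] and standard deviation 1.
    """
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k  # current value estimate for each arm
    n = [0] * k    # number of times each arm has been pulled
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: pick an arm uniformly at random
        else:
            # exploit: pick the arm with the highest estimate (ties go to the lowest index)
            arm = max(range(k), key=lambda a: q[a])

        reward = rng.gauss(true_means[arm], 1.0)
        n[arm] += 1
        # incremental sample-average update: Q <- Q + (R - Q) / N
        q[arm] += (reward - q[arm]) / n[arm]
        total_reward += reward

    return q, n, total_reward


if __name__ == "__main__":
    estimates, pulls, total = run_epsilon_greedy([0.0, 5.0, -5.0])
    print("estimates:", [round(v, 2) for v in estimates])
    print("pulls:", pulls)
    print("average reward:", round(total / sum(pulls), 2))
```

With a large gap between the arms, most pulls should end up on the best arm, while the epsilon fraction of random pulls keeps the other estimates from going stale.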

Familiarity with the Markov property

Basics of the Markov property

Example – Next word prediction

Understanding hidden Markov models (HMMs)

Example – Python code for an HMM

Library used : hmmlearn

Example : Part-of-speech prediction

Markov Decision Process (MDP)

Agent–environment interface

States and rewards

Example – Markov decision process

Expected Return

Example : Expected Return

Episodic and continuing tasks

Policies

Value functions and state-action value functions

Example 1 : Value functions

Example 2 : Value functions

Example 3 : Value functions

Example 4 : Value functions

Bellman Expectation Equations

Example 1 : Bellman Expectation Equations

Example 2 : Bellman Expectation Equations

Bellman Optimality equations

Example 1 : Bellman Optimality equations

Example 2 : Bellman Optimality equations

Dynamic Programming

Policy Evaluation method – overview

Example 1 – PE

Example 2 – PE

PE – strengths and weaknesses

Value Iteration – Overview

Demo : VI

Key points on VI

Policy Improvement and Policy iteration

Summarize DP methods

Monte Carlo Methods

MC prediction

MC Estimation of Action Values

Monte Carlo Control

MC Exploring Starts

Game of blackjack – overview

TD learning

TD prediction

Advantages of TD prediction methods

Optimality of TD(0)

SARSA – on-policy TD control

Q-learning – off-policy TD control

Expected SARSA

Maximization Bias and Double Q-learning
