Course 2: Reinforcement Learning - Approximate Solution Methods
Course Curriculum
Value-based methods
Value function approximation methods aim to estimate the state-value function (V) or the action-value function (Q) using a function approximator.
The function approximator takes the state or state-action features as input and predicts the corresponding value.
Common techniques used for value function approximation include linear approximation, neural networks, and radial basis functions.
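As a quick illustration of the idea, the sketch below estimates a state-value function with a linear approximator trained by semi-gradient TD(0). The toy feature map, transitions, and hyperparameters are assumptions made for this example and are not part of the course material.

```python
# A minimal sketch of linear value-function approximation with semi-gradient TD(0).
# The feature map, transitions, and hyperparameters below are illustrative assumptions.
import numpy as np

def features(state, num_features=8):
    """Toy feature map: one-hot encoding of a discrete state (assumption)."""
    phi = np.zeros(num_features)
    phi[state % num_features] = 1.0
    return phi

def td0_linear(transitions, num_features=8, alpha=0.1, gamma=0.99):
    """Estimate V(s) ~ w . phi(s) from a list of (state, reward, next_state, done)."""
    w = np.zeros(num_features)
    for state, reward, next_state, done in transitions:
        v = w @ features(state, num_features)
        v_next = 0.0 if done else w @ features(next_state, num_features)
        td_error = reward + gamma * v_next - v
        # Semi-gradient update: move w along the feature vector by the TD error.
        w += alpha * td_error * features(state, num_features)
    return w

# Usage with made-up transitions: a small chain of states 0..3, reward 1 at the end.
transitions = [(0, 0.0, 1, False), (1, 0.0, 2, False), (2, 1.0, 3, True)] * 100
print("Learned weights:", td0_linear(transitions))
```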
- Deep Q-Network (DQN)
- Double DQN (DDQN)
- Dueling Network Architectures
- Rainbow
Policy Gradient-based methods
Instead of approximating value functions, policy gradient methods directly search for an optimal policy by updating the policy's parameters in the direction of higher expected rewards.
Popular algorithms:
- REINFORCE (Monte Carlo Policy Gradient)
- Proximal Policy Optimization (PPO)
- Trust Region Policy Optimization (TRPO)
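To make the update rule concrete, here is a minimal REINFORCE-style sketch on a toy two-armed bandit, where each episode is a single step and the return is just the immediate reward. The bandit's reward means, the learning rate, and the episode count are illustrative assumptions, not taken from the course content.

```python
# A minimal REINFORCE sketch on a toy two-armed bandit (single-step episodes).
# Reward means and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_bandit(num_episodes=2000, alpha=0.05):
    theta = np.zeros(2)                    # one action preference per arm
    true_means = np.array([0.2, 0.8])      # assumed expected rewards of the two arms
    for _ in range(num_episodes):
        probs = softmax(theta)
        action = rng.choice(2, p=probs)
        reward = rng.normal(true_means[action], 0.1)   # return G of a one-step episode
        # REINFORCE update: theta += alpha * G * grad log pi(action)
        grad_log_pi = -probs
        grad_log_pi[action] += 1.0
        theta += alpha * reward * grad_log_pi
    return softmax(theta)

print("Final action probabilities:", reinforce_bandit())
```

After training, the probability mass should concentrate on the higher-reward arm, which is exactly the "move the parameters in the direction of higher expected rewards" behaviour described above.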
- REINFORCE
- Proximal Policy Optimization
- Trust Region Policy Optimization
Actor-Critic methods
Actor-Critic methods are a class of reinforcement learning algorithms that combine elements of both value-based and policy-based methods to learn optimal policies.
In reinforcement learning, an agent interacts with an environment and takes actions to maximize its cumulative rewards. The goal is to find an optimal policy that maps states to actions, which maximizes the expected long-term rewards.
Actor-Critic methods consist of two components:
- Actor: The actor component learns the policy. It is responsible for selecting actions based on the observed states. The actor can be represented as a neural network or any other function approximator; it takes the current state as input and outputs a probability distribution over the available actions.
- Critic: The critic component evaluates the value or quality of the chosen actions. It estimates the expected cumulative reward, also known as the state-value or action-value function, and provides feedback to the actor by estimating how good the actor's actions are in a given state.
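The interplay between the two components can be sketched with a simple one-step actor-critic on a toy chain environment: the critic's TD error updates the value estimate and also acts as the learning signal for the actor's policy-gradient step. The environment dynamics and hyperparameters below are assumptions made for illustration, not part of the course material.

```python
# A minimal sketch of a one-step actor-critic on a toy chain environment.
# Environment dynamics and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
NUM_STATES, NUM_ACTIONS = 5, 2   # chain of 5 states; actions: 0 = left, 1 = right

def step(state, action):
    """Toy dynamics (assumption): moving right from the last state ends the episode with reward 1."""
    done = (state == NUM_STATES - 1 and action == 1)
    next_state = min(state + 1, NUM_STATES - 1) if action == 1 else max(state - 1, 0)
    return next_state, (1.0 if done else 0.0), done

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def actor_critic(num_episodes=500, alpha_actor=0.1, alpha_critic=0.2, gamma=0.99):
    theta = np.zeros((NUM_STATES, NUM_ACTIONS))  # actor: action preferences per state
    v = np.zeros(NUM_STATES)                     # critic: state-value estimates
    for _ in range(num_episodes):
        state, done = 0, False
        while not done:
            probs = softmax(theta[state])
            action = rng.choice(NUM_ACTIONS, p=probs)
            next_state, reward, done = step(state, action)
            # Critic: one-step TD error, used as the learning signal for both components.
            td_error = reward + (0.0 if done else gamma * v[next_state]) - v[state]
            v[state] += alpha_critic * td_error
            # Actor: policy-gradient step weighted by the TD error.
            grad_log_pi = -probs
            grad_log_pi[action] += 1.0
            theta[state] += alpha_actor * td_error * grad_log_pi
            state = next_state
    return theta, v

theta, v = actor_critic()
print("State values:", np.round(v, 2))
```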
- Advantage Actor-Critic (A2C)
- Asynchronous Advantage Actor-Critic (A3C)
Function Approximation methods
Asynchronous Dynamic Programming (DP)
- Price: ₹60,000
- Level: Expert
- Duration: 16 hours
- Last Updated: April 15, 2024
Material Includes
- Course Content - PDFs
- Python notebooks or .py files
- Recorded videos
- Quizzes
- Projects at the end of each major topic
- Course-end project
- Final Quiz