We are adding course content quickly, and all courses will be available soon.

Course 2: Reinforcement Learning - Approximate Solution Methods

Course Curriculum

Value based methods
Value function approximation methods aim to estimate the state-value function (V) or the action-value function (Q) using a function approximator. The function approximator takes the state or state-action features as input and predicts the corresponding value. Common techniques used for value function approximation include linear approximation, neural networks, and radial basis functions.
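
As a rough illustration of linear value function approximation, the sketch below runs semi-gradient TD(0) on a small random walk. The environment, the two-component feature vector, and the names `features`, `v_hat`, and `td0_linear` are assumptions made for this example and are not taken from the course material.

```python
import random


def features(state, n_states):
    """Hypothetical features: a bias term plus the normalized position."""
    return [1.0, state / (n_states - 1)]


def v_hat(w, x):
    """Linear value estimate: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def td0_linear(n_states=5, episodes=5000, alpha=0.01, gamma=1.0, seed=0):
    """Semi-gradient TD(0) for a random walk with a linear approximator.

    Assumed environment: the agent starts in the middle state and moves
    left or right with equal probability. Leaving on the right gives
    reward 1, leaving on the left gives reward 0.
    """
    rng = random.Random(seed)
    w = [0.0, 0.0]
    for _ in range(episodes):
        s = n_states // 2
        while True:
            s_next = s + rng.choice((-1, 1))
            done = s_next < 0 or s_next >= n_states
            reward = 1.0 if s_next >= n_states else 0.0
            x = features(s, n_states)
            # Bootstrapped target; terminal states have value 0.
            target = reward
            if not done:
                target += gamma * v_hat(w, features(s_next, n_states))
            td_error = target - v_hat(w, x)
            # For a linear approximator the gradient w.r.t. w is x itself.
            w = [wi + alpha * td_error * xi for wi, xi in zip(w, x)]
            if done:
                break
            s = s_next
    return w


if __name__ == "__main__":
    weights = td0_linear()
    for s in range(5):
        print(s, round(v_hat(weights, features(s, 5)), 3))
```

In this toy problem the true values happen to be linear in the state index, so two weights are enough. In larger problems the choice of features (or a neural network) determines how well the approximator can generalize.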

  • Deep Q-Network (DQN)
  • Double DQN (DDQN)
  • Dueling Network Architectures
  • Rainbow

Policy Gradient based methods
Instead of approximating value functions, policy gradient methods search directly for an optimal policy by updating the policy's parameters in the direction of higher expected reward. Popular algorithms:

  • REINFORCE (Monte Carlo Policy Gradient)
  • Proximal Policy Optimization (PPO)
  • Trust Region Policy Optimization (TRPO)
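
As a minimal sketch of the policy gradient idea described above, the example below applies a REINFORCE-style update to a two-armed bandit with a softmax policy. Each episode is a single step, so the return equals the immediate reward. The bandit, its reward probabilities, and the function names are assumptions made for illustration. No baseline is used, which keeps the code short at the cost of higher variance.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(mean_rewards=(0.2, 0.8), steps=3000, alpha=0.1, seed=0):
    """REINFORCE on a hypothetical Bernoulli bandit.

    theta holds one preference per action, and the policy is
    softmax(theta). Returns the final action probabilities.
    """
    rng = random.Random(seed)
    n_actions = len(mean_rewards)
    theta = [0.0] * n_actions
    for _ in range(steps):
        probs = softmax(theta)
        action = rng.choices(range(n_actions), weights=probs)[0]
        # Sample a 0/1 reward; with one-step episodes this is the return.
        ret = 1.0 if rng.random() < mean_rewards[action] else 0.0
        # For a softmax policy:
        # d/d theta_k log pi(action) = 1[k == action] - pi(k)
        for k in range(n_actions):
            grad_log = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += alpha * ret * grad_log
    return softmax(theta)


if __name__ == "__main__":
    print(reinforce_bandit())
```

The update raises the probability of actions in proportion to the return that followed them, so the policy drifts toward the arm with the higher expected reward.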

Actor Critic methods
Actor-Critic methods are a class of reinforcement learning algorithms that combine elements of value-based and policy-based methods. In reinforcement learning, an agent interacts with an environment and takes actions to maximize its cumulative reward. The goal is to find an optimal policy: a mapping from states to actions that maximizes the expected long-term reward.

Actor-Critic methods consist of two components:

  • Actor: The actor learns the policy and is responsible for selecting actions based on the observed states. It can be represented as a neural network or any other function approximator. It takes the current state as input and outputs a probability distribution over the available actions.
  • Critic: The critic evaluates the quality of the chosen actions. It estimates the expected cumulative reward, that is, the state-value or action-value function, and gives the actor feedback on how good its actions are in a given state.
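
The interplay between the two components can be sketched with a one-step actor-critic on a tiny corridor task. Everything specific here is an assumption for illustration: the corridor environment, the tabular actor (one softmax preference pair per state) and tabular critic, the step sizes, and the function names. In practice, both components are often neural networks.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_corridor(n_states=4, episodes=500, alpha_actor=0.1,
                          alpha_critic=0.2, gamma=0.99, seed=0):
    """One-step actor-critic on a hypothetical corridor.

    States 0..n_states-1; action 0 moves left (blocked at state 0),
    action 1 moves right. Every step costs -1, and stepping right from
    the last state ends the episode.
    """
    rng = random.Random(seed)
    theta = [[0.0, 0.0] for _ in range(n_states)]  # actor: preferences
    v = [0.0] * n_states                           # critic: state values
    for _ in range(episodes):
        s = 0
        for _ in range(200):  # step cap keeps early episodes finite
            probs = softmax(theta[s])
            a = rng.choices((0, 1), weights=probs)[0]
            s_next = max(s - 1, 0) if a == 0 else s + 1
            done = s_next == n_states
            reward = -1.0
            # Critic: the one-step TD error is the feedback signal.
            target = reward if done else reward + gamma * v[s_next]
            td_error = target - v[s]
            v[s] += alpha_critic * td_error
            # Actor: policy gradient step scaled by the critic's TD error.
            for k in (0, 1):
                grad_log = (1.0 if k == a else 0.0) - probs[k]
                theta[s][k] += alpha_actor * td_error * grad_log
            if done:
                break
            s = s_next
    return [softmax(p) for p in theta], v


if __name__ == "__main__":
    policy, values = actor_critic_corridor()
    print(policy)
    print(values)
```

Using the TD error rather than the full return lets the actor update after every step instead of waiting for the episode to end, which is one common reason for adding a critic.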

Function Approximation methods

Asynchronous Dynamic Programming (DP)

Student Ratings & Reviews

No Review Yet
60,000

Material Includes

  • Course Content - PDFs
  • Python notebooks or .py files
  • Recorded videos
  • Quizzes
  • Projects at the end of each major topic
  • End-of-course project
  • Final Quiz