Course 2: Reinforcement Learning - Approximate Solution Methods
Course Curriculum
Value-based methods
Value function approximation methods aim to estimate the state-value function (V) or the action-value function (Q) using a function approximator.
The function approximator takes the state or state-action features as input and predicts the corresponding value.
Common techniques used for value function approximation include linear approximation, neural networks, and radial basis functions.
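As a quick illustration of the idea, the sketch below estimates a state-value function with a linear approximator trained by semi-gradient TD(0). The toy feature map, transitions, and hyperparameters are assumptions made for this example and are not part of the course material.

```python
# A minimal sketch of linear value-function approximation with semi-gradient TD(0).
# The feature map, transitions, and hyperparameters below are illustrative assumptions.
import numpy as np

def features(state, num_features=8):
    """Toy feature map: one-hot encoding of a discrete state (assumption)."""
    phi = np.zeros(num_features)
    phi[state % num_features] = 1.0
    return phi

def td0_linear(transitions, num_features=8, alpha=0.1, gamma=0.99):
    """Estimate V(s) ~ w . phi(s) from a list of (state, reward, next_state, done)."""
    w = np.zeros(num_features)
    for state, reward, next_state, done in transitions:
        v = w @ features(state, num_features)
        v_next = 0.0 if done else w @ features(next_state, num_features)
        td_error = reward + gamma * v_next - v
        # Semi-gradient update: move w along the feature vector by the TD error.
        w += alpha * td_error * features(state, num_features)
    return w

# Usage with made-up transitions: a small chain of states 0..3, reward 1 at the end.
transitions = [(0, 0.0, 1, False), (1, 0.0, 2, False), (2, 1.0, 3, True)] * 100
print("Learned weights:", td0_linear(transitions))
```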
- Deep Q-Network (DQN)
- Double DQN (DDQN)
- Dueling Network Architectures
- Rainbow
Policy Gradient-based methods
Instead of approximating value functions, policy gradient methods directly search for an optimal policy by updating the policy's parameters in the direction of higher expected rewards.
Popular algorithms:
- REINFORCE (Monte Carlo Policy Gradient)
- Proximal Policy Optimization (PPO)
- Trust Region Policy Optimization (TRPO)
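To make the update rule concrete, here is a minimal REINFORCE-style sketch on a toy two-armed bandit, where each episode is a single step and the return is just the immediate reward. The bandit's reward means, the learning rate, and the episode count are illustrative assumptions, not taken from the course content.

```python
# A minimal REINFORCE sketch on a toy two-armed bandit (single-step episodes).
# Reward means and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_bandit(num_episodes=2000, alpha=0.05):
    theta = np.zeros(2)                    # one action preference per arm
    true_means = np.array([0.2, 0.8])      # assumed expected rewards of the two arms
    for _ in range(num_episodes):
        probs = softmax(theta)
        action = rng.choice(2, p=probs)
        reward = rng.normal(true_means[action], 0.1)   # return G of a one-step episode
        # REINFORCE update: theta += alpha * G * grad log pi(action)
        grad_log_pi = -probs
        grad_log_pi[action] += 1.0
        theta += alpha * reward * grad_log_pi
    return softmax(theta)

print("Final action probabilities:", reinforce_bandit())
```

After training, the probability mass should concentrate on the higher-reward arm, which is exactly the "move the parameters in the direction of higher expected rewards" behaviour described above.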
- REINFORCE
- Proximal Policy Optimization
- Trust Region Policy Optimization
Actor-Critic methods
Actor-Critic methods are a class of reinforcement learning algorithms that combine elements of both value-based and policy-based methods to learn optimal policies.
In reinforcement learning, an agent interacts with an environment and takes actions to maximize its cumulative rewards. The goal is to find an optimal policy that maps states to actions, which maximizes the expected long-term rewards.
Actor-Critic methods consist of two components:
- Actor: The actor component learns the policy. It is responsible for selecting actions based on the observed states. The actor can be represented as a neural network or any other function approximator; it takes the current state as input and outputs a probability distribution over the available actions.
- Critic: The critic component evaluates the value or quality of the chosen actions. It estimates the expected cumulative reward, also known as the state-value or action-value function, and provides feedback to the actor by estimating how good the actor's actions are in a given state.
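The interplay between the two components can be sketched with a simple one-step actor-critic on a toy chain environment: the critic's TD error updates the value estimate and also acts as the learning signal for the actor's policy-gradient step. The environment dynamics and hyperparameters below are assumptions made for illustration, not part of the course material.

```python
# A minimal sketch of a one-step actor-critic on a toy chain environment.
# Environment dynamics and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
NUM_STATES, NUM_ACTIONS = 5, 2   # chain of 5 states; actions: 0 = left, 1 = right

def step(state, action):
    """Toy dynamics (assumption): moving right from the last state ends the episode with reward 1."""
    done = (state == NUM_STATES - 1 and action == 1)
    next_state = min(state + 1, NUM_STATES - 1) if action == 1 else max(state - 1, 0)
    return next_state, (1.0 if done else 0.0), done

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def actor_critic(num_episodes=500, alpha_actor=0.1, alpha_critic=0.2, gamma=0.99):
    theta = np.zeros((NUM_STATES, NUM_ACTIONS))  # actor: action preferences per state
    v = np.zeros(NUM_STATES)                     # critic: state-value estimates
    for _ in range(num_episodes):
        state, done = 0, False
        while not done:
            probs = softmax(theta[state])
            action = rng.choice(NUM_ACTIONS, p=probs)
            next_state, reward, done = step(state, action)
            # Critic: one-step TD error, used as the learning signal for both components.
            td_error = reward + (0.0 if done else gamma * v[next_state]) - v[state]
            v[state] += alpha_critic * td_error
            # Actor: policy-gradient step weighted by the TD error.
            grad_log_pi = -probs
            grad_log_pi[action] += 1.0
            theta[state] += alpha_actor * td_error * grad_log_pi
            state = next_state
    return theta, v

theta, v = actor_critic()
print("State values:", np.round(v, 2))
```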
- Advantage Actor-Critic (A2C)
- Asynchronous Advantage Actor-Critic (A3C)
Function Approximation methods
Asynchronous Dynamic Programming (DP)
- Price: ₹60,000
- Level: Expert
- Duration: 16 hours
- Last Updated: April 15, 2024
Material Includes
- Course Content - PDFs
- Python notebooks or .py files
- Recorded videos
- Quizzes
- Projects at the end of each major topic
- Course-end project
- Final Quiz