Reinforcement Learning (RL) is a type of Machine Learning where an agent learns how to make decisions by interacting with an environment. The goal is to maximize rewards over time.
Basic Reinforcement Learning Loop (a minimal code sketch follows this list):
- Agent – Learner or decision-maker.
- Environment – Everything the agent interacts with.
- State (S) – Current situation of the agent.
- Action (A) – All possible moves the agent can make.
- Reward (R) – Feedback from the environment.
- Policy (π) – Strategy that the agent follows to choose actions.
- Value Function (V) – Expected long-term reward from a state.
- Q-Value (Q) – Expected long-term reward from a state-action pair.
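To make these pieces concrete, here is a minimal sketch in Python of how one episode of the loop runs. The `Environment` interface (`reset`, `step`) and the callable `policy` are hypothetical placeholders for illustration, not a real library API:

```python
# Minimal agent-environment loop sketch. The env object (with reset/step)
# and the policy callable are hypothetical, not a real library API.

def run_episode(env, policy):
    state = env.reset()                      # initial state S
    total_reward = 0.0
    done = False
    while not done:
        action = policy(state)               # policy pi maps state -> action A
        state, reward, done = env.step(action)  # environment returns R and next S
        total_reward += reward
    return total_reward                      # accumulated reward for this episode
```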
Goal of RL:
Find a policy that tells the agent what action to take in each state to maximize the total reward (called the return).
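The return is commonly formalized as a discounted sum of future rewards; the discount factor γ below is an assumption for illustration, since the text above does not specify discounting:

```latex
% Discounted return from time step t; the discount factor
% \gamma \in [0, 1] is an assumption (not specified in the text above).
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots
    = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}
```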
Key Algorithms in Reinforcement Learning:

| Category | Examples |
| --- | --- |
| Value-based | Q-Learning, Deep Q-Network (DQN) |
| Policy-based | REINFORCE, Proximal Policy Optimization (PPO) |
| Actor-Critic | A3C, DDPG, TD3 |
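As an example of the value-based family, the core of tabular Q-Learning is a single update rule. The sketch below assumes a dictionary-backed Q-table and illustrative hyperparameter values (alpha, gamma are assumptions, not prescribed values):

```python
from collections import defaultdict

# Tabular Q-Learning update sketch. The Q-table layout, alpha (learning
# rate), and gamma (discount factor) are illustrative assumptions.
Q = defaultdict(float)        # Q[(state, action)] -> expected long-term reward
alpha, gamma = 0.1, 0.99

def q_update(state, action, reward, next_state, actions):
    # Best Q-value achievable from the next state
    best_next = max(Q[(next_state, a)] for a in actions)
    # Move Q(s, a) toward the TD target: r + gamma * max_a' Q(s', a')
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```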
Real-Life Examples:
- Games: Playing chess, Go, or video games (e.g., AlphaGo, DQN playing Atari).
- Robotics: Learning to walk or grasp objects.
- Self-Driving Cars: Learning to drive safely.
- Recommendation Systems: Choosing what content to show users.
- Traffic Optimization: Route selection or signal timing.
Simple Example (Grid World):
- State: Agent at position (2,3)
- Actions: Up, Down, Left, Right
- Reward: +10 for reaching the goal, -1 for hitting a wall
- Goal: Learn the best path to the goal by trial and error (see the code sketch after this list)
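Putting the pieces together, here is a self-contained sketch of this grid world trained with tabular Q-Learning. The 4x4 grid size, start and goal cells, the 0 reward for ordinary moves, epsilon-greedy exploration, and all hyperparameters are assumptions added for illustration; only the +10/-1 rewards and the four actions come from the example above:

```python
import random
from collections import defaultdict

# Grid-world sketch trained by trial and error with tabular Q-Learning.
# Grid size, start/goal cells, 0 reward for ordinary moves, epsilon-greedy
# exploration, and hyperparameters are illustrative assumptions.

SIZE = 4
GOAL = (3, 3)
ACTIONS = {"Up": (-1, 0), "Down": (1, 0), "Left": (0, -1), "Right": (0, 1)}

def step(state, action):
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    if not (0 <= nr < SIZE and 0 <= nc < SIZE):
        return state, -1, False          # hit a wall: -1, stay in place
    if (nr, nc) == GOAL:
        return (nr, nc), 10, True        # reached the goal: +10, episode ends
    return (nr, nc), 0, False            # ordinary move: 0 reward (assumption)

Q = defaultdict(float)                   # Q[(state, action)] -> learned value
alpha, gamma, epsilon = 0.1, 0.9, 0.1

for episode in range(500):               # trial and error over many episodes
    state = (0, 0)
    done = False
    while not done:
        if random.random() < epsilon:    # explore: random action
            action = random.choice(list(ACTIONS))
        else:                            # exploit: best known action
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward, done = step(state, action)
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next
                                       - Q[(state, action)])
        state = next_state

# After training, following argmax_a Q(s, a) from any cell traces the
# learned path to the goal.
```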