Reinforcement Learning

In Reinforcement Learning (RL), an agent learns to make decisions by interacting with an environment and maximizing cumulative reward over time.

Key Components

  • Agent: the learner / decision maker.
  • Environment: everything the agent interacts with.
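  • State \(s_t\): the situation the agent observes at time step \(t\).
  • Action \(a_t\): the choice the agent makes in state \(s_t\).
  • Reward \(r_t\): the scalar feedback the environment returns for taking \(a_t\) in \(s_t\).
  • Policy \(\pi\): a mapping from states to actions (or to a probability distribution over actions).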
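
These components fit together in a simple interaction loop. The following is a minimal sketch rather than a reference implementation: `CorridorEnv`, `random_policy`, and `run_episode` are hypothetical names invented for illustration, and the `reset`/`step` interface is an assumption loosely modeled on common RL toolkits.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: cells 0..length-1, goal is the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        # Start every episode in the leftmost cell.
        self.state = 0
        return self.state

    def step(self, action):
        # Assumed action encoding: 0 = move left, 1 = move right.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0  # reward only on reaching the goal
        return self.state, reward, done


def random_policy(state):
    # A policy maps a state to an action; this one ignores the state.
    return random.choice([0, 1])


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the undiscounted sum of rewards."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                   # agent chooses a_t from s_t
        state, reward, done = env.step(action)   # environment returns r_t and s_{t+1}
        total_reward += reward
        if done:
            break
    return total_reward
```

Calling `run_episode(CorridorEnv(), random_policy)` runs a single episode with a purely random agent; a learning agent would replace `random_policy` with a policy that improves from the observed rewards.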

Q-Learning (Value-Based RL)

Q-Learning learns an action‑value function \(Q(s, a)\) estimating the expected return of taking action \(a\) in state \(s\) and following the optimal policy thereafter.
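
To make "expected return" concrete, one common formalization is the discounted return. This is an assumption here, since the text does not define the return explicitly: \(\gamma \in [0, 1)\) denotes a discount factor, \(Q^*\) denotes the optimal action-value function, and actions after time \(t\) are taken to follow the optimal policy.

\[ Q^*(s, a) = \mathbb{E}\left[ \sum_{k=0}^{\infty} \gamma^k r_{t+k} \,\middle|\, s_t = s,\ a_t = a \right] \]

Under the same assumptions, \(Q^*\) satisfies the Bellman optimality equation, which is the fixed point that Q-Learning's updates aim for:

\[ Q^*(s, a) = \mathbb{E}\left[ r_t + \gamma \max_{a'} Q^*(s_{t+1}, a') \,\middle|\, s_t = s,\ a_t = a \right] \]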

The update rule, with learning rate \(\alpha \in (0, 1]\) and discount factor \(\gamma \in [0, 1]\), is:

\[ Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[r_t + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t)\right] \]
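
The update can be sketched for the tabular case as follows. This is an illustrative sketch under stated assumptions: the function name `q_learning_update`, the dictionary-based table keyed by `(state, action)`, and the default values of `alpha` and `gamma` are all hypothetical choices, and the `done` flag (which drops the bootstrap term at terminal states) is a common convention that the rule above does not spell out.

```python
from collections import defaultdict


def q_learning_update(Q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99, done=False):
    """Apply one tabular Q-Learning update in place and return the TD error."""
    # max_a Q(s_{t+1}, a); assumed to be 0 when s_{t+1} is terminal.
    best_next = 0.0 if done else max(Q[(next_state, a)] for a in actions)
    # Bracketed term of the update rule: r_t + gamma * max_a Q(s_{t+1}, a) - Q(s_t, a_t)
    td_error = reward + gamma * best_next - Q[(state, action)]
    Q[(state, action)] += alpha * td_error
    return td_error


# Usage: unseen (state, action) pairs default to a value of 0.0.
Q = defaultdict(float)
Q[(1, 0)], Q[(1, 1)] = 2.0, 4.0
Q[(0, 1)] = 1.0
q_learning_update(Q, state=0, action=1, reward=1.0, next_state=1,
                  actions=[0, 1], alpha=0.5, gamma=0.9)
```

In this usage example the target is \(1.0 + 0.9 \times 4.0 = 4.6\), so with \(\alpha = 0.5\) the entry for state 0, action 1 moves halfway from 1.0 toward 4.6, ending at approximately 2.8.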

Exploration vs Exploitation

An RL agent must balance exploration (trying actions whose value is still uncertain) against exploitation (choosing the action currently estimated to be best). A common strategy is \(\epsilon\)-greedy:

  • With probability \(\epsilon\), choose an action uniformly at random.
  • With probability \(1 - \epsilon\), choose the action with the highest estimated value, \(\arg\max_a Q(s, a)\).
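
The two cases above can be sketched in a few lines. The function name `epsilon_greedy`, the list-of-values input, and the `rng` parameter are hypothetical choices made for illustration; breaking ties toward the lowest index is likewise an assumption, not something the strategy prescribes.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index given the estimated action values for one state."""
    if rng.random() < epsilon:
        # Explore: uniform random action.
        return rng.randrange(len(q_values))
    # Exploit: index of the highest estimate (ties go to the lowest index).
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Usage: with epsilon = 0.1, roughly one call in ten explores.
action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.1)
```

Passing a seeded `random.Random` instance as `rng` makes runs reproducible. In practice \(\epsilon\) is often decayed over training so the agent explores more early on and exploits more later, although the schedule is a design choice left open here.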

Applications of RL

  • Game playing (Atari, chess, Go) using deep RL agents.
  • Robotics control and locomotion.
  • Recommendation systems that adapt to user feedback.
  • Dynamic pricing and bidding in online advertising.