Reinforcement Learning

In Reinforcement Learning (RL), an agent learns to make decisions by interacting with an environment, with the goal of maximizing the cumulative reward it collects over time.

Key Components

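Agent: the learner and decision maker.
Environment: everything the agent interacts with.
State \(s_t\): the situation the agent observes at time step \(t\).
Action \(a_t\): the choice the agent makes at time step \(t\).
Reward \(r_t\): the scalar feedback signal the environment returns for that action.
Policy \(\pi\): a mapping from states to actions (or to a probability distribution over actions).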
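
To make these components concrete, here is a minimal sketch of the agent-environment loop. The `CorridorEnv` class, its `reset`/`step` methods, and the `random_policy` function are hypothetical names invented for this illustration; they are not taken from any particular library.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: a corridor of cells with a goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        # Start every episode in the leftmost cell.
        self.state = 0
        return self.state

    def step(self, action):
        # Action 0 moves left, action 1 moves right; moves are clamped to the corridor.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0  # reward only when the goal is reached
        return self.state, reward, done


def random_policy(state, rng):
    # A policy maps a state to an action; this one ignores the state entirely.
    return rng.choice([0, 1])


def run_episode(env, policy, rng, max_steps=100):
    """Run one episode and return the cumulative (undiscounted) reward."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state, rng)             # agent picks a_t from s_t
        state, reward, done = env.step(action)  # environment returns s_{t+1} and r_t
        total_reward += reward
        if done:
            break
    return total_reward


if __name__ == "__main__":
    print(run_episode(CorridorEnv(), random_policy, random.Random(0)))
```

A policy that always moves right reaches the goal in four steps here, while the random policy may or may not reach it within `max_steps`.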

Q-Learning (Value-Based RL)

Q-Learning learns an action‑value function \(Q(s, a)\) estimating the expected return of taking action \(a\) in state \(s\) and following the optimal policy thereafter.

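The update rule, with learning rate \(\alpha\) and discount factor \(\gamma\), is:

\[ Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t)\right] \]

The bracketed term is the temporal-difference error: the gap between the bootstrapped target \(r_t + \gamma \max_{a'} Q(s_{t+1}, a')\) and the current estimate \(Q(s_t, a_t)\).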
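
As an illustration, the update can be sketched for a tabular \(Q\) stored in a dictionary. The function name `q_update`, the `(state, action)` key layout, and the `done` flag (which drops the bootstrap term at terminal states, a common convention not stated above) are assumptions made for this example.

```python
from collections import defaultdict


def q_update(Q, state, action, reward, next_state, n_actions,
             alpha=0.1, gamma=0.9, done=False):
    """Apply one tabular Q-learning update in place and return the new value."""
    # Assumption: at a terminal transition there is no future value to bootstrap from.
    best_next = 0.0 if done else max(Q[(next_state, a)] for a in range(n_actions))
    td_target = reward + gamma * best_next   # r_t + gamma * max_a Q(s_{t+1}, a)
    td_error = td_target - Q[(state, action)]
    Q[(state, action)] += alpha * td_error   # move the estimate toward the target
    return Q[(state, action)]


if __name__ == "__main__":
    Q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
    Q[(1, 0)] = 2.0
    Q[(1, 1)] = 5.0
    # Target: 1.0 + 0.9 * 5.0 = 5.5; the old estimate of 0.0 moves halfway to it.
    print(q_update(Q, state=0, action=1, reward=1.0, next_state=1,
                   n_actions=2, alpha=0.5, gamma=0.9))
```

In the usage example the target is \(1 + 0.9 \times 5 = 5.5\), so with \(\alpha = 0.5\) the estimate moves from 0 to approximately 2.75.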

Exploration vs Exploitation

An RL agent must balance exploration (trying actions whose value is still uncertain) and exploitation (choosing the action currently estimated to be best). A common strategy is \(\epsilon\)-greedy:

With probability \(\epsilon\) choose a random action.
With probability \(1 - \epsilon\) choose the best‑estimated action.
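
The two rules above can be sketched as a small selection function. The name `epsilon_greedy` and the convention of passing the estimates for the current state as a list indexed by action are assumptions made for this example.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick an action index given the estimated values for the current state."""
    if rng.random() < epsilon:
        # Explore: choose uniformly among all actions (this may also pick the best one).
        return rng.randrange(len(q_values))
    # Exploit: choose the action with the highest estimate (first one wins ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    rng = random.Random(0)
    estimates = [0.2, 1.5, 0.7]  # hypothetical Q(s, a) values for three actions
    picks = [epsilon_greedy(estimates, epsilon=0.1, rng=rng) for _ in range(10_000)]
    print(picks.count(1) / len(picks))  # fraction of steps on which action 1 was chosen
```

Because the random branch can also land on the greedy action, the best-estimated action is chosen slightly more often than \(1 - \epsilon\) of the time. In practice \(\epsilon\) is often decayed over training so that the agent explores less as its estimates improve.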

Applications of RL

Game playing (Atari, chess, Go) with deep RL agents.
Robotics control and locomotion.
Recommendation systems that adapt to user feedback.
Dynamic pricing and bidding in online advertising.
