Agent‑Based Reinforcement Learning
In Reinforcement Learning (RL), an agent learns to make decisions by interacting with an environment and maximizing cumulative reward over time.
Key Components
- Agent: the learner / decision maker.
- Environment: everything the agent interacts with.
- State \(s_t\): the situation the agent observes.
- Action \(a_t\): choice made by the agent.
- Reward \(r_t\): scalar feedback signal.
- Policy \(\pi\): mapping from states to actions.
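These components interact in a loop: the agent observes a state, picks an action, and the environment returns the next state and a reward. A minimal sketch of that loop, using a hypothetical toy environment (`LineWorld`, with states, actions, and rewards invented here for illustration):

```python
import random

class LineWorld:
    """Toy environment: the agent starts at position 0 on a line and the
    goal is at +3. Actions: 0 = step left, 1 = step right.
    Reward is +1 only upon reaching the goal (a sparse reward)."""
    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state += 1 if action == 1 else -1
        done = self.state == 3
        reward = 1.0 if done else 0.0
        return self.state, reward, done

env = LineWorld()
state = env.reset()
total_reward = 0.0
for t in range(20):                 # cap the episode length
    action = random.choice([0, 1])  # random policy, just for illustration
    state, reward, done = env.step(action)
    total_reward += reward
    if done:
        break
```

Here the "policy" is just random choice; the learning algorithms below replace it with one that improves from experience.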
Q-Learning (Value-Based RL)
Q-Learning learns an action‑value function \(Q(s, a)\) estimating the expected return of taking action \(a\) in state \(s\) and following the optimal policy thereafter.
The update rule is:
\[ Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[r_t + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t)\right] \]
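The update rule translates directly into a tabular implementation. A minimal sketch, with illustrative values for \(\alpha\) and \(\gamma\) and a `done` flag added to drop the bootstrap term at terminal states (a common practical refinement, not part of the formula above):

```python
from collections import defaultdict

# Illustrative hyperparameters: learning rate alpha, discount factor gamma.
ALPHA, GAMMA = 0.1, 0.9
ACTIONS = [0, 1]
Q = defaultdict(float)  # Q[(state, action)], defaults to 0.0

def q_update(s, a, r, s_next, done):
    """One Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]."""
    # Terminal states have no future value, so the max term is dropped.
    target = r if done else r + GAMMA * max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

# One illustrative transition: reward 1.0 for entering a terminal state.
q_update(s=2, a=1, r=1.0, s_next=3, done=True)
print(Q[(2, 1)])  # 0.1 * (1.0 - 0.0) = 0.1
```

Repeating this update over many episodes propagates the reward backwards through the state space, so \(Q\) converges toward the optimal action values under standard conditions (sufficient exploration, decaying \(\alpha\)).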
Exploration vs Exploitation
An RL agent must balance exploration (trying new actions to gather information) against exploitation (choosing actions currently estimated to be best). A common strategy is \(\epsilon\)-greedy:
- With probability \(\epsilon\) choose a random action.
- With probability \(1 - \epsilon\) choose the best‑estimated action.
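The two bullets above can be sketched as a small helper (the tie-breaking among equally valued actions is an optional detail, assumed here):

```python
import random

def epsilon_greedy(q_values, epsilon):
    """Pick a random action with probability epsilon, else a greedy one.
    q_values: list of estimated action values, indexed by action."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))  # explore
    best = max(q_values)
    # Break ties randomly among equally good actions.
    return random.choice([i for i, q in enumerate(q_values) if q == best])

action = epsilon_greedy([0.0, 0.5, 0.2], epsilon=0.1)
```

In practice \(\epsilon\) is often decayed over training, so the agent explores heavily at first and exploits more as its value estimates improve.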
Applications of RL
- Game playing (Atari, Chess, Go) using deep RL agents.
- Robotics control and locomotion.
- Recommendation systems that adapt to user feedback.
- Dynamic pricing and bidding in online advertising.