Calculus Essentials for Machine Learning
Calculus is used in Machine Learning to optimize model parameters. You mainly need derivatives, gradients, and the chain rule, not heavy theory.
Derivatives & Slopes
The derivative of a function measures how fast it changes. Intuitively, it is the slope of the tangent line at a point. In Machine Learning, we use derivatives to see how the loss changes when we change the parameters.
- For \(f(x) = x^2\), the derivative is \(f'(x) = 2x\).
- For \(f(x) = wx + b\), the derivative with respect to \(w\) is \(x\).
- The sign of the derivative tells us which direction to move a parameter to reduce the loss.
```python
import numpy as np

def f(x):
    return x**2

def numerical_derivative(func, x, eps=1e-5):
    # Central difference approximation of f'(x)
    return (func(x + eps) - func(x - eps)) / (2 * eps)

xs = np.linspace(-3, 3, 7)
for x in xs:
    print(f"x={x: .1f}, f(x)={f(x): .2f}, approx f'(x)={numerical_derivative(f, x): .2f}")
```
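The chain rule mentioned at the start handles composite functions: for \(h(x) = f(g(x))\), we have \(h'(x) = f'(g(x)) \cdot g'(x)\). A minimal sketch checking this numerically; the particular functions \(f(u) = u^2\) and \(g(x) = 3x + 1\) are illustrative choices, not from the text:

```python
# Chain rule: h(x) = f(g(x))  =>  h'(x) = f'(g(x)) * g'(x)
# Illustrative choice: f(u) = u**2, g(x) = 3*x + 1
def g(x):
    return 3 * x + 1

def h(x):
    return g(x) ** 2

def h_prime_chain(x):
    # f'(u) = 2u evaluated at u = g(x), times g'(x) = 3
    return 2 * g(x) * 3

def numerical_derivative(func, x, eps=1e-5):
    return (func(x + eps) - func(x - eps)) / (2 * eps)

x = 2.0
print("chain rule:", h_prime_chain(x))           # 2 * g(2) * 3 = 42
print("numerical :", numerical_derivative(h, x))
```

Backpropagation in neural networks is exactly this rule applied repeatedly through the layers.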
Gradients & Multivariate Functions
For functions with many parameters, we use a gradient, which is a vector of partial derivatives. It tells us how the function changes with respect to each parameter.
```python
import numpy as np

def loss(w):
    # Simple quadratic loss: L(w1, w2) = w1^2 + 2*w2^2
    return w[0]**2 + 2 * w[1]**2

def grad_loss(w, eps=1e-5):
    # Numerical gradient: central differences, one coordinate at a time
    g = np.zeros_like(w, dtype=float)
    for i in range(len(w)):
        w_pos = w.copy()
        w_neg = w.copy()
        w_pos[i] += eps
        w_neg[i] -= eps
        g[i] = (loss(w_pos) - loss(w_neg)) / (2 * eps)
    return g

w = np.array([1.0, -2.0])
print("w:", w)
print("loss(w):", loss(w))
print("grad L(w):", grad_loss(w))
Gradient Descent (Optimization)
Gradient Descent is an iterative optimization algorithm. At each step we move in the opposite direction of the gradient to reduce the loss: \(x \leftarrow x - \eta \, f'(x)\), where \(\eta\) is the learning rate.
```python
# Simple gradient descent on a 1D function
def f(x):
    return x**2

def f_prime(x):
    return 2 * x

x = 5.0
lr = 0.1  # learning rate
for step in range(10):
    grad = f_prime(x)
    x = x - lr * grad
    print(f"step={step:02d}, x={x:.4f}, f(x)={f(x):.4f}")
```
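The same update rule carries over directly to the multivariate loss from the previous section, with the gradient replacing the 1D derivative. A minimal sketch; the step count and learning rate are illustrative choices:

```python
import numpy as np

def loss(w):
    # L(w1, w2) = w1^2 + 2*w2^2, minimized at w = (0, 0)
    return w[0]**2 + 2 * w[1]**2

def grad(w):
    # Analytic gradient: (dL/dw1, dL/dw2) = (2*w1, 4*w2)
    return np.array([2 * w[0], 4 * w[1]])

w = np.array([1.0, -2.0])
lr = 0.1
for step in range(50):
    w = w - lr * grad(w)   # move against the gradient

print("w after descent:", w)   # both coordinates close to 0
print("loss(w):", loss(w))
```

Note that each coordinate shrinks at its own rate (factors 0.8 and 0.6 per step here), which is why a single learning rate can behave differently along different parameter directions.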