Calculus Essentials for Machine Learning
Calculus is used in Machine Learning to optimize model parameters. You mainly need derivatives, gradients, and the chain rule, not heavy theory.
Derivatives & Slopes
The derivative of a function measures how fast it changes. Intuitively, it is the slope of the tangent line at a point. In Machine Learning, we use derivatives to see how the loss changes when we change the parameters.
- For \(f(x) = x^2\), the derivative is \(f'(x) = 2x\).
- For \(f(x) = wx + b\), the derivative with respect to \(w\) is \(x\).
- The sign of the derivative tells us which direction to move a parameter to reduce the loss.
```python
import numpy as np

def f(x):
    return x**2

def numerical_derivative(func, x, eps=1e-5):
    # Central difference approximation of f'(x)
    return (func(x + eps) - func(x - eps)) / (2 * eps)

xs = np.linspace(-3, 3, 7)
for x in xs:
    print(f"x={x: .1f}, f(x)={f(x): .2f}, approx f'(x)={numerical_derivative(f, x): .2f}")
```
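The chain rule mentioned at the start handles composite functions: for \(h(x) = f(g(x))\), we have \(h'(x) = f'(g(x)) \cdot g'(x)\). A minimal sketch checking this numerically; the particular functions \(f(u) = u^2\) and \(g(x) = 3x + 1\) are illustrative choices, not from the text:

```python
# Chain rule: h(x) = f(g(x))  =>  h'(x) = f'(g(x)) * g'(x)
# Illustrative choice: f(u) = u**2, g(x) = 3*x + 1
def g(x):
    return 3 * x + 1

def h(x):
    return g(x) ** 2

def h_prime_chain(x):
    # f'(u) = 2u evaluated at u = g(x), times g'(x) = 3
    return 2 * g(x) * 3

def numerical_derivative(func, x, eps=1e-5):
    return (func(x + eps) - func(x - eps)) / (2 * eps)

x = 2.0
print("chain rule:", h_prime_chain(x))           # 2 * g(2) * 3 = 42
print("numerical :", numerical_derivative(h, x))
```

Backpropagation in neural networks is exactly this rule applied repeatedly through the layers.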
Gradients & Multivariate Functions
For functions with many parameters, we use a gradient, which is a vector of partial derivatives. It tells us how the function changes with respect to each parameter.
```python
import numpy as np

def loss(w):
    # Simple quadratic loss: L(w1, w2) = w1^2 + 2*w2^2
    return w[0]**2 + 2 * w[1]**2

def grad_loss(w, eps=1e-5):
    # Numerical gradient: central differences, one coordinate at a time
    g = np.zeros_like(w, dtype=float)
    for i in range(len(w)):
        w_pos = w.copy()
        w_neg = w.copy()
        w_pos[i] += eps
        w_neg[i] -= eps
        g[i] = (loss(w_pos) - loss(w_neg)) / (2 * eps)
    return g

w = np.array([1.0, -2.0])
print("w:", w)
print("loss(w):", loss(w))
print("grad L(w):", grad_loss(w))
Gradient Descent (Optimization)
Gradient Descent is an iterative optimization algorithm. At each step we move in the opposite direction of the gradient to reduce the loss: \(x \leftarrow x - \eta \, f'(x)\), where \(\eta\) is the learning rate.
```python
# Simple gradient descent on a 1D function
def f(x):
    return x**2

def f_prime(x):
    return 2 * x

x = 5.0
lr = 0.1  # learning rate
for step in range(10):
    grad = f_prime(x)
    x = x - lr * grad
    print(f"step={step:02d}, x={x:.4f}, f(x)={f(x):.4f}")
```
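The same update rule carries over directly to the multivariate loss from the previous section, with the gradient replacing the 1D derivative. A minimal sketch; the step count and learning rate are illustrative choices:

```python
import numpy as np

def loss(w):
    # L(w1, w2) = w1^2 + 2*w2^2, minimized at w = (0, 0)
    return w[0]**2 + 2 * w[1]**2

def grad(w):
    # Analytic gradient: (dL/dw1, dL/dw2) = (2*w1, 4*w2)
    return np.array([2 * w[0], 4 * w[1]])

w = np.array([1.0, -2.0])
lr = 0.1
for step in range(50):
    w = w - lr * grad(w)   # move against the gradient

print("w after descent:", w)   # both coordinates close to 0
print("loss(w):", loss(w))
```

Note that each coordinate shrinks at its own rate (factors 0.8 and 0.6 per step here), which is why a single learning rate can behave differently along different parameter directions.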