SQL for Data Science Beginner
~15 min read

SQL Queries Every Data Scientist Should Know

SQL is the standard language for working with relational databases and data warehouses. You use it daily to filter, aggregate, join and prepare data for analysis or modeling.

SELECT, WHERE & ORDER BY

Most analytical queries start with SELECT, filters in WHERE, and sorting using ORDER BY.

The relational model represents data as tables (relations) with rows (tuples) and columns (attributes). Primary keys uniquely identify rows, while foreign keys link related tables. Understanding this theory helps you design queries that are both correct and efficient.

-- Get top 10 customers by total spending
SELECT
    customer_id,
    SUM(amount) AS total_spent
FROM payments
WHERE status = 'completed'
GROUP BY customer_id
ORDER BY total_spent DESC
LIMIT 10;

Joins (INNER, LEFT, RIGHT)

You often need to combine data from multiple tables, e.g. users + orders.

-- Join customers with their orders
SELECT
    c.customer_id,
    c.name,
    o.order_id,
    o.order_date,
    o.total_amount
FROM customers AS c
JOIN orders AS o
    ON c.customer_id = o.customer_id
WHERE o.order_date >= '2024-01-01';

Window Functions for Analytics

Window functions such as ROW_NUMBER(), RANK() and running totals are extremely powerful for analytics.

-- Running total of daily revenue
SELECT
    order_date,
    SUM(total_amount)                              AS daily_revenue,
    SUM(SUM(total_amount)) OVER (
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    )                                              AS running_revenue
FROM orders
GROUP BY order_date
ORDER BY order_date;