Mastering SQL Window Functions for Advanced Data Analysis

Learn the basics and power of SQL window functions to perform advanced data analysis with ease. This beginner-friendly tutorial introduces key window functions with practical examples.

SQL window functions allow you to perform calculations across sets of rows related to the current query row without collapsing the result into a single output row. This means you can calculate running totals, moving averages, ranks, and much more while still viewing all the original rows.

Let's dive into some commonly used window functions and see how they work with simple examples using a sample sales table.

Suppose we have a table called sales_orders with the following columns: - order_id - customer_id - order_date - amount

sql
-- Sample Data
CREATE TABLE sales_orders (
  order_id INT,
  customer_id INT,
  order_date DATE,
  amount DECIMAL(10,2)
);

INSERT INTO sales_orders VALUES
(1, 101, '2024-01-01', 150.00),
(2, 102, '2024-01-02', 200.00),
(3, 101, '2024-01-03', 50.00),
(4, 103, '2024-01-04', 300.00),
(5, 102, '2024-01-05', 120.00);

### ROW_NUMBER() This function provides a unique sequential integer to rows within a partition of the query result, ordered by specified columns.

sql
SELECT
  order_id,
  customer_id,
  order_date,
  amount,
  ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS order_rank
FROM sales_orders;

Here, ROW_NUMBER() restarts numbering for each customer_id, ordered by order_date. This helps identify the first, second, third order per customer.

### RANK() and DENSE_RANK() These help assign ranks to rows within a partition. RANK() skips rank numbers if there are ties; DENSE_RANK() does not.

sql
SELECT
  customer_id,
  amount,
  RANK() OVER (PARTITION BY customer_id ORDER BY amount DESC) AS rank,
  DENSE_RANK() OVER (PARTITION BY customer_id ORDER BY amount DESC) AS dense_rank
FROM sales_orders;

If two orders have the same amount for the same customer, RANK() might produce gaps in numbering, while DENSE_RANK() will not.

### SUM(), AVG(), MIN(), MAX() Over Windows You can calculate cumulative sums or averages over a defined window of rows.

sql
SELECT
  order_id,
  customer_id,
  amount,
  SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM sales_orders;

This query calculates a running total of amounts per customer ordered by the order_date.

### LAG() and LEAD() These functions let you access data in a previous or next row without self join.

sql
SELECT
  order_id,
  customer_id,
  order_date,
  amount,
  LAG(amount, 1) OVER (PARTITION BY customer_id ORDER BY order_date) AS previous_amount,
  LEAD(amount, 1) OVER (PARTITION BY customer_id ORDER BY order_date) AS next_amount
FROM sales_orders;

Here, LAG() shows the previous order amount per customer, and LEAD() shows the next order amount.

### Summary Window functions are incredibly useful for advanced analysis like ranking, running totals, comparisons across rows, and statistical calculations—all while maintaining the detail of each row. Practice using these functions with your datasets to uncover deeper insights.