Mastering Window Functions: Advanced SQL Techniques for Data Analysis
Learn how to use SQL window functions to perform advanced data analysis, including ranking, running totals, and moving averages. Perfect for beginners ready to boost their SQL skills.
Window functions are powerful tools in SQL that allow you to perform calculations across a set of table rows related to the current row without collapsing the result into a single output. Unlike aggregate functions that summarize data, window functions preserve individual rows while adding aggregated or ranking information. This makes them essential for advanced data analysis.
Let's explore some common window functions, such as ROW_NUMBER(), RANK(), SUM(), and AVG(), and see how they can help in practical scenarios.
Imagine you have a sales table like this:
CREATE TABLE sales (
id INT,
employee VARCHAR(50),
region VARCHAR(50),
sale_date DATE,
amount DECIMAL(10, 2)
);
INSERT INTO sales VALUES
(1, 'Alice', 'North', '2024-01-01', 100.00),
(2, 'Bob', 'South', '2024-01-02', 300.00),
(3, 'Alice', 'North', '2024-01-05', 150.00),
(4, 'Charlie', 'North', '2024-01-07', 200.00),
(5, 'Bob', 'South', '2024-01-10', 100.00);### 1. ROW_NUMBER() — Assign Unique Ranks
ROW_NUMBER() assigns a unique sequential integer to rows within a partition of a result set. For example, to rank sales by each employee by date:
SELECT
employee,
sale_date,
amount,
ROW_NUMBER() OVER (PARTITION BY employee ORDER BY sale_date) AS sale_rank
FROM sales;This returns each sale for an employee with a ranking based on the date.
### 2. RANK() — Handle Ties in Ranking
RANK() provides rankings but gives the same rank to tied values, skipping ranks after the tie. For example, rank employees by their sale amount in descending order:
SELECT
employee,
amount,
RANK() OVER (ORDER BY amount DESC) AS rank
FROM sales;If two employees have the same amount, they will receive the same rank.
### 3. SUM() with OVER() — Calculate Running Totals
You can calculate a running total of sales for each employee ordered by sale date:
SELECT
employee,
sale_date,
amount,
SUM(amount) OVER (PARTITION BY employee ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM sales;This shows how sales accumulate over time for each employee.
### 4. AVG() with OVER() — Calculate Moving Averages
To smooth fluctuations, calculate a moving average of sales over the last 2 sales for each employee:
SELECT
employee,
sale_date,
amount,
AVG(amount) OVER (PARTITION BY employee ORDER BY sale_date ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS moving_avg
FROM sales;This gives the average sale amount of the current and previous sale for each employee.
### Summary
Window functions greatly enhance your ability to analyze data without losing detail. By mastering ROW_NUMBER(), RANK(), and aggregate window functions like SUM() and AVG(), you can perform complex calculations such as ranking with ties, running totals, and moving averages easily and efficiently.
Use window functions whenever you need to perform calculations relative to other rows in your dataset, keeping the original row-level details intact.
With practice, these techniques will empower you to produce insightful data analyses and reports.