How to Use Window Functions in SQL for Data Analysis: A Beginner’s Guide
Learn how to leverage SQL window functions for powerful and easy data analysis with this beginner-friendly tutorial.
Window functions in SQL let you perform a calculation across a set of rows related to the current row while still returning every row, instead of collapsing each group into a single row the way aggregate functions with `GROUP BY` do. That makes them well suited to data analysis tasks such as calculating running totals, ranking rows, and computing moving averages.
In this guide, we will explore some common window functions and how to use them with simple examples so you can start using them in your own SQL queries right away.
### What Are Window Functions?
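Window functions operate on a set of rows called the 'window' and return one value for every input row. The window is defined by the `OVER()` clause, which can contain a `PARTITION BY` list (split the rows into groups), an `ORDER BY` list (order the rows within each group), and an optional frame clause (limit which rows around the current row are included). An empty `OVER()` treats the whole result set as a single window.
Common window functions include `ROW_NUMBER()`, `RANK()`, `SUM()`, `AVG()`, `LAG()`, and `LEAD()`. `SUM()` and `AVG()` are ordinary aggregate functions; adding an `OVER()` clause is what makes them behave as window functions.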
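To see how this differs from a regular aggregate, here is a small sketch. It assumes PostgreSQL or SQLite 3.25 or later (other databases may need a different `VALUES` syntax) and defines a common table expression named `sales` holding the same five sample rows used in this guide, so it runs without creating any tables. The aliases `grand_total`, `salesperson_total`, and `pct_of_salesperson_total` are illustrative names, not part of any standard.

```sql
-- Sketch: the CTE stands in for the sales table (PostgreSQL / SQLite 3.25+ syntax).
WITH sales (id, salesperson, sale_date, amount) AS (
    VALUES (1, 'Alice', '2024-01-01', 500),
           (2, 'Bob',   '2024-01-02', 300),
           (3, 'Alice', '2024-01-03', 200),
           (4, 'Bob',   '2024-01-04', 700),
           (5, 'Alice', '2024-01-05', 400)
)
SELECT
    id,
    salesperson,
    amount,
    -- Empty OVER (): one window containing every row.
    SUM(amount) OVER () AS grand_total,
    -- PARTITION BY: a separate window for each salesperson.
    SUM(amount) OVER (PARTITION BY salesperson) AS salesperson_total,
    -- Each row is kept, so it can be compared with its own group's total.
    ROUND(100.0 * amount / SUM(amount) OVER (PARTITION BY salesperson), 1)
        AS pct_of_salesperson_total
FROM sales
ORDER BY id;
```

All five rows come back: `grand_total` is 2100 on every row, and `salesperson_total` is 1100 on Alice's rows and 1000 on Bob's. By contrast, `SELECT salesperson, SUM(amount) FROM sales GROUP BY salesperson` would return only two rows, leaving no individual `amount` to compare against the total.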
### Basic Example Dataset
Assume we have a table named `sales` with the following columns: `id`, `salesperson`, `sale_date`, and `amount`.
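If you want to run the examples yourself, a setup script along these lines creates and fills the table. The column types (`VARCHAR(50)`, `DATE`, `INTEGER`) are assumptions chosen to fit the sample rows; adjust them to your database and data.

```sql
-- Sketch: create the sample sales table (column types are assumptions).
CREATE TABLE sales (
    id          INTEGER PRIMARY KEY,
    salesperson VARCHAR(50) NOT NULL,
    sale_date   DATE        NOT NULL,
    amount      INTEGER     NOT NULL
);

-- Load the five example rows shown below.
INSERT INTO sales (id, salesperson, sale_date, amount) VALUES
    (1, 'Alice', '2024-01-01', 500),
    (2, 'Bob',   '2024-01-02', 300),
    (3, 'Alice', '2024-01-03', 200),
    (4, 'Bob',   '2024-01-04', 700),
    (5, 'Alice', '2024-01-05', 400);
```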
```sql
SELECT * FROM sales;

-- Example rows:
-- id | salesperson | sale_date  | amount
-- 1  | Alice       | 2024-01-01 | 500
-- 2  | Bob         | 2024-01-02 | 300
-- 3  | Alice       | 2024-01-03 | 200
-- 4  | Bob         | 2024-01-04 | 700
-- 5  | Alice       | 2024-01-05 | 400
```

### Using ROW_NUMBER() to Rank Sales by Salesperson
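The `ROW_NUMBER()` function assigns consecutive integers, starting at 1, to the rows of each partition (group), following the `ORDER BY` inside `OVER()`. Numbering restarts at 1 for every partition, and rows that tie on the sort key still receive different numbers.

```sql
SELECT
    salesperson,
    sale_date,
    amount,
    ROW_NUMBER() OVER (PARTITION BY salesperson ORDER BY sale_date) AS sale_rank
FROM sales
ORDER BY salesperson, sale_date;
```

In this query, each salesperson's sales are numbered in order of sale date: Alice's three sales get 1, 2, and 3, and Bob's two sales get 1 and 2. The final `ORDER BY` only fixes the display order; without it, the database may return the rows in any order.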
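A common use of `ROW_NUMBER()` is picking one row per group, such as each salesperson's most recent sale. Window functions are evaluated after `WHERE`, so you cannot filter on `ROW_NUMBER()` at the same query level; this sketch computes the numbers in a common table expression and filters in the outer query. It assumes PostgreSQL or SQLite 3.25+, inlines the sample rows as a CTE named `sales`, and uses `numbered` and `rn` as illustrative names.

```sql
-- Sketch: most recent sale per salesperson (PostgreSQL / SQLite 3.25+ syntax).
WITH sales (id, salesperson, sale_date, amount) AS (
    VALUES (1, 'Alice', '2024-01-01', 500),
           (2, 'Bob',   '2024-01-02', 300),
           (3, 'Alice', '2024-01-03', 200),
           (4, 'Bob',   '2024-01-04', 700),
           (5, 'Alice', '2024-01-05', 400)
),
numbered AS (
    SELECT
        salesperson,
        sale_date,
        amount,
        -- Newest sale first, so the latest sale gets number 1.
        ROW_NUMBER() OVER (PARTITION BY salesperson ORDER BY sale_date DESC) AS rn
    FROM sales
)
-- Filter in the outer query: window functions are not allowed in WHERE.
SELECT salesperson, sale_date, amount
FROM numbered
WHERE rn = 1
ORDER BY salesperson;
-- Alice | 2024-01-05 | 400
-- Bob   | 2024-01-04 | 700
```

If you were ranking by `amount` instead and wanted tied values to share a position, `RANK()` would fit better: it gives equal values the same number, whereas `ROW_NUMBER()` always breaks ties.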
### Calculating a Running Total with SUM()
You can calculate a running total of sales amounts for each salesperson using `SUM()` with a window.
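```sql
SELECT
    salesperson,
    sale_date,
    amount,
    SUM(amount) OVER (PARTITION BY salesperson ORDER BY sale_date) AS running_total
FROM sales
ORDER BY salesperson, sale_date;
```

This adds up the amounts in sale-date order and restarts the sum for each salesperson: Alice's running total goes 500, 700, 1100, and Bob's goes 300, 1000. The total "runs" because, when `OVER()` contains an `ORDER BY`, the default window frame covers every row from the start of the partition up to the current row (plus any rows that tie with it on `sale_date`).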
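Moving averages use the same idea with an explicit frame clause. In this sketch (PostgreSQL or SQLite 3.25+ assumed, sample rows inlined as a CTE named `sales`), `ROWS BETWEEN 1 PRECEDING AND CURRENT ROW` limits each window to the current sale and the one before it. The running total is also rewritten with an explicit `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` frame, which adds rows that tie on `sale_date` one at a time instead of all together. The aliases are illustrative.

```sql
-- Sketch: explicit window frames (PostgreSQL / SQLite 3.25+ syntax).
WITH sales (id, salesperson, sale_date, amount) AS (
    VALUES (1, 'Alice', '2024-01-01', 500),
           (2, 'Bob',   '2024-01-02', 300),
           (3, 'Alice', '2024-01-03', 200),
           (4, 'Bob',   '2024-01-04', 700),
           (5, 'Alice', '2024-01-05', 400)
)
SELECT
    salesperson,
    sale_date,
    amount,
    -- Running total with the frame spelled out: partition start through this row.
    SUM(amount) OVER (
        PARTITION BY salesperson
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total,
    -- Two-sale moving average: the previous row (if any) and the current row.
    AVG(amount) OVER (
        PARTITION BY salesperson
        ORDER BY sale_date
        ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
    ) AS moving_avg_2
FROM sales
ORDER BY salesperson, sale_date;
```

For Alice, `moving_avg_2` is 500, 350, and 300; for Bob it is 300 and 500. The first row of each partition has only itself in the frame, so its average equals its own amount. Depending on the database, the averages may be displayed with decimal places (for example `350.0`).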
### Finding Previous Sale Amounts with LAG()
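The `LAG()` function reads a value from an earlier row in the same partition (by default, the row immediately before the current one in the `OVER()` ordering), which is useful for comparing each value with the one before it.

```sql
SELECT
    salesperson,
    sale_date,
    amount,
    LAG(amount) OVER (PARTITION BY salesperson ORDER BY sale_date) AS previous_sale_amount
FROM sales
ORDER BY salesperson, sale_date;
```

Here, each sale shows the amount of the same salesperson's previous sale. The first sale in each partition has no earlier row, so `previous_sale_amount` is `NULL` for Alice's 2024-01-01 sale and Bob's 2024-01-02 sale.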
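`LAG()` is often used to compute the change between consecutive rows, and `LEAD()` is its mirror image, reading from the following row instead. This sketch assumes PostgreSQL or SQLite 3.25+ and inlines the sample rows as a CTE named `sales`; `change_from_previous` and `next_sale_date` are illustrative aliases.

```sql
-- Sketch: compare each sale with its neighbours (PostgreSQL / SQLite 3.25+ syntax).
WITH sales (id, salesperson, sale_date, amount) AS (
    VALUES (1, 'Alice', '2024-01-01', 500),
           (2, 'Bob',   '2024-01-02', 300),
           (3, 'Alice', '2024-01-03', 200),
           (4, 'Bob',   '2024-01-04', 700),
           (5, 'Alice', '2024-01-05', 400)
)
SELECT
    salesperson,
    sale_date,
    amount,
    -- Difference from the same salesperson's previous sale (NULL for the first one).
    amount - LAG(amount) OVER (PARTITION BY salesperson ORDER BY sale_date)
        AS change_from_previous,
    -- Date of the same salesperson's next sale (NULL for the last one).
    LEAD(sale_date) OVER (PARTITION BY salesperson ORDER BY sale_date)
        AS next_sale_date
FROM sales
ORDER BY salesperson, sale_date;
```

Alice's changes are `NULL`, -300, and 200, and Bob's are `NULL` and 400; `next_sale_date` is `NULL` on each salesperson's last sale. If you would rather get a number than `NULL`, `LAG()` also accepts an offset and a default value, so `LAG(amount, 1, 0)` returns 0 when there is no previous row.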
### Summary
Window functions let you rank, aggregate, and compare rows directly in SQL without grouping them away. Because every row is kept, they can often replace self-joins or correlated subqueries, which makes analytical queries shorter, easier to read, and in many cases more efficient. Start experimenting with them in your own queries to get more out of your data.