Mastering Window Functions: Advanced SQL Tricks for Data Analysis

Learn how to use SQL window functions to perform advanced data analysis with clear examples and beginner-friendly explanations.

Window functions in SQL are powerful tools that allow you to perform calculations across a set of table rows related to the current row. Unlike aggregate functions that collapse multiple rows into one, window functions preserve the rows while providing valuable analytics like running totals, rankings, moving averages, and more.

In this tutorial, we'll explore some common window functions and see how to use the OVER() clause to analyze data without losing detail.

Let's start with a simple example using sales data:

```sql
CREATE TABLE sales (
  id INT,
  salesperson VARCHAR(50),
  region VARCHAR(50),
  sale_amount DECIMAL(10,2),
  sale_date DATE
);

INSERT INTO sales VALUES
(1, 'Alice', 'North', 500, '2024-01-01'),
(2, 'Bob', 'North', 300, '2024-01-02'),
(3, 'Alice', 'North', 700, '2024-01-03'),
(4, 'Charlie', 'South', 200, '2024-01-01'),
(5, 'Bob', 'North', 100, '2024-01-04');
```
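To make the contrast with aggregate functions concrete, here is a small sketch that runs the same `SUM()` two ways against the `sales` table above. The alias `person_total` is only an illustrative name.

```sql
-- Aggregate: GROUP BY collapses the five sales into one row per salesperson.
SELECT salesperson, SUM(sale_amount) AS person_total
FROM sales
GROUP BY salesperson;

-- Window: all five rows are kept, and each row also carries
-- the total for its salesperson.
SELECT
  id,
  salesperson,
  sale_amount,
  SUM(sale_amount) OVER (PARTITION BY salesperson) AS person_total
FROM sales
ORDER BY salesperson, id;
```

The first query returns three rows and the second returns five. Both of Alice's rows show a `person_total` of 1200, both of Bob's show 400, and Charlie's single row shows 200.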

### Calculate Running Total per Salesperson

To see how sales accumulate over time for each salesperson, use the `SUM()` window function with `PARTITION BY` and `ORDER BY` inside the `OVER()` clause.

```sql
SELECT
  id,
  salesperson,
  sale_date,
  sale_amount,
  SUM(sale_amount) OVER (PARTITION BY salesperson ORDER BY sale_date) AS running_total
FROM sales
ORDER BY salesperson, sale_date;
```

This query sums the `sale_amount` for each salesperson, ordered by `sale_date`, producing a running total that resets for each person.
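One detail worth knowing: when `OVER()` has an `ORDER BY` but no explicit frame, the default frame is `RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`, which treats rows with the same `sale_date` as peers and adds them all at once. The sketch below spells out a `ROWS` frame instead, so the total advances strictly one row at a time. The extra `id` in the `ORDER BY` is an added tie-breaker, not part of the original query.

```sql
SELECT
  id,
  salesperson,
  sale_date,
  sale_amount,
  SUM(sale_amount) OVER (
    PARTITION BY salesperson
    ORDER BY sale_date, id  -- id breaks ties between sales on the same date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS running_total
FROM sales
ORDER BY salesperson, sale_date, id;
```

On the sample rows, no salesperson has two sales on the same date, so both versions give the same running totals: 500 then 1200 for Alice, 300 then 400 for Bob, and 200 for Charlie.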

### Ranking Salespeople by Total Sales Within Each Region

Use the `RANK()` function to assign ranks to salespeople based on their total sales in each region.

```sql
SELECT
  salesperson,
  region,
  SUM(sale_amount) AS total_sales,
  RANK() OVER (PARTITION BY region ORDER BY SUM(sale_amount) DESC) AS sales_rank
FROM sales
GROUP BY salesperson, region
ORDER BY region, sales_rank;
```

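Here, `RANK()` orders salespeople in each region by their total sales. This works because window functions are evaluated after `GROUP BY`, so `RANK()` can order by the aggregated `SUM(sale_amount)`. Salespeople with equal totals share a rank, and the rank that follows is skipped.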
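If mixing `GROUP BY` and a window function in one `SELECT` feels hard to read, the same result can be written in two explicit steps. This is a sketch of an equivalent form; the CTE name `totals` is made up for illustration.

```sql
-- Step 1: aggregate sales per salesperson and region.
WITH totals AS (
  SELECT
    salesperson,
    region,
    SUM(sale_amount) AS total_sales
  FROM sales
  GROUP BY salesperson, region
)
-- Step 2: rank the aggregated rows within each region.
SELECT
  salesperson,
  region,
  total_sales,
  RANK() OVER (PARTITION BY region ORDER BY total_sales DESC) AS sales_rank
FROM totals
ORDER BY region, sales_rank;
```

With the sample data, North has Alice (1200) at rank 1 and Bob (400) at rank 2, while Charlie (200) is rank 1 in South.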

### Calculate Moving Average of Sales

You can compute the moving average to smooth out the sales trend over defined periods using a window frame.

```sql
SELECT
  id,
  salesperson,
  sale_date,
  sale_amount,
  -- Frame: the current row plus up to two preceding rows for the same salesperson
  AVG(sale_amount) OVER (
    PARTITION BY salesperson
    ORDER BY sale_date
    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  ) AS moving_avg_3_rows
FROM sales
ORDER BY salesperson, sale_date;
```

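This example averages each sale with up to two preceding sales by the same salesperson, so it is a moving average over three rows rather than three calendar days. `ROWS BETWEEN 2 PRECEDING AND CURRENT ROW` defines the window frame. For the first rows of each partition the frame holds fewer than three rows, so the average covers only the rows available.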
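A `ROWS` frame counts rows, not calendar days. If the window should cover a span of dates instead, one option is a `RANGE` frame with an interval offset. The sketch below assumes PostgreSQL 11 or later; other databases spell the interval differently (MySQL 8 writes `INTERVAL 2 DAY`), and some do not support interval offsets in `RANGE` frames at all.

```sql
-- Assumes PostgreSQL 11+ syntax for the interval offset.
SELECT
  id,
  salesperson,
  sale_date,
  sale_amount,
  AVG(sale_amount) OVER (
    PARTITION BY salesperson
    ORDER BY sale_date
    -- Frame: every sale dated from two days before the current row's date
    -- up to and including the current row's date.
    RANGE BETWEEN INTERVAL '2 days' PRECEDING AND CURRENT ROW
  ) AS moving_avg_3_days
FROM sales
ORDER BY salesperson, sale_date;
```

On the sample rows this matches the row-based version, because each salesperson's sales are never more than two days apart. The two would diverge once there is a longer gap between sales or several sales on the same date.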

### Key Takeaways

- Window functions preserve rows and add aggregate-style calculations.
- `PARTITION BY` defines the group for the window function, similar to grouping.
- `ORDER BY` inside `OVER()` defines the order of rows in the window.
- You can define custom frames with `ROWS BETWEEN` or `RANGE BETWEEN`.

Mastering these functions unlocks advanced SQL analytics capabilities essential for data analysts.

Try experimenting with these window functions on your own datasets to better understand their power!