How to Use Window Functions in SQL for Advanced Data Analysis

Learn how to leverage window functions in SQL to perform advanced data analysis, including ranking, running totals, and moving averages with simple examples.

Window functions in SQL compute a value for each row from a set of related rows (the row's "window") without collapsing those rows into a single output row. This makes them well suited to running totals, rankings, and moving averages, where you want the detail rows and the calculated values side by side.

Unlike aggregate functions used with GROUP BY, which return one row per group, window functions compute aggregates and other calculations over a "window" or partition of rows while still returning every individual row.
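To make the contrast concrete, here is a minimal sketch that runs the same SUM() both ways. It assumes SQLite 3.25 or later (which added window functions) accessed through Python's built-in sqlite3 module, and the three sample rows are hypothetical.

```python
import sqlite3

# In-memory database with hypothetical sample rows (SQLite 3.25+ assumed).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_data (id INTEGER PRIMARY KEY, employee_id INTEGER, "
    "sale_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?, ?, ?)",
    [
        (1, 101, "2024-01-05", 200.0),
        (2, 101, "2024-01-07", 300.0),
        (3, 102, "2024-01-06", 150.0),
    ],
)

# GROUP BY collapses each employee's sales into a single row.
grouped = conn.execute(
    "SELECT employee_id, SUM(amount) FROM sales_data "
    "GROUP BY employee_id ORDER BY employee_id"
).fetchall()

# SUM() OVER keeps every sale and attaches the employee's total to each one.
windowed = conn.execute(
    "SELECT id, employee_id, amount, "
    "SUM(amount) OVER (PARTITION BY employee_id) AS employee_total "
    "FROM sales_data ORDER BY id"
).fetchall()

print(grouped)   # [(101, 500.0), (102, 150.0)]
print(windowed)  # [(1, 101, 200.0, 500.0), (2, 101, 300.0, 500.0), (3, 102, 150.0, 150.0)]
```

The grouped query returns two rows, while the windowed query still returns all three sales, each carrying its employee's total.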

Here’s a beginner-friendly introduction to some common window functions and how to use them.

### Example Table: sales_data

Assume you have a table named sales_data with the following columns:

- id (unique sale ID)
- employee_id (ID of the salesperson)
- sale_date (date of the sale)
- amount (sale amount)
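If you want to follow along, here is one way to create such a table. This is a sketch under stated assumptions: it uses SQLite through Python's sqlite3 module, the column types are guesses (the article only names the columns), dates are stored as ISO-8601 text so they sort correctly, and the sample rows are hypothetical.

```python
import sqlite3

# Column types are assumptions; only the column names come from the article.
SCHEMA = """
CREATE TABLE sales_data (
    id          INTEGER PRIMARY KEY,  -- unique sale ID
    employee_id INTEGER NOT NULL,     -- ID of the salesperson
    sale_date   TEXT    NOT NULL,     -- ISO-8601 date string, e.g. '2024-01-05'
    amount      REAL    NOT NULL      -- sale amount
)
"""

# Hypothetical sample rows used only for illustration.
SAMPLE_ROWS = [
    (1, 101, "2024-01-05", 200.0),
    (2, 101, "2024-01-07", 300.0),
    (3, 101, "2024-01-09", 100.0),
    (4, 102, "2024-01-06", 150.0),
    (5, 102, "2024-01-08", 250.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany("INSERT INTO sales_data VALUES (?, ?, ?, ?)", SAMPLE_ROWS)
conn.commit()

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(sales_data)")]
print(columns)  # ['id', 'employee_id', 'sale_date', 'amount']
```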

### 1. Using ROW_NUMBER() to Rank Sales per Employee

Suppose you want to rank each employee's sales by amount, from largest to smallest.

```sql
SELECT
  id,
  employee_id,
  sale_date,
  amount,
  ROW_NUMBER() OVER (PARTITION BY employee_id ORDER BY amount DESC) AS sale_rank
FROM sales_data;
```

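This query numbers each employee's sales from highest to lowest amount, restarting at 1 for every employee. ROW_NUMBER() always assigns distinct numbers, so two sales with the same amount receive different numbers in an unspecified order; use RANK() or DENSE_RANK() if tied amounts should share a rank.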
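The difference between ROW_NUMBER(), RANK(), and DENSE_RANK() only shows up when amounts tie. The sketch below assumes SQLite 3.25+ via Python's sqlite3 module and uses hypothetical rows in which employee 101 has two equal sales. Adding id as a tiebreaker to ROW_NUMBER()'s ORDER BY is a choice made here to keep the numbering deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_data (id INTEGER PRIMARY KEY, employee_id INTEGER, "
    "sale_date TEXT, amount REAL)"
)
# Hypothetical rows: employee 101 has two sales with the same amount.
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?, ?, ?)",
    [
        (1, 101, "2024-01-05", 300.0),
        (2, 101, "2024-01-07", 300.0),
        (3, 101, "2024-01-09", 100.0),
        (4, 102, "2024-01-06", 150.0),
    ],
)

ranks = conn.execute(
    """
    SELECT
      id,
      employee_id,
      amount,
      -- id breaks ties so the numbering is deterministic
      ROW_NUMBER() OVER (PARTITION BY employee_id ORDER BY amount DESC, id) AS row_num,
      -- tied amounts share a rank, and the next rank skips ahead
      RANK()       OVER (PARTITION BY employee_id ORDER BY amount DESC) AS rnk,
      -- tied amounts share a rank, with no gap afterwards
      DENSE_RANK() OVER (PARTITION BY employee_id ORDER BY amount DESC) AS dense_rnk
    FROM sales_data
    ORDER BY employee_id, row_num
    """
).fetchall()

for row in ranks:
    print(row)
# (1, 101, 300.0, 1, 1, 1)
# (2, 101, 300.0, 2, 1, 1)
# (3, 101, 100.0, 3, 3, 2)
# (4, 102, 150.0, 1, 1, 1)
```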

### 2. Calculating Running Total (Cumulative Sum) of Sales per Employee

To calculate the running total of sales for each employee ordered by date:

```sql
SELECT
  employee_id,
  sale_date,
  amount,
  SUM(amount) OVER (
    PARTITION BY employee_id
    ORDER BY sale_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS running_total
FROM sales_data;
```

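For each sale, this returns the sum of that employee's amounts from their first sale up to and including the current row. Because the frame is defined with ROWS, two sales on the same date are added one at a time, in an order that is not guaranteed unless you add a tiebreaker (such as id) to the ORDER BY.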
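Swapping ROWS for RANGE changes how same-date sales are treated. With RANGE, all rows that share the current sale_date (its "peers") enter the frame together. When ORDER BY is given without a frame clause, many databases, including SQLite and PostgreSQL, default to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This minimal sketch assumes SQLite 3.25+ via Python's sqlite3 module, and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_data (id INTEGER PRIMARY KEY, employee_id INTEGER, "
    "sale_date TEXT, amount REAL)"
)
# Hypothetical rows: two sales share the date 2024-01-05.
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?, ?, ?)",
    [
        (1, 101, "2024-01-05", 100.0),
        (2, 101, "2024-01-05", 50.0),
        (3, 101, "2024-01-06", 200.0),
    ],
)

totals = conn.execute(
    """
    SELECT
      id,
      sale_date,
      amount,
      -- ROWS: the frame ends exactly at the current row, so same-date sales accumulate one by one
      SUM(amount) OVER (
        PARTITION BY employee_id ORDER BY sale_date, id
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
      ) AS rows_total,
      -- RANGE: every row with the same sale_date as the current row is included
      SUM(amount) OVER (
        PARTITION BY employee_id ORDER BY sale_date
        RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
      ) AS range_total
    FROM sales_data
    ORDER BY id
    """
).fetchall()

for row in totals:
    print(row)
# (1, '2024-01-05', 100.0, 100.0, 150.0)
# (2, '2024-01-05', 50.0, 150.0, 150.0)
# (3, '2024-01-06', 200.0, 350.0, 350.0)
```

Use ROWS (with a tiebreaker) when you want a per-sale running total, and RANGE when all sales on one date should show the same end-of-day total.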

### 3. Calculating Moving Average of Sales

You can also calculate a moving average, for example, a 3-sale moving average per employee ordered by sale_date:

```sql
SELECT
  employee_id,
  sale_date,
  amount,
  AVG(amount) OVER (
    PARTITION BY employee_id
    ORDER BY sale_date
    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  ) AS moving_avg
FROM sales_data;
```

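This averages the current sale and up to two preceding sales for each employee. For an employee's first and second sales, fewer than three rows are available, so the average covers only one or two sales.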
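To see the partial windows at the start of each partition, you can count the rows in the same frame alongside the average. This is a sketch assuming SQLite 3.25+ via Python's sqlite3 module, with hypothetical rows for a single employee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_data (id INTEGER PRIMARY KEY, employee_id INTEGER, "
    "sale_date TEXT, amount REAL)"
)
# Hypothetical rows for a single employee.
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?, ?, ?)",
    [
        (1, 101, "2024-01-01", 100.0),
        (2, 101, "2024-01-02", 200.0),
        (3, 101, "2024-01-03", 300.0),
        (4, 101, "2024-01-04", 400.0),
    ],
)

moving = conn.execute(
    """
    SELECT
      sale_date,
      amount,
      AVG(amount) OVER (
        PARTITION BY employee_id ORDER BY sale_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
      ) AS moving_avg,
      -- how many rows actually fell inside the 3-row frame
      COUNT(*) OVER (
        PARTITION BY employee_id ORDER BY sale_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
      ) AS rows_in_window
    FROM sales_data
    ORDER BY sale_date
    """
).fetchall()

for row in moving:
    print(row)
# ('2024-01-01', 100.0, 100.0, 1)
# ('2024-01-02', 200.0, 150.0, 2)
# ('2024-01-03', 300.0, 200.0, 3)
# ('2024-01-04', 400.0, 300.0, 3)
```

If you only want averages over a full three sales, wrap this query in a subquery and keep the rows where rows_in_window = 3.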

### 4. Using LAG() and LEAD() to Compare Previous or Next Rows

The LAG() function returns a value from the previous row in the window's ordering, and LEAD() returns a value from the next row. Both return NULL when no such row exists in the partition. For example, to see the previous and next sale amounts for each sale:

```sql
SELECT
  employee_id,
  sale_date,
  amount,
  LAG(amount) OVER (PARTITION BY employee_id ORDER BY sale_date) AS previous_sale_amount,
  LEAD(amount) OVER (PARTITION BY employee_id ORDER BY sale_date) AS next_sale_amount
FROM sales_data;
```

This makes it easy to compare amounts between consecutive sales by the same employee.
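A common next step is to compute the change from the previous sale directly. Subtracting LAG(amount) from amount does this, and the result is NULL for each employee's first sale. The sketch below assumes SQLite 3.25+ via Python's sqlite3 module. Its rows and the column aliases change_from_previous and next_sale_date are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_data (id INTEGER PRIMARY KEY, employee_id INTEGER, "
    "sale_date TEXT, amount REAL)"
)
# Hypothetical rows for two employees.
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?, ?, ?)",
    [
        (1, 101, "2024-01-01", 100.0),
        (2, 101, "2024-01-03", 250.0),
        (3, 101, "2024-01-07", 200.0),
        (4, 102, "2024-01-02", 300.0),
    ],
)

changes = conn.execute(
    """
    SELECT
      employee_id,
      sale_date,
      amount,
      -- NULL for each employee's first sale, because LAG() finds no previous row
      amount - LAG(amount) OVER (PARTITION BY employee_id ORDER BY sale_date)
        AS change_from_previous,
      -- NULL for each employee's last sale, because LEAD() finds no next row
      LEAD(sale_date) OVER (PARTITION BY employee_id ORDER BY sale_date)
        AS next_sale_date
    FROM sales_data
    ORDER BY employee_id, sale_date
    """
).fetchall()

for row in changes:
    print(row)
# (101, '2024-01-01', 100.0, None, '2024-01-03')
# (101, '2024-01-03', 250.0, 150.0, '2024-01-07')
# (101, '2024-01-07', 200.0, -50.0, None)
# (102, '2024-01-02', 300.0, None, None)
```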

### Summary

Window functions let you add rankings, running totals, moving averages, and row-to-row comparisons while keeping your result sets at the detail level. Remember to:

- Use PARTITION BY to define your groups.
- Use ORDER BY to define the order inside each group.
- Use ROWS BETWEEN or RANGE BETWEEN to control the window frame.

Try combining these functions to build detailed, insightful reports with plain SQL.
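As one example of combining them, the sketch below finds each employee's largest sale together with that employee's total. Window functions are evaluated after WHERE, so you cannot filter on them in the same SELECT. The ranking is therefore computed in a CTE and filtered in the outer query. This assumes SQLite 3.25+ via Python's sqlite3 module, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_data (id INTEGER PRIMARY KEY, employee_id INTEGER, "
    "sale_date TEXT, amount REAL)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO sales_data VALUES (?, ?, ?, ?)",
    [
        (1, 101, "2024-01-05", 200.0),
        (2, 101, "2024-01-07", 300.0),
        (3, 101, "2024-01-09", 100.0),
        (4, 102, "2024-01-06", 150.0),
        (5, 102, "2024-01-08", 250.0),
    ],
)

top_sales = conn.execute(
    """
    WITH ranked AS (
      SELECT
        employee_id,
        sale_date,
        amount,
        ROW_NUMBER() OVER (PARTITION BY employee_id ORDER BY amount DESC, id) AS sale_rank,
        SUM(amount) OVER (PARTITION BY employee_id) AS employee_total
      FROM sales_data
    )
    -- filter on the window result only after the CTE has computed it
    SELECT employee_id, sale_date, amount, employee_total
    FROM ranked
    WHERE sale_rank = 1
    ORDER BY employee_id
    """
).fetchall()

print(top_sales)
# [(101, '2024-01-07', 300.0, 600.0), (102, '2024-01-08', 250.0, 400.0)]
```

Changing sale_rank = 1 to sale_rank <= 3 would return each employee's top three sales instead.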