Mastering Window Functions for Advanced Data Analytics in SQL

Learn how to use SQL window functions to perform powerful and advanced data analytics with beginner-friendly examples and clear explanations.

Window functions are a powerful feature in SQL that allow you to perform calculations across a set of table rows related to the current row. They are essential for advanced data analytics because they let you compute running totals, rankings, moving averages, and other cumulative metrics without collapsing your query results.

Unlike aggregate functions used with GROUP BY, which collapse each group into a single result row, window functions keep every individual row and compute a value over a defined window of related rows. The window is described in the OVER clause, typically with PARTITION BY to split rows into groups and ORDER BY to sequence them. This enables detailed analytics and reporting within your dataset.
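To make the contrast concrete, here is a minimal sketch that runs the same SUM twice, once with GROUP BY and once with OVER. The `demo_sales` rows are made up for illustration and defined inline so each statement runs on its own. The syntax is assumed to work on engines with CTE and window function support, such as PostgreSQL, SQLite 3.25+, or MySQL 8.

```sql
-- Aggregate version: one row per salesperson, individual sales are gone.
WITH demo_sales (salesperson, sale_amount) AS (
  SELECT 'Ana', 100 UNION ALL
  SELECT 'Ana', 250 UNION ALL
  SELECT 'Ben', 300
)
SELECT salesperson, SUM(sale_amount) AS total_sales
FROM demo_sales
GROUP BY salesperson
ORDER BY salesperson;

-- Window version: every sale is kept, with the salesperson total beside it.
WITH demo_sales (salesperson, sale_amount) AS (
  SELECT 'Ana', 100 UNION ALL
  SELECT 'Ana', 250 UNION ALL
  SELECT 'Ben', 300
)
SELECT
  salesperson,
  sale_amount,
  SUM(sale_amount) OVER (PARTITION BY salesperson) AS total_sales
FROM demo_sales
ORDER BY salesperson, sale_amount;
```

The first statement returns two rows (Ana 350, Ben 300). The second returns all three sales, with 350 repeated on both of Ana's rows and 300 on Ben's.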

Let's explore some common window functions with practical examples using a sample sales table structured like this:

```sql
CREATE TABLE sales (
    id INT,
    salesperson VARCHAR(50),
    region VARCHAR(50),
    sale_amount DECIMAL(10, 2),
    sale_date DATE
);
```
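If you want to run the queries that follow, the table needs some data. The rows below are invented purely for experimentation. They include a tie on amount and a tie on date so the ranking examples have something to show. Multi-row VALUES and ISO date strings are assumed to be accepted by your database, which is the case for PostgreSQL, MySQL, SQLite, and SQL Server.

```sql
-- Hypothetical sample rows for the sales table defined above.
INSERT INTO sales (id, salesperson, region, sale_amount, sale_date) VALUES
  (1, 'Ana',  'North', 100.00, '2024-01-05'),
  (2, 'Ana',  'North', 250.00, '2024-01-12'),
  (3, 'Ana',  'North', 250.00, '2024-01-20'),  -- ties with row 2 on amount
  (4, 'Ben',  'North', 300.00, '2024-01-07'),
  (5, 'Ben',  'North', 150.00, '2024-01-15'),
  (6, 'Cara', 'South', 400.00, '2024-01-09'),
  (7, 'Cara', 'South', 400.00, '2024-01-09'),  -- ties with row 6 on amount and date
  (8, 'Dev',  'South', 200.00, '2024-01-11');
```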

### 1. ROW_NUMBER(): Assigns a unique sequential number to each row within a partition ordered by some columns.

```sql
SELECT
  salesperson,
  sale_date,
  sale_amount,
  ROW_NUMBER() OVER (PARTITION BY salesperson ORDER BY sale_date) AS sale_rank
FROM sales;
```

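This query numbers each salesperson's sales in date order, starting at 1 for the earliest sale. If two sales by the same salesperson share a date, their relative order is not guaranteed unless you add a tiebreaker such as id to the ORDER BY.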
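A common application of ROW_NUMBER is keeping exactly one row per group. The sketch below picks each salesperson's most recent sale, using `id` as a tiebreaker when dates match. The inline `demo_sales` rows and the `recency` alias are hypothetical, chosen only for this example.

```sql
WITH demo_sales (id, salesperson, sale_date, sale_amount) AS (
  SELECT 1, 'Ana', '2024-01-05', 100 UNION ALL
  SELECT 2, 'Ana', '2024-01-12', 250 UNION ALL
  SELECT 3, 'Ben', '2024-01-07', 300 UNION ALL
  SELECT 4, 'Ben', '2024-01-07', 150   -- same date as id 3
),
numbered AS (
  SELECT
    id,
    salesperson,
    sale_date,
    sale_amount,
    -- Newest sale first; id breaks ties so the numbering is deterministic.
    ROW_NUMBER() OVER (
      PARTITION BY salesperson
      ORDER BY sale_date DESC, id DESC
    ) AS recency
  FROM demo_sales
)
SELECT id, salesperson, sale_date, sale_amount
FROM numbered
WHERE recency = 1
ORDER BY salesperson;
```

With this data the result is id 2 for Ana and id 4 for Ben. The window function has to live in a subquery or CTE because a WHERE clause cannot reference it directly.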

### 2. RANK() and DENSE_RANK(): Similar to ROW_NUMBER but handle ties differently.

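```sql
SELECT
  salesperson,
  sale_amount,
  -- Aliases avoid the bare names rank and dense_rank, which are reserved words in some databases (for example MySQL 8).
  RANK() OVER (PARTITION BY salesperson ORDER BY sale_amount DESC) AS amount_rank,
  DENSE_RANK() OVER (PARTITION BY salesperson ORDER BY sale_amount DESC) AS amount_dense_rank
FROM sales;
```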

RANK gives tied values the same rank and then skips the rank numbers the tie consumed (1, 1, 3). DENSE_RANK also gives tied values the same rank but leaves no gaps (1, 1, 2).
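The difference is easiest to see side by side. This small sketch applies ROW_NUMBER, RANK, and DENSE_RANK to four made-up amounts, two of which tie. Whole numbers are used only to keep the output easy to read.

```sql
WITH demo_sales (salesperson, sale_amount) AS (
  SELECT 'Ana', 300 UNION ALL
  SELECT 'Ana', 300 UNION ALL   -- ties with the row above
  SELECT 'Ana', 200 UNION ALL
  SELECT 'Ana', 100
)
SELECT
  sale_amount,
  ROW_NUMBER() OVER (ORDER BY sale_amount DESC) AS row_num,
  RANK()       OVER (ORDER BY sale_amount DESC) AS amount_rank,
  DENSE_RANK() OVER (ORDER BY sale_amount DESC) AS amount_dense_rank
FROM demo_sales
ORDER BY sale_amount DESC, row_num;
-- sale_amount | row_num | amount_rank | amount_dense_rank
--         300 |       1 |           1 |                 1
--         300 |       2 |           1 |                 1
--         200 |       3 |           3 |                 2
--         100 |       4 |           4 |                 3
```

ROW_NUMBER always produces distinct numbers, so use it when you need exactly one row per position. Use RANK or DENSE_RANK when ties should share a position.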

### 3. SUM() with OVER(): Calculate running totals or totals per partition while keeping all rows.

```sql
SELECT
  salesperson,
  sale_date,
  sale_amount,
  SUM(sale_amount) OVER (
    PARTITION BY salesperson
    ORDER BY sale_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS running_total
FROM sales;
```

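This calculates a running total of sales for each salesperson, ordered by sale date. The explicit ROWS frame makes the total grow one row at a time. Without it, the default frame treats rows that share the same sale_date as peers and gives them the same running total.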
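The same frame clause also covers moving averages: narrowing the frame from "all preceding rows" to a fixed number of rows turns the cumulative calculation into a sliding one. Below is a minimal sketch of a three-sale moving average. The data and the `moving_avg_3` alias are invented for illustration.

```sql
WITH demo_sales (salesperson, sale_date, sale_amount) AS (
  SELECT 'Ana', '2024-01-05', 100.00 UNION ALL
  SELECT 'Ana', '2024-01-12', 200.00 UNION ALL
  SELECT 'Ana', '2024-01-20', 300.00 UNION ALL
  SELECT 'Ana', '2024-01-28', 400.00
)
SELECT
  salesperson,
  sale_date,
  sale_amount,
  -- Average of the current sale and up to two sales before it.
  AVG(sale_amount) OVER (
    PARTITION BY salesperson
    ORDER BY sale_date
    ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
  ) AS moving_avg_3
FROM demo_sales
ORDER BY salesperson, sale_date;
```

For these rows the averages come out as 100, 150, 200, and 300, though the number of decimal places shown depends on the database. The first two rows average fewer than three values because the frame simply contains fewer rows at the start of a partition.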

### 4. LAG() and LEAD(): Access data from previous or next rows without self-joins.

```sql
SELECT
  salesperson,
  sale_date,
  sale_amount,
  LAG(sale_amount, 1) OVER (PARTITION BY salesperson ORDER BY sale_date) AS previous_sale,
  LEAD(sale_amount, 1) OVER (PARTITION BY salesperson ORDER BY sale_date) AS next_sale
FROM sales;
```

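These functions let you compare each sale amount to the previous and next sale amounts for the same salesperson. LAG returns NULL for the first sale in each partition and LEAD returns NULL for the last, because no neighboring row exists.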
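A typical follow-up is to turn that comparison into a difference. The sketch below subtracts the previous sale from the current one. The sample rows and the `change_from_previous` alias are hypothetical.

```sql
WITH demo_sales (salesperson, sale_date, sale_amount) AS (
  SELECT 'Ana', '2024-01-05', 100 UNION ALL
  SELECT 'Ana', '2024-01-12', 250 UNION ALL
  SELECT 'Ana', '2024-01-20', 200 UNION ALL
  SELECT 'Ben', '2024-01-07', 300
)
SELECT
  salesperson,
  sale_date,
  sale_amount,
  -- NULL on each salesperson's first sale, since there is nothing to compare to.
  sale_amount - LAG(sale_amount) OVER (
    PARTITION BY salesperson
    ORDER BY sale_date
  ) AS change_from_previous
FROM demo_sales
ORDER BY salesperson, sale_date;
```

Ana's rows yield NULL, 150, and -50, and Ben's single row yields NULL. Many databases also accept a third argument to LAG and LEAD as a default value, for example `LAG(sale_amount, 1, 0)`, if you would rather not get NULL on the first row.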

### Practical Use Case: Finding Top Sales per Region

```sql
SELECT * FROM (
  SELECT
    region,
    salesperson,
    sale_amount,
    RANK() OVER (PARTITION BY region ORDER BY sale_amount DESC) AS sales_rank
  FROM sales
) ranked_sales
WHERE sales_rank = 1;
```

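This query returns the largest individual sale in each region together with the salesperson who made it. Because RANK gives tied rows the same rank, a region returns more than one row when several sales tie for the highest amount.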
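If the goal is the salesperson with the highest total per region rather than the largest individual sale, one option is to aggregate first and then rank the totals. The following is a sketch under that assumption, with made-up rows and hypothetical CTE names (`totals`, `ranked`).

```sql
WITH demo_sales (region, salesperson, sale_amount) AS (
  SELECT 'North', 'Ana',  100 UNION ALL
  SELECT 'North', 'Ana',  250 UNION ALL
  SELECT 'North', 'Ben',  300 UNION ALL
  SELECT 'South', 'Cara', 400 UNION ALL
  SELECT 'South', 'Dev',  200
),
totals AS (
  -- Step 1: collapse sales to one total per salesperson and region.
  SELECT region, salesperson, SUM(sale_amount) AS total_sales
  FROM demo_sales
  GROUP BY region, salesperson
),
ranked AS (
  -- Step 2: rank those totals within each region.
  SELECT
    region,
    salesperson,
    total_sales,
    RANK() OVER (PARTITION BY region ORDER BY total_sales DESC) AS total_rank
  FROM totals
)
SELECT region, salesperson, total_sales
FROM ranked
WHERE total_rank = 1
ORDER BY region;
```

With this data the North region returns Ana with 350, even though Ben made the largest single sale (300), and the South region returns Cara with 400. Ranking individual sales and ranking totals answer different questions, so choose the one that matches the report you need.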

### Summary

Window functions extend SQL's analytical capabilities by allowing you to perform complex calculations across related rows without losing detail. With ROW_NUMBER, RANK, SUM, LAG, LEAD, and others, you can create insightful reports and dashboards easily. Practice these functions on your datasets to become proficient in advanced SQL data analysis.