sqladvanced15 minutes

Refactor and Optimize a Complex SQL Query Using Window Functions

Given a complex SQL query with multiple nested subqueries and joins, refactor it to improve readability, performance, and maintainability while preserving the original output.

Challenge prompt

You are provided with a SQL query that calculates rank, running totals, and aggregates from a sales dataset. The original query uses multiple nested subqueries and joins that cause inefficient execution and are hard to maintain. Your task is to refactor the SQL code to use window functions and CTEs (Common Table Expressions) to improve query performance and clarity without changing the output results. Ensure your refactored query produces the same dataset with the same order and columns.

Guidance

  • Use window functions such as ROW_NUMBER(), RANK(), SUM() OVER(), or others to simplify aggregation and ranking logic.
  • Replace nested subqueries with CTE blocks for better modularity and readability.
  • Verify output consistency by comparing results before and after refactoring.

Hints

  • Look for repeated join patterns and subqueries that compute partial aggregates—these can frequently be replaced by window functions over partitions.
  • Common window functions like ROW_NUMBER() and SUM() OVER(PARTITION BY ...) can often eliminate inefficient grouping and self-joins.
  • Ensure ORDER BY clauses inside window functions match the original query’s ordering to preserve ranking logic.

Starter code

WITH sales_data AS (
  SELECT 
    s.customer_id,
    s.order_date,
    s.amount,
    c.region
  FROM sales s
  JOIN customers c ON s.customer_id = c.id
),
ranked_sales AS (
  SELECT 
    sd.customer_id,
    sd.order_date,
    sd.amount,
    sd.region,
    (SELECT COUNT(DISTINCT s2.order_date)
     FROM sales_data s2
     WHERE s2.customer_id = sd.customer_id
       AND s2.order_date <= sd.order_date) AS order_rank
  FROM sales_data sd
),
aggregated_totals AS (
  SELECT
    customer_id,
    MAX(order_rank) AS total_orders
  FROM ranked_sales
  GROUP BY customer_id
),
final_output AS (
  SELECT
    rs.customer_id,
    rs.order_date,
    rs.amount,
    rs.region,
    rs.order_rank,
    at.total_orders,
    (SELECT SUM(amount) FROM sales_data WHERE customer_id = rs.customer_id AND order_date <= rs.order_date) AS running_total
  FROM ranked_sales rs
  JOIN aggregated_totals at ON rs.customer_id = at.customer_id
)
SELECT * FROM final_output
ORDER BY customer_id, order_date;

Expected output

The result table with columns: customer_id, order_date, amount, region, order_rank (rank of order_date per customer), total_orders (total number of orders per customer), running_total (cumulative amount per customer ordered by date). The rows should be ordered by customer_id then order_date, with the same values as the original nested-subquery output.

Core concepts

Window FunctionsCommon Table Expressions (CTEs)Query OptimizationSQL JoinsAggregation

Challenge a Friend

Send this duel to someone else and see if they can solve it.