sqladvanced15 minutes

Refactor and Optimize a Complex Customer Sales Ranking Query

Improve an existing SQL query that ranks customers by total sales across multiple regions, enhancing its readability and performance without changing the result.

Challenge prompt

Given the following SQL query that calculates each customer's total sales across different regions, ranks them, and filters the top 10, refactor it to optimize performance and readability while returning the exact same result set. Your refactor should reduce redundant subqueries and improve indexing utilization. The original query is provided below. -- Original Query SELECT c.customer_id, c.customer_name, SUM(o.amount) AS total_sales, RANK() OVER (ORDER BY SUM(o.amount) DESC) AS sales_rank FROM customers c JOIN orders o ON c.customer_id = o.customer_id JOIN ( SELECT region_id FROM regions WHERE region_name IN ('North', 'East', 'West') ) r ON o.region_id = r.region_id GROUP BY c.customer_id, c.customer_name HAVING SUM(o.amount) > (SELECT AVG(total_amount) FROM ( SELECT SUM(amount) AS total_amount FROM orders GROUP BY customer_id ) AS sub) ORDER BY sales_rank LIMIT 10;

Guidance

  • Focus on minimizing nested subqueries that can be expressed as joins or common table expressions (CTEs) for clarity.
  • Make sure window functions and aggregates are computed just once without redundant recalculations.
  • Ensure the final output columns and ordering remain unchanged after refactoring.

Hints

  • Try using a CTE for the average total sales per customer instead of an inline subquery in HAVING.
  • Consider pre-aggregating orders before joining to customers to reduce the join data scanned.
  • Window functions can sometimes be applied after filtering to optimize performance.

Starter code

SELECT
  c.customer_id,
  c.customer_name,
  SUM(o.amount) AS total_sales,
  RANK() OVER (ORDER BY SUM(o.amount) DESC) AS sales_rank
FROM
  customers c
JOIN
  orders o ON c.customer_id = o.customer_id
JOIN
  (
    SELECT region_id FROM regions WHERE region_name IN ('North', 'East', 'West')
  ) r ON o.region_id = r.region_id
GROUP BY
  c.customer_id,
  c.customer_name
HAVING
  SUM(o.amount) > (SELECT AVG(total_amount) FROM (
    SELECT SUM(amount) AS total_amount
    FROM orders
    GROUP BY customer_id
  ) AS sub)
ORDER BY
  sales_rank
LIMIT 10;

Expected output

Returns top 10 customers ranked by total sales amount (filtered by regions 'North', 'East', and 'West') whose total sales exceed the average total sales of all customers. Columns: customer_id, customer_name, total_sales, sales_rank ordered ascending by sales_rank.

Core concepts

SQL optimizationCTE usageWindow functionsAggregate functions

Challenge a Friend

Send this duel to someone else and see if they can solve it.