sqladvanced15 minutes

Refactor Complex SQL Query for Improved Performance and Readability

You are given a large and complex SQL query that fetches sales summary data with multiple subqueries and redundant joins. Refactor the query to improve its readability and optimize performance while ensuring the output remains exactly the same.

Challenge prompt

Given the following SQL query, refactor it to improve clarity and reduce execution time without altering the results. Focus on eliminating unnecessary subqueries, optimizing joins, and leveraging window functions or CTEs if appropriate. The query aggregates sales data by product category and month, combining information from orders, products, and customers tables.

Guidance

  • Identify and remove redundant subqueries and joins that can be combined or simplified.
  • Consider using Common Table Expressions (CTEs) to break down complex parts for better readability.
  • Use window functions to replace subqueries where possible for efficiency.

Hints

  • Check if some joins or filters can be pushed earlier to limit data processed.
  • Look for repeated aggregations that can be calculated once and reused.
  • Review the use of GROUP BY and see if aggregations can be combined.

Starter code

SELECT
  c.category_name,
  DATE_TRUNC('month', o.order_date) AS order_month,
  (SELECT COUNT(*) FROM orders WHERE product_id = p.product_id AND DATE_TRUNC('month', order_date) = DATE_TRUNC('month', o.order_date)) AS total_orders,
  (SELECT SUM(quantity) FROM order_items oi WHERE oi.product_id = p.product_id AND oi.order_id IN (SELECT order_id FROM orders WHERE DATE_TRUNC('month', order_date) = DATE_TRUNC('month', o.order_date))) AS total_quantity,
  (SELECT SUM(oi.price * oi.quantity) FROM order_items oi WHERE oi.product_id = p.product_id AND oi.order_id IN (SELECT order_id FROM orders WHERE DATE_TRUNC('month', order_date) = DATE_TRUNC('month', o.order_date))) AS total_revenue
FROM orders o
JOIN products p ON o.product_id = p.product_id
JOIN categories c ON p.category_id = c.category_id
GROUP BY c.category_name, order_month
ORDER BY c.category_name, order_month;

Expected output

Table with columns: category_name, order_month (first day of month), total_orders, total_quantity, total_revenue showing aggregated sales data per category per month, sorted by category_name and order_month.

Core concepts

SQL query optimizationCommon Table Expressions (CTEs)Window functionsQuery refactoring

Challenge a Friend

Send this duel to someone else and see if they can solve it.