Refactor and Optimize Complex Sales Reporting SQL Query
Optimize a poorly written, inefficient SQL query that aggregates sales data by region and product category while filtering on multiple conditions. Improve readability, performance and ensure the output remains correct.
Challenge prompt
You are given a complex query that calculates total sales, average order amount, and number of unique customers by region and product category for the last fiscal year. The existing query uses multiple nested subqueries, redundant joins, and lacks proper indexing hints, causing slow performance on large datasets. Refactor the query to: 1. Remove unnecessary subqueries and joins. 2. Utilize efficient JOIN types and filtering methods. 3. Add appropriate aggregation and grouping logic. 4. Ensure the query returns the same results but executes faster. The database schema includes the following tables: - sales(order_id, customer_id, product_id, amount, order_date) - customers(customer_id, region) - products(product_id, category) Output columns should be: region, category, total_sales, avg_order_amount, unique_customers.
Guidance
- • Focus on simplifying JOINs and removing redundant intermediate aggregations.
- • Consider using window functions or CTEs for clarity and performance.
- • Make sure filtering on order_date is applied as early as possible.
Hints
- • Try replacing nested subqueries with Common Table Expressions (CTEs) to improve readability and allow SQL engine optimization.
- • Use INNER JOIN instead of LEFT JOIN if you are certain all referenced rows exist, to improve join efficiency.
- • Apply WHERE filters before joining and aggregating to reduce dataset size early.
Starter code
SELECT
c.region,
p.category,
SUM(s.amount) AS total_sales,
AVG(s.amount) AS avg_order_amount,
COUNT(DISTINCT s.customer_id) AS unique_customers
FROM sales s
JOIN customers c ON s.customer_id = c.customer_id
JOIN products p ON s.product_id = p.product_id
WHERE s.order_date BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY c.region, p.category;Expected output
region | category | total_sales | avg_order_amount | unique_customers -------+-------------+-------------+------------------+------------------ North | Electronics | 1250000.00 | 250.00 | 500 South | Furniture | 850000.00 | 170.00 | 400 East | Toys | 765000.00 | 255.00 | 300 West | Clothing | 980000.00 | 245.00 | 450
Core concepts
Challenge a Friend
Send this duel to someone else and see if they can solve it.