sqladvanced10 minutes

Predict the Output of a Recursive CTE with Aggregations and Window Functions

Analyze the output of an advanced SQL query using recursive common table expressions (CTEs), aggregations, and window functions. Predict the final result set given the schema and sample data.

Challenge prompt

Given the following schema and sample data representing employees and their managers, analyze the provided recursive CTE query. Determine the output—specifically, the final columns and rows returned by the query. -- Employees Table CREATE TABLE Employees( EmployeeID INT PRIMARY KEY, ManagerID INT NULL, Salary INT NOT NULL ); INSERT INTO Employees (EmployeeID, ManagerID, Salary) VALUES (1, NULL, 100000), (2, 1, 80000), (3, 1, 90000), (4, 2, 70000), (5, 2, 60000), (6, 3, 85000), (7, 3, 65000); -- Query to analyze WITH RECURSIVE HierarchyCTE AS ( SELECT EmployeeID, ManagerID, Salary, 1 AS Level FROM Employees WHERE ManagerID IS NULL UNION ALL SELECT e.EmployeeID, e.ManagerID, e.Salary, h.Level + 1 FROM Employees e JOIN HierarchyCTE h ON e.ManagerID = h.EmployeeID ), Aggregated AS ( SELECT ManagerID, COUNT(EmployeeID) AS TeamSize, AVG(Salary) AS AvgSalary FROM HierarchyCTE GROUP BY ManagerID ), Ranked AS ( SELECT h.EmployeeID, h.ManagerID, h.Level, a.TeamSize, a.AvgSalary, RANK() OVER (PARTITION BY h.ManagerID ORDER BY a.TeamSize DESC) AS TeamRank FROM HierarchyCTE h LEFT JOIN Aggregated a ON h.EmployeeID = a.ManagerID ) SELECT * FROM Ranked ORDER BY Level, TeamRank, EmployeeID;

Guidance

• Understand how the recursive CTE builds the hierarchy levels starting from the top manager (ManagerID IS NULL).
• Consider how the Aggregated CTE calculates team size and average salary grouped by manager.
• Analyze how the window function RANK() operates partitioned by ManagerID and ordered by team size descending.
• Trace the joins carefully to see how employees are linked with their team aggregation.

Hints

• Remember the base case of the recursion includes only EmployeeID 1 (top-level manager).
• Note the LEFT JOIN in Ranked means some employees might have NULLs for aggregated data if they aren't managers themselves.
• RANK() resets for each ManagerID partition; employees with the same manager get ranked according to their team size.

Starter code

CREATE TABLE Employees(
  EmployeeID INT PRIMARY KEY,
  ManagerID INT NULL,
  Salary INT NOT NULL
);

INSERT INTO Employees (EmployeeID, ManagerID, Salary) VALUES
(1, NULL, 100000),
(2, 1, 80000),
(3, 1, 90000),
(4, 2, 70000),
(5, 2, 60000),
(6, 3, 85000),
(7, 3, 65000);

WITH RECURSIVE HierarchyCTE AS (
  SELECT EmployeeID, ManagerID, Salary, 1 AS Level
  FROM Employees
  WHERE ManagerID IS NULL
  
  UNION ALL
  
  SELECT e.EmployeeID, e.ManagerID, e.Salary, h.Level + 1
  FROM Employees e
  JOIN HierarchyCTE h ON e.ManagerID = h.EmployeeID
),
Aggregated AS (
  SELECT
    ManagerID,
    COUNT(EmployeeID) AS TeamSize,
    AVG(Salary) AS AvgSalary
  FROM HierarchyCTE
  GROUP BY ManagerID
),
Ranked AS (
  SELECT
    h.EmployeeID,
    h.ManagerID,
    h.Level,
    a.TeamSize,
    a.AvgSalary,
    RANK() OVER (PARTITION BY h.ManagerID ORDER BY a.TeamSize DESC) AS TeamRank
  FROM HierarchyCTE h
  LEFT JOIN Aggregated a ON h.EmployeeID = a.ManagerID
)
SELECT * FROM Ranked
ORDER BY Level, TeamRank, EmployeeID;

Expected output

EmployeeID | ManagerID | Level | TeamSize | AvgSalary | TeamRank -----------|-----------|-------|----------|-----------|--------- 1 | NULL | 1 | 2 | 73333.33 | 1 2 | 1 | 2 | 2 | 65000 | 1 3 | 1 | 2 | 2 | 75000 | 1 4 | 2 | 3 | NULL | NULL | 1 5 | 2 | 3 | NULL | NULL | 1 6 | 3 | 3 | NULL | NULL | 1 7 | 3 | 3 | NULL | NULL | 1

Core concepts

Recursive CTEAggregationWindow FunctionsHierarchical Queries

Challenge a Friend

Send this duel to someone else and see if they can solve it.