Mastering Recursive Common Table Expressions (CTEs) for Complex Hierarchical Data

Learn how to use recursive Common Table Expressions (CTEs) in SQL to efficiently query and manage complex hierarchical data with this beginner-friendly tutorial.

Hierarchical data, like organizational charts or folder structures, can be challenging to query using standard SQL. Recursive Common Table Expressions (CTEs) provide a powerful and elegant solution to traverse these hierarchies efficiently. In this tutorial, we'll explore how recursive CTEs work and how to use them step-by-step.

A Common Table Expression (CTE) is a temporary result set you can reference within a SQL statement. When a CTE refers to itself (recursive CTE), it allows you to perform iterative processing, which is perfect for hierarchical data.

Let's consider a simple example: an employee table where each employee may have a manager. Our goal is to find the entire management hierarchy for a specific employee.

sql
CREATE TABLE employees (
  employee_id INT PRIMARY KEY,
  name VARCHAR(100),
  manager_id INT
);

INSERT INTO employees (employee_id, name, manager_id) VALUES
(1, 'Alice', NULL),
(2, 'Bob', 1),
(3, 'Charlie', 2),
(4, 'Diana', 2),
(5, 'Eve', 1);

Here, Alice (employee_id = 1) is the top-level manager, Bob and Eve report to Alice, and Charlie and Diana report to Bob. To retrieve the full hierarchy starting from Alice, we use a recursive CTE.

sql
WITH RECURSIVE employee_hierarchy AS (
  -- Anchor member: starts with the top-level employee
  SELECT employee_id, name, manager_id, 1 AS level
  FROM employees
  WHERE manager_id IS NULL

  UNION ALL

  -- Recursive member: join employees to the previous result to find subordinates
  SELECT e.employee_id, e.name, e.manager_id, eh.level + 1
  FROM employees e
  INNER JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
)
SELECT *
FROM employee_hierarchy
ORDER BY level, manager_id;

This query works in two parts: first it selects the anchor row (top-level employees with no managers). Then it recursively joins employees whose manager_id matches the employee_id found in the previous step. The 'level' column tracks the depth in the hierarchy.

You can also modify the recursive CTE to start from any employee (not necessarily the top) and explore their full reporting chain upwards or downwards.

For example, to find all managers of "Charlie" (employee_id = 3), you can write:

sql
WITH RECURSIVE managers AS (
  -- Anchor member: start from Charlie
  SELECT employee_id, name, manager_id
  FROM employees
  WHERE employee_id = 3

  UNION ALL

  -- Recursive member: find Charlie's managers
  SELECT e.employee_id, e.name, e.manager_id
  FROM employees e
  INNER JOIN managers m ON e.employee_id = m.manager_id
)
SELECT * FROM managers;

Recursive CTEs are supported in most modern SQL databases including PostgreSQL, SQL Server, SQLite, and MySQL 8.0+. They are highly useful for querying descendants, ancestors, and paths in a hierarchy.

In summary, mastering recursive CTEs enables you to handle complex hierarchical data with clarity and efficiency. Practice them with different hierarchical datasets to solidify your understanding.