Mastering Recursive Common Table Expressions (CTEs) for Hierarchical Data in SQL
Learn how to use recursive CTEs in SQL to efficiently query hierarchical data like organizational charts or folder structures, using simple and clear examples.
Hierarchical data, such as organizational charts, family trees, or directory structures, is common in many applications. Retrieving this data efficiently from a SQL database can be challenging without the right tools. Recursive Common Table Expressions (CTEs) offer a powerful way to query hierarchical data in a readable and elegant manner.
In this tutorial, we'll cover the basics of recursive CTEs, explain how they work, and demonstrate how to use them to traverse hierarchical data using a practical example.
### What Is a Recursive CTE?
A Common Table Expression (CTE) is a temporary result set you can reference within a SQL statement. Recursive CTEs are special because they reference themselves to repeatedly execute and build a result set. This makes them perfect for traversing hierarchical or tree-structured data.
### Example Scenario: Employee Hierarchy
Imagine you have an `employees` table with the following columns:
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(100),
manager_id INT NULL
);
INSERT INTO employees (id, name, manager_id) VALUES
(1, 'Alice', NULL),
(2, 'Bob', 1),
(3, 'Charlie', 1),
(4, 'Diana', 2),
(5, 'Eve', 2);In this structure, `manager_id` points to the employee's manager, forming a hierarchy. Alice is the top-level manager (no manager_id), Bob and Charlie report to Alice, and Diana and Eve report to Bob.
### Writing a Recursive CTE to Get the Full Hierarchy
We want to build a query that shows each employee and their management chain, starting from Alice.
WITH RECURSIVE EmployeeHierarchy AS (
-- Anchor member: select top-level manager(s)
SELECT id, name, manager_id, 1 AS level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive member: get employees reporting to those in the previous level
SELECT e.id, e.name, e.manager_id, eh.level + 1
FROM employees e
INNER JOIN EmployeeHierarchy eh ON e.manager_id = eh.id
)
SELECT *
FROM EmployeeHierarchy
ORDER BY level, manager_id, id;### Explanation:
- The anchor member selects root nodes (employees with no manager). In this case, Alice. - The recursive member joins the employees table with the CTE itself to find employees reporting to the managers already found. - The recursion continues until all employees are listed. - We track a `level` column to indicate the depth in the hierarchy.
### Output:
The output will look like this: | id | name | manager_id | level | |----|---------|------------|-------| | 1 | Alice | NULL | 1 | | 2 | Bob | 1 | 2 | | 3 | Charlie | 1 | 2 | | 4 | Diana | 2 | 3 | | 5 | Eve | 2 | 3 |
### Tips for Using Recursive CTEs
- Always include an anchor member to start the recursion. - The recursive member should join back to the CTE to explore deeper levels. - Prevent infinite loops by ensuring the recursion eventually reaches a stopping point (e.g., no more matching rows). - Use `level` or similar tracking columns to keep track of depth or recursion levels. - Recursive CTEs are supported in many SQL databases such as PostgreSQL, SQL Server, MySQL 8.0+, and SQLite 3.8.3+.
### Conclusion
Recursive CTEs are a powerful feature in SQL that simplify working with hierarchical data. Once you understand the anchor and recursive parts, you can adapt this approach for organizational structures, bill of materials, folder trees, and more. Try experimenting with the example above on your own data to master recursive queries!