| Introduction Most users at one time or another have 
                          dealt with hierarchical data in a SQL database and no 
                          doubt learned that the management of hierarchical data 
                          is not what a relational database is intended for. The 
                          tables of a relational database are not hierarchical 
                          (like XML), but are simply a flat list. Hierarchical 
                          data has a parent-child relationship that is not naturally 
                          represented in a relational database table. For our purposes, hierarchical data 
                          is a collection of data where each item has a single 
                          parent and zero or more children (with the exception 
                          of the root item, which has no parent). Hierarchical 
                          data can be found in a variety of database applications, 
                          including forum and mailing list threads, business organization 
                          charts, content management categories, and product categories. 
                          For our purposes we will use the following product category 
                          hierarchy from an fictional electronics store: 
 These categories form a hierarchy in 
                          much the same way as the other examples cited above. 
                          In this article we will examine two models for dealing 
                          with hierarchical data in MySQL, starting with the traditional 
                          adjacency list model. The Adjacency List Model Typically the example categories shown 
                          above will be stored in a table like the following (I'm 
                          including full CREATE and INSERT statements so you can 
                          follow along): CREATE TABLE category( category_id INT AUTO_INCREMENT PRIMARY 
                          KEY, name VARCHAR(20) NOT NULL, parent INT DEFAULT NULL); INSERT INTO category VALUES(1,'ELECTRONICS',NULL),(2,'TELEVISIONS',1),(3,'TUBE',2), (4,'LCD',2),(5,'PLASMA',2),(6,'PORTABLE 
                          ELECTRONICS',1), (7,'MP3 PLAYERS',6),(8,'FLASH',7), (9,'CD PLAYERS',6),(10,'2 WAY RADIOS',6); SELECT * FROM category ORDER BY category_id; 
 10 rows in set (0.00 sec) In the adjacency list model, each item 
                          in the table contains a pointer to its parent. The topmost 
                          element, in this case electronics, has a NULL value 
                          for its parent. The adjacency list model has the advantage 
                          of being quite simple, it is easy to see that FLASH 
                          is a child of mp3 players, which is a child of portable 
                          electronics, which is a child of electronics. While 
                          the adjacency list model can be dealt with fairly easily 
                          in client-side code, working with the model can be more 
                          problematic in pure SQL. Retrieving a Full Tree The first common task when dealing with 
                          hierarchical data is the display of the entire tree, 
                          usually with some form of indentation. The most common 
                          way of doing this is in pure SQL is through the use 
                          of a self-join: SELECT t1.name AS lev1, t2.name as lev2, 
                          t3.name as lev3, t4.name as lev4 FROM category AS t1 LEFT JOIN category AS t2 ON t2.parent 
                          = t1.category_id LEFT JOIN category AS t3 ON t3.parent 
                          = t2.category_id LEFT JOIN category AS t4 ON t4.parent 
                          = t3.category_id WHERE t1.name = 'ELECTRONICS'; 
 6 rows in set (0.00 sec) Finding all the Leaf Nodes We can find all the leaf nodes in our 
                          tree (those with no children) by using a LEFT JOIN query: SELECT t1.name FROM category AS t1 LEFT JOIN category as 
                          t2 ON t1.category_id = t2.parent WHERE t2.category_id IS NULL; 
 Retrieving a Single Path The self-join also allows us to see 
                          the full path through our hierarchies: SELECT t1.name AS lev1, t2.name as lev2, t3.name as 
                          lev3, t4.name as lev4 FROM category AS t1 LEFT JOIN category AS t2 ON t2.parent = t1.category_id LEFT JOIN category AS t3 ON t3.parent = t2.category_id LEFT JOIN category AS t4 ON t4.parent = t3.category_id WHERE t1.name = 'ELECTRONICS' AND t4.name = 'FLASH'; 
 1 row in set (0.01 sec) The main limitation of such an approach is that you 
                          need one self-join for every level in the hierarchy, 
                          and performance will naturally degrade with each level 
                          added as the joining grows in complexity. Limitations of the Adjacency List Model Working with the adjacency list model in pure SQL can 
                          be difficult at best. Before being able to see the full 
                          path of a category we have to know the level at which 
                          it resides. In addition, special care must be taken 
                          when deleting nodes because of the potential for orphaning 
                          an entire sub-tree in the process (delete the portable 
                          electronics category and all of its children are orphaned). 
                          Some of these limitations can be addressed through the 
                          use of client-side code or stored procedures. With a 
                          procedural language we can start at the bottom of the 
                          tree and iterate upwards to return the full tree or 
                          a single path. We can also use procedural programming 
                          to delete nodes without orphaning entire sub-trees by 
                          promoting one child element and re-ordering the remaining 
                          children to point to the new parent. The Nested Set Model What I would like to focus on in this article is a 
                          different approach, commonly referred to as the Nested 
                          Set Model. In the Nested Set Model, we can look at our 
                          hierarchy in a new way, not as nodes and lines, but 
                          as nested containers. Try picturing our electronics 
                          categories this way: 
 Notice how our hierarchy is still maintained, as parent 
                          categories envelop their children.We represent this 
                          form of hierarchy in a table through the use of left 
                          and right values to represent the nesting of our nodes: CREATE TABLE nested_category ( category_id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(20) NOT NULL, lft INT NOT NULL,  rgt INT NOT NULL ); INSERT INTO nested_category VALUES(1,'ELECTRONICS',1,20),(2,'TELEVISIONS',2,9),(3,'TUBE',3,4), (4,'LCD',5,6),(5,'PLASMA',7,8),(6,'PORTABLE ELECTRONICS',10,19), (7,'MP3 PLAYERS',11,14),(8,'FLASH',12,13), (9,'CD PLAYERS',15,16),(10,'2 WAY RADIOS',17,18); SELECT * FROM nested_category ORDER BY category_id; 
 We use lft and rgt because left and 
                          right are reserved words in MySQL, see http://dev.mysql.com/doc/mysql/en/reserved-words.html 
                          for the full list of reserved words. So how do we determine left and right values? We start 
                          numbering at the leftmost side of the outer node and 
                          continue to the right: 
 This design can be applied to a typical tree as well: 
 When working with a tree, we work from left to right, 
                          one layer at a time, descending to each node's children 
                          before assigning a right-hand number and moving on to 
                          the right. This approach is called the modified preorder 
                          tree traversal algorithm. Retrieving a Full Tree We can retrieve the full tree through the use of a 
                          self-join that links parents with nodes on the basis 
                          that a node's lft value will always appear between its 
                          parent's lft and rgt values: SELECT node.name FROM nested_category AS node, nested_category AS parent WHERE node.lft BETWEEN parent.lft AND parent.rgt AND parent.name = 'ELECTRONICS' ORDER BY node.lft; 
 Unlike our previous examples with the adjacency list 
                          model, this query will work regardless of the depth 
                          of the tree. We do not concern ourselves with the rgt 
                          value of the node in our BETWEEN clause because the 
                          rgt value will always fall within the same parent as 
                          the lft values. Finding all the Leaf Nodes Finding all leaf nodes in the nested set model even 
                          simpler than the LEFT JOIN method used in the adjacency 
                          list model. If you look at the nested_category table, 
                          you may notice that the lft and rgt values for leaf 
                          nodes are consecutive numbers. To find the leaf nodes, 
                          we look for nodes where rgt = lft + 1: SELECT name FROM nested_category WHERE rgt = lft + 1; 
 Retrieving a Single Path With the nested set model, we can retrieve a single 
                          path without having multiple self-joins: SELECT parent.name FROM nested_category AS node, nested_category AS parent WHERE node.lft BETWEEN parent.lft AND parent.rgt AND node.name = 'FLASH' ORDER BY parent.lft; 
 Finding the Depth of the Nodes We have already looked at how to show the entire tree, 
                          but what if we want to also show the depth of each node 
                          in the tree, to better identify how each node fits in 
                          the hierarchy? This can be done by adding a COUNT function 
                          and a GROUP BY clause to our existing query for showing 
                          the entire tree: SELECT node.name, (COUNT(parent.name) - 1) AS depth FROM nested_category AS node, nested_category AS parent WHERE node.lft BETWEEN parent.lft AND parent.rgt GROUP BY node.name ORDER BY node.lft; 
 We can use the depth value to indent our category names 
                          with the CONCAT and REPEAT string functions: SELECT CONCAT( REPEAT(' ', COUNT(parent.name) - 1), 
                          node.name) AS name FROM nested_category AS node, nested_category AS parent WHERE node.lft BETWEEN parent.lft AND parent.rgt GROUP BY node.name ORDER BY node.lft; 
 Of course, in a client-side application you will be 
                          more likely to use the depth value directly to display 
                          your hierarchy. Web developers could loop through the 
                          tree, adding <li></li> and <ul></ul> 
                          tags as the depth number increases and decreases. Depth of a Sub-Tree When we need depth information for a sub-tree, we cannot 
                          limit either the node or parent tables in our self-join 
                          because it will corrupt our results. Instead, we add 
                          a third self-join, along with a sub-query to determine 
                          the depth that will be the new starting point for our 
                          sub-tree: 
 This function can be used with any node name, including 
                          the root node. The depth values are always relative 
                          to the named node. Find the Immediate Subordinates of 
                          a Node Imagine you are showing a category of electronics products 
                          on a retailer web site. When a user clicks on a category, 
                          you would want to show the products of that category, 
                          as well as list its immediate sub-categories, but not 
                          the entire tree of categories beneath it. For this, 
                          we need to show the node and its immediate sub-nodes, 
                          but no further down the tree. For example, when showing 
                          the PORTABLE ELECTRONICS category, we will want to show 
                          MP3 PLAYERS, CD PLAYERS, and 2 WAY RADIOS, but not FLASH. This can be easily accomplished by adding a HAVING 
                          clause to our previous query: 
 If you do not wish to show the parent node, change 
                          the HAVING depth <= 1 line to HAVING depth = 1. Aggregate Functions in a Nested SetLet's add a table of products that we can use to demonstrate 
                          aggregate functions with:
 CREATE TABLE product( product_id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(40), category_id INT NOT NULL ); INSERT INTO product(name, category_id) VALUES('20" 
                          TV',3),('36" TV',3), ('Super-LCD 42"',4),('Ultra-Plasma 62"',5),('Value 
                          Plasma 38"',5), ('Power-MP3 5gb',7),('Super-Player 1gb',8),('Porta 
                          CD',9),('CD To go!',9), ('Family Talk 360',10); SELECT * FROM product; 
 Now let's produce a query that can retrieve our category 
                          tree, along with a product count for each category: SELECT parent.name, COUNT(product.name) FROM nested_category AS node , nested_category AS parent, product WHERE node.lft BETWEEN parent.lft AND parent.rgt AND node.category_id = product.category_id GROUP BY parent.name ORDER BY node.lft; 
 This is our typical whole tree query with a COUNT and 
                          GROUP BY added, along with a reference to the product 
                          table and a join between the node and product table 
                          in the WHERE clause. As you can see, there is a count 
                          for each category and the count of subcategories is 
                          reflected in the parent categories. Adding New Nodes Now that we have learned how to query our tree, we 
                          should take a look at how to update our tree by adding 
                          a new node. Let's look at our nested set diagram again: 
 If we wanted to add a new node between the TELEVISIONS 
                          and PORTABLE ELECTRONICS nodes, the new node would have 
                          lft and rgt values of 10 and 11, and all nodes to its 
                          right would have their lft and rgt values increased 
                          by two. We would then add the new node with the appropriate 
                          lft and rgt values. While this can be done with a stored 
                          procedure in MySQL 5, I will assume for the moment that 
                          most readers are using 4.1, as it is the latest stable 
                          version, and I will isolate my queries with a LOCK TABLES 
                          statement instead: LOCK TABLE nested_category WRITE; SELECT @myRight := rgt FROM nested_category WHERE name = 'TELEVISIONS'; UPDATE nested_category SET rgt = rgt + 2 WHERE rgt 
                          > @myRight; UPDATE nested_category SET lft = lft + 2 WHERE lft 
                          > @myRight; INSERT INTO nested_category(name, lft, rgt) VALUES('GAME 
                          CONSOLES', @myRight + 1, @myRight + 2); UNLOCK TABLES; We can then check our nesting with our indented tree 
                          query: SELECT CONCAT( REPEAT( ' ', (COUNT(parent.name) - 1) 
                          ), node.name) AS name FROM nested_category AS node, nested_category AS parent WHERE node.lft BETWEEN parent.lft AND parent.rgt GROUP BY node.name ORDER BY node.lft; 
 If we instead want to add a node as a child of a node 
                          that has no existing children, we need to modify our 
                          procedure slightly. Let's add a new FRS node below the 
                          2 WAY RADIOS node: LOCK TABLE nested_category WRITE; SELECT @myLeft := lft FROM nested_category WHERE name = '2 WAY RADIOS'; UPDATE nested_category SET rgt = rgt + 2 WHERE rgt 
                          > @myLeft; UPDATE nested_category SET lft = lft + 2 WHERE lft 
                          > @myLeft; INSERT INTO nested_category(name, lft, rgt) VALUES('FRS', 
                          @myLeft + 1, @myLeft + 2); UNLOCK TABLES; In this example we expand everything to the right of 
                          the left-hand number of our proud new parent node, then 
                          place the node to the right of the left-hand value. 
                          As you can see, our new node is now properly nested: SELECT CONCAT( REPEAT( ' ', (COUNT(parent.name) - 1) 
                          ), node.name) AS name FROM nested_category AS node, nested_category AS parent WHERE node.lft BETWEEN parent.lft AND parent.rgt GROUP BY node.name ORDER BY node.lft; 
 Deleting Nodes The last basic task involved in working with nested 
                          sets is the removal of nodes. The course of action you 
                          take when deleting a node depends on the node's position 
                          in the hierarchy; deleting leaf nodes is easier than 
                          deleting nodes with children because we have to handle 
                          the orphaned nodes. When deleting a leaf node, the process if just the 
                          opposite of adding a new node, we delete the node and 
                          its width from every node to its right: LOCK TABLE nested_category WRITE; SELECT @myLeft := lft, @myRight := rgt, @myWidth := 
                          rgt - lft + 1 FROM nested_category WHERE name = 'GAME CONSOLES'; DELETE FROM nested_category WHERE lft BETWEEN @myLeft 
                          AND @myRight; UPDATE nested_category SET rgt = rgt - @myWidth WHERE 
                          rgt > @myRight; UPDATE nested_category SET lft = lft - @myWidth WHERE 
                          lft > @myRight; UNLOCK TABLES; And once again, we execute our indented tree query 
                          to confirm that our node has been deleted without corrupting 
                          the hierarchy: SELECT CONCAT( REPEAT( ' ', (COUNT(parent.name) - 1) 
                          ), node.name) AS name FROM nested_category AS node, nested_category AS parent WHERE node.lft BETWEEN parent.lft AND parent.rgt GROUP BY node.name ORDER BY node.lft; 
 This approach works equally well to delete a node and 
                          all its children: LOCK TABLE nested_category WRITE; SELECT @myLeft := lft, @myRight := rgt, @myWidth := 
                          rgt - lft + 1 FROM nested_category WHERE name = 'MP3 PLAYERS'; DELETE FROM nested_category WHERE lft BETWEEN @myLeft 
                          AND @myRight; UPDATE nested_category SET rgt = rgt - @myWidth WHERE 
                          rgt > @myRight; UPDATE nested_category SET lft = lft - @myWidth WHERE 
                          lft > @myRight; UNLOCK TABLES; And once again, we query to see that we have successfully 
                          deleted an entire sub-tree: SELECT CONCAT( REPEAT( ' ', (COUNT(parent.name) - 1) 
                          ), node.name) AS name FROM nested_category AS node, nested_category AS parent WHERE node.lft BETWEEN parent.lft AND parent.rgt GROUP BY node.name ORDER BY node.lft; 
 The other scenario we have to deal with is the deletion 
                          of a parent node but not the children. In some cases 
                          you may wish to just change the name to a placeholder 
                          until a replacement is presented, such as when a supervisor 
                          is fired. In other cases, the child nodes should all 
                          be moved up to the level of the deleted parent: LOCK TABLE nested_category WRITE; SELECT @myLeft := lft, @myRight := rgt, @myWidth := 
                          rgt - lft + 1 FROM nested_category WHERE name = 'PORTABLE ELECTRONICS'; DELETE FROM nested_category WHERE lft = @myLeft; UPDATE nested_category SET rgt = rgt - 1, lft = lft 
                          - 1 WHERE lft BETWEEN @myLeft AND @myRight; UPDATE nested_category SET rgt = rgt - 2 WHERE rgt 
                          > @myRight; UPDATE nested_category SET lft = lft - 2 WHERE lft 
                          > @myRight; UNLOCK TABLES; In this case we subtract two from all elements to the 
                          right of the node (since without children it would have 
                          a width of two), and one from the nodes that are its 
                          children (to close the gap created by the loss of the 
                          parent's left value). Once again, we can confirm our 
                          elements have been promoted: SELECT CONCAT( REPEAT( ' ', (COUNT(parent.name) - 1) 
                          ), node.name) AS name FROM nested_category AS node, nested_category AS parent
 WHERE node.lft BETWEEN parent.lft AND parent.rgt GROUP BY node.name ORDER BY node.lft; 
 Other scenarios when deleting nodes would include promoting 
                          one of the children to the parent position and moving 
                          the child nodes under a sibling of the parent node, 
                          but for the sake of space these scenarios will not be 
                          covered in this article. Final Thoughts While I hope the information within this article will 
                          be of use to you, the concept of nested sets in SQL 
                          has been around for over a decade, and there is a lot 
                          of additional information available in books and on 
                          the Internet. In my opinion the most comprehensive source 
                          of information on managing hierarchical information 
                          is a book called Joe Celko's Trees and Hierarchies in 
                          SQL for Smarties, written by a very respected author 
                          in the field of advanced SQL, Joe Celko. Joe Celko is 
                          often credited with the nested sets model and is by 
                          far the most prolific author on the subject. I have 
                          found Celko's book to be an invaluable resource in my 
                          own studies and highly recommend it. The book covers 
                          advanced topics which I have not covered in this article, 
                          and provides additional methods for managing hierarchical 
                          data in addition to the Adjacency List and Nested Set 
                          models. In the References / Resources section that follows 
                          I have listed some web resources that may be of use 
                          in your research of managing hierarchal data, including 
                          a pair of PHP related resources that include pre-built 
                          PHP libraries for handling nested sets in MySQL. Those 
                          of you who currently use the adjacency list model and 
                          would like to experiment with the nested set model will 
                          find sample code for converting between the two in the 
                          Storing Hierarchical Data in a Database resource listed 
                          below. |