A New Hierarchical Datatype - SQL Server 2008

Hierarchical data may be in organizational charts, file and folder lists, product catalogs, forum threads, and so on. Hierarchical data is defined as a set of data items that are related or ranked against each other. In simplest terms, a parent-child relationship is a form of hierarchy. The parent can have zero or more children, those children can have zero or more children, and so on.

Within SQL Server, you can handle hierarchical information in a few different ways:

  • Use the XML datatype. This requires all of the applications and users that interact with the data to use XML. However, XML is an industry standard and is great for interacting with heterogeneous environments.
  • Store the data in a relational table and use SELF JOIN statements to dynamically create the structure. This can lead to potentially complex T-SQL statements, which are difficult to manage and maintain.
  • Use the new hierarchyid datatype available in SQL Server 2008.

The hierarchyid datatype is great for applications that deal with a lot of hierarchical data. It allows you to perform relational operations with hierarchical data. Many built-in functions support the new hierarchyid datatype.

The hierarchyid datatype is not a native datatype like DATE and TIME. Rather, it is a systemdefined UDT. This new type is stored in Microsoft.SqlServer.Types.dll and is created within the Microsoft.SqlServer.Types namespace. System-defined implies that you cannot delete, rename, or otherwise mess up this type.

Given that hierarchyid is technically a CLR UDT, you might expect that the CLR must be enabled to use it with SQL Server. However, since hierarchyid is a system-defined type, you do not need to explicitly enable the CLR within SQL Server to use it.

Let’s consider the organizational structure of a fictional company called The Bureaucratic Corporation, represented in Figure.

Organizational chart for The Bureaucratic Corporation

Organizational chart for The Bureaucratic Corporation

Suppose that you are tasked with storing this organizational chart in SQL Server using the new hierarchyid type. After creating the database, you create the Employees table, as follows:

CREATE TABLE Employees
(node hierarchyid PRIMARY KEY CLUSTERED,
level AS node.GetLevel() PERSISTED,
employee_id INT UNIQUE,
employee_name VARCHAR(30) NOT NULL)

Notice that you created a computed column using the deterministic function GetLevel(). This function returns the level in the hierarchy of the node value. In the example, the root node, Rob, is considered level 0. Steve and Bill are level 1, and following this down the tree, Alice and Barbara are considered level 3.

Once the table is created, the root node can be added as follows:

INSERT INTO Employees VALUES (hierarchyid::GetRoot(),5000,'Rob')

GetRoot() is another deterministic function that returns the root value of the node. Since you will be referencing the employees by employee_id, you’ll use employee_id numbers that start with 5000, so as to not be confused with the node levels, which will be integers starting with 0.

Once you have the root node in place, you can continue and build the organizational structure. First, a disclaimer: as with any programmatic implementation, there are different ways to arrive at the same result. The examples here are oversimplistic and are not optimized by any stretch of the imagination. They are merely intended to clearly demonstrate the core concepts.

To add a subordinate to the root node, Rob, you will use the GetDescendants built-in function. This function returns a child node that is a descendant of the parent. In this case, there are no children of this parent, so this function will return the value of the first child.

--First let's declare some variables for use in temporarily storing the nodes

DECLARE @ManagerNode hierarchyid
DECLARE @Level hierarchyid
--Employee_ID of 5000 is the root node, "Rob"
SELECT @ManagerNode=node FROM Employees WHERE employee_id=5000
--GetDescendant will now return the first child "/1"
--since there are no other children defined INSERT INTO Employees VALUES (@ManagerNode.GetDescendant (NULL, NULL),5001, 'Bill')

In the example, the root node, Rob, is considered level 0 and is represented in a text form as /. When Bill was added, he was the first child of the root node and was given the value /1. Similarly, when you add Steve, another child node, his value will be /2.

--Employee_ID of 5001 is Bill
SELECT @Level=node FROM Employees WHERE employee_id=5001
--We are passing (Bill,NULL) to GetDescendant. This will
--give us the node value of the next child after Bill
--Likewise if we passed (NULL,Bill) GetDescendant would
--return the node value of the child previous to Bill
--(think inserting a child before Bill)
INSERT INTO Employees VALUES (@ManagerNode.GetDescendant(@Level, NULL),5002, 'Steve')

It is important to note the textual representation of the node itself. The following query will display the organizational structure at this point:

SELECT node.ToString() AS NodeAsString,
node as NodeAsBinary,
node.GetLevel() AS Level,
employee_id,
employee_name
FROM Employees

query will display the organizational structure at this point

As you build the rest of the organizational structure, the string representation as it relates to the organization is shown graphically in Figure.

Organizational chart for The Bureaucratic Corporation showing node names

Organizational chart for The Bureaucratic Corporation showing node names

You can see from Figure that, as you add a direct report, Jim under Steve, Jim’s node is displayed as /2/1. The rest of the code is as follows:

SELECT @Level=node FROM Employees WHERE employee_id=5002
INSERT INTO Employees VALUES (@Level.GetDescendant (NULL, NULL),5003, 'Jim')
--Now we have Steve, the manager stored in @Level
--We inserted Jim as a direct report and now need
--another variable to store Jim since we are going
--to insert another direct report (child) under Steve, the manager.
DECLARE @child1 hierarchyid
--Employee_ID of 5003 is Jim
SELECT @child1=node FROM Employees WHERE employee_id=5003
--GetDescendant will return the value of the next node after Jim
INSERT INTO Employees VALUES (@Level. GetDescendant(@child1, NULL),5004, 'Kevin')
--Now we are repeating the pattern, grab Bill's node in @Level
SELECT @Level=node FROM Employees WHERE employee_id=5001
--The first child of Bill is NULL,NULL, we want to put Joseph there
INSERT INTO Employees VALUES (@Level.GetDescendant(NULL,NULL) ,5005, 'Joseph')
--Grab Joseph's node
SELECT @child1=node FROM Employees WHERE employee_id=5005
--Insert Joan as a direct report of Bill
INSERT INTO Employees VALUES (@Level.GetDescendant (@child1,NULL),5006, 'Joan')

--Alice is a direct report of Joan. Using the same pattern grab Joan's node

SELECT @Level=node FROM Employees WHERE employee_id=5006
--Put Alice as the first direct report (child) of Joan
INSERT INTO Employees VALUES (@Level.GetDescendant (NULL,NULL),5007, 'Alice')
--Grab Alice's node
SELECT @child1=node FROM Employees WHERE employee_id=5007
--Insert Barabara as a direct report of Joan
INSERT INTO Employees VALUES (@Level.GetDescendant (@child1,NULL),5008, 'Barbara')

Now running the organization query results in the following:
organization query results in the following

Taking a look at this organization, you can see that it’s too flat to make a good bureaucratic company. You need more layers of management, so let’s use the built-in Reparent() function to move Joan’s organization to report under Kevin.

DECLARE @level hierarchyid
DECLARE @OldParent hierarchyid
DECLARE @NewParent hierarchyid
SELECT @Level=node from Employees where employee_id=5006 -- Joan
SELECT @OldParent=node from Employees where employee_id=5001 -- Move from Bill
SELECT @NewParent=node from Employees where employee_id=5004 -- To Kevin
UPDATE Employees
SET node = @Level.Reparent(@OldParent, @NewParent)
WHERE node = @Level

When you requery the organization list, you see the following:

requery the organization list

Can you spot a problem?

When you issued the Reparent() function, it worked, but not as some of us might have thought. The Reparent() function literally took Joan’s node and placed it under Kevin’s, but left all of Joan’s direct reports without a manager (that is, Joan was /1/2, but this no longer exists). This behavior is an important concept when using hierarchyid. It is up to the developer to handle this behavior, if it is actually a problem in your situation.

When using hierarchyid, you can create indexes on top of the type. Two kinds of indexes may be useful to the sample application. A depth-first index will store all the nodes together and will be useful in reporting all employees reporting to a specific manager. A breadth-first index stores all the level nodes together, so queries reporting all direct reports of a specific manager will benefit from this type of index.

When you created the table, you created a depth-first index on the node column because it was created with a primary key constraint. To create the breadth-first index, you can issue the command on the node and level columns together, similar to this type of syntax:

create index Breadth on Employees(node,level)

The hierarchyid datatype allows for easy storage and management of hierarchical data. Not only are these functions available through T-SQL scripts, but they can be accessed in CLR stored procedures and functions as well.


All rights reserved © 2018 Wisdom IT Services India Pvt. Ltd DMCA.com Protection Status

SQL Server 2008 Topics