Team LiB   Previous Section   Next Section

8.4 Complex Hierarchy Operations

In this section, we discuss how we can use Oracle SQL's hierarchical extensions to perform complex hierarchical queries.

8.4.1 Finding the Number of Levels

Previously we showed how the LEVEL pseudocolumn generates a level number for each record when we use the START WITH...CONNECT BY clause. We can use the following query to determine the number of levels in the hierarchy by counting the number of distinct level numbers returned by the LEVEL pseudocolumn:

SELECT COUNT(DISTINCT LEVEL) 
FROM EMPLOYEE
START WITH MANAGER_EMP_ID IS NULL
CONNECT BY PRIOR EMP_ID = MANAGER_EMP_ID;

COUNT(DISTINCTLEVEL)
--------------------
                   4

To determine the number of employees at each level, group the results by LEVEL and count the number of employees in each distinct group. For example:

SELECT LEVEL, COUNT(EMP_ID) 
FROM EMPLOYEE
START WITH MANAGER_EMP_ID IS NULL
CONNECT BY PRIOR EMP_ID = MANAGER_EMP_ID
GROUP BY LEVEL;

    LEVEL COUNT(EMP_ID)
--------- -------------
        1             1
        2             3
        3             8
        4             2

8.4.2 Listing Records in Hierarchical Order

One of the very common programming challenges SQL programmers face is to list records in a hierarchy in their proper hierarchical order. For example, we might wish to list employees with their subordinates underneath them, as is in the following query:

SELECT LEVEL, LPAD('  ',2*(LEVEL - 1)) || LNAME "EMPLOYEE",
       EMP_ID, MANAGER_EMP_ID
FROM EMPLOYEE
START WITH MANAGER_EMP_ID IS NULL
CONNECT BY PRIOR EMP_ID = MANAGER_EMP_ID;

    LEVEL Employee        EMP_ID MANAGER_EMP_ID 
--------- ------------ --------- -------------- 
        1 KING              7839                
        2   JONES           7566           7839 
        3     SCOTT         7788           7566 
        4       ADAMS       7876           7788 
        3     FORD          7902           7566 
        4       SMITH       7369           7902 
        2   BLAKE           7698           7839 
        3     ALLEN         7499           7698 
        3     WARD          7521           7698 
        3     MARTIN        7654           7698 
        3     TURNER        7844           7698 
        3     JAMES         7900           7698 
        2   CLARK           7782           7839 
        3     MILLER        7934           7782 

14 rows selected.

Notice that by using the expression LPAD(' ',2*(LEVEL - 1)), we are able to align employee names in a manner that corresponds to their level. As the level number increases, the number of spaces returned by the expression increases, and the employee name is further indented.

The previous query lists all the employees in the EMPLOYEE table. If we want to filter out certain employees based on some condition, then we can use a WHERE clause in our hierarchical query. Here is an example:

SELECT LEVEL, LPAD('    ',2*(LEVEL - 1)) || LNAME "EMPLOYEE",
       EMP_ID, MANAGER_EMP_ID, SALARY
FROM EMPLOYEE
WHERE SALARY > 2000
START WITH MANAGER_EMP_ID IS NULL
CONNECT BY MANAGER_EMP_ID = PRIOR EMP_ID;

    LEVEL Employee        EMP_ID MANAGER_EMP_ID     SALARY
--------- ------------ --------- --------------  ---------
        1 KING              7839                      5000
        3     SCOTT         7788           7566       3000
        3     FORD          7902           7566       3000
        2   BLAKE           7698           7839       2850
        2   CLARK           7782           7839       2450

This query lists records with salary > 2000. Notice that the WHERE clause restricts the rows returned by the query without affecting other rows in the hierarchy. In our example, the WHERE condition filtered JONES out of the result, but the employees below JONES in the hierarchy (SCOTT and FORD) are not filtered out, and are still indented as they were when JONES was present. The WHERE clause must come before the START WITH...CONNECT BY clause in a hierarchical query, otherwise it will result in a syntax error.

Instead of reporting out the whole organization chart, we may want to list only the subtree under a given employee, JONES for example. To do this, we can modify the START WITH condition so that it specifies JONES as the root of the query. For example:

SELECT LEVEL, LPAD('    ',2*(LEVEL - 1)) || LNAME "EMPLOYEE",
       EMP_ID, MANAGER_EMP_ID, SALARY
FROM EMPLOYEE
START WITH LNAME = 'JONES'
CONNECT BY MANAGER_EMP_ID = PRIOR EMP_ID;

    LEVEL Employee        EMP_ID MANAGER_EMP_ID     SALARY
--------- ------------ --------- --------------  ---------
        1 JONES             7566           7839       2000
        2   SCOTT           7788           7566       3000
        3     ADAMS         7876           7788       1100
        2   FORD            7902           7566       3000
        3     SMITH         7369           7902        800

Notice that since we asked the query to consider JONES as the root of the hierarchy, it assigned level 1 to JONES, level 2 to employees directly reporting to him, and so forth. Be careful while using conditions such as LNAME = 'JONES' in hierarchical queries. In this case, if we have two JONES in our organization, the result returned by the hierarchy may be wrong. It is better to use primary or unique key columns, such as EMP_ID, as the condition in such situations.

In this example, we listed the portion of the organization chart headed by a specific employee. There could be situations when we may need to print the organization chart headed by any employee that meets a specific condition. For example, we may want to list all employees under the employee who has been working in the company for the longest time. In this case, the starting point of the query (the root) is dependent on a condition. Therefore, we have to use a subquery to generate this information and pass it to the main query, as in the following example:

SELECT LEVEL, LPAD('    ',2*(LEVEL - 1)) || LNAME "EMPLOYEE",
       EMP_ID, MANAGER_EMP_ID, SALARY
FROM EMPLOYEE
START WITH HIRE_DATE = (SELECT MIN(HIRE_DATE) FROM EMPLOYEE)
CONNECT BY MANAGER_EMP_ID = PRIOR EMP_ID;

LEVEL EMPLOYEE         EMP_ID MANAGER_EMP_ID     SALARY
----- ------------ ---------- -------------- ----------
    1   BLAKE            7698           7839       2850
    2   ALLEN            7499           7698       1600
    2   WARD             7521           7698       1250
    2   MARTIN           7654           7698       1250
    2   TURNER           7844           7698       1500
    2   JAMES            7900           7698        950

6 rows selected.

Note the START WITH clause in this example. The subquery in the START WITH clause returns the minimum HIRE_DATE in the table, which represents the HIRE_DATE of the oldest employee. The main query uses this information as the starting point of the hierarchy and lists the organization structure under this employee.

While using a subquery in the START WITH clause, be aware of how many rows will be returned by the subquery. If more than one row is returned when we are expecting just one row (indicated by the = sign), the query will generate an error. We can get around this by replacing = with the IN operator, but be warned that the hierarchical query may then end up dealing with multiple roots.

8.4.3 Checking for Ascendancy

Another common operation on hierarchical data is to check for ascendancy. In an organization chart, we may ask whether one employee has authority over another. For example: "Does JONES have any authority over BLAKE?" To find out, we need to search for BLAKE in the subtree headed by JONES. If we find BLAKE in the subtree, then we know that BLAKE either directly or indirectly reports to JONES. If we don't find BLAKE in the subtree, then we know that JONES doesn't have any authority over BLAKE. The following query searches for BLAKE in the subtree headed by JONES:

SELECT * 
FROM EMPLOYEE
WHERE LNAME = 'BLAKE'
START WITH LNAME = 'JONES'
CONNECT BY MANAGER_EMP_ID = PRIOR EMP_ID;

no rows selected

The START WITH...CONNECT BY clause in this example generates the subtree headed by JONES, and the WHERE clause filters this subtree to find BLAKE. As we can see, no rows were returned. This means that BLAKE was not found in JONES' subtree, so we know that JONES has no authority over BLAKE. Let's take a look at another example that produces positive results. This time we'll check to see whether JONES has any authority over SMITH:

SELECT EMP_ID, LNAME, DEPT_ID, MANAGER_EMP_ID, SALARY, HIRE_DATE 
FROM EMPLOYEE
WHERE LNAME = 'SMITH'
START WITH LNAME = 'JONES'
CONNECT BY MANAGER_EMP_ID = PRIOR EMP_ID;

    EMP_ID LNAME         DEPT_ID MANAGER_EMP_ID     SALARY HIRE_DATE 
---------- ---------- ---------- -------------- ---------- --------- 
      7369 SMITH              20           7902        800 17-DEC-80 

This time, SMITH was found in the list of employees in JONES' subtree, so we know that at some level JONES has management authority over SMITH.

8.4.4 Deleting a Subtree

Let's assume that the organization we are dealing with splits, and JONES and all his subordinates form a new company. Therefore, we don't need to maintain JONES and his subordinates in our EMPLOYEE table. Furthermore, we need to delete the entire subtree headed by JONES, as shown in Figure 8-1, from our table. We can do this by using a subquery as in the following example:

DELETE FROM EMPLOYEE
WHERE EMP_ID IN 
(SELECT EMP_ID FROM EMPLOYEE
START WITH LNAME = 'JONES'
CONNECT BY MANAGER_EMP_ID = PRIOR EMP_ID);

5 rows deleted.

In this example, the subquery generates the subtree headed by JONES, and returns the EMP_IDs of the employees in that subtree, including JONES'. The outer query then deletes the records with these EMP_ID values from the EMPLOYEE table.

8.4.5 Listing Multiple Root Nodes

An interesting variation on the problem of listing the root node of a hierarchy is to find and list the root nodes from several hierarchies that are all stored in the same table. For example, we might consider department manager's to represent root nodes, and we might further wish to list all department managers found in the EMPLOYEE table.

There are no constraints on the employees belonging to any department. However, we can assume that if A reports to B and B reports to C, and A and C belong to the same department, then B also belongs to the same department. If an employee's manager belongs to another department, then that employee is the uppermost employee, or manager, of his department.

Therefore, to find the uppermost employee in each department, we need to search the tree for those employees whose managers belong to a different department then their own. We do that using the following query:

SELECT EMP_ID, LNAME, DEPT_ID, MANAGER_EMP_ID, SALARY, HIRE_DATE 
FROM EMPLOYEE
START WITH MANAGER_EMP_ID IS NULL
CONNECT BY MANAGER_EMP_ID = PRIOR EMP_ID
AND DEPT_ID != PRIOR DEPT_ID;

EMP_ID LNAME     DEPT_ID MANAGER_EMP_ID SALARY HIRE_DATE 
------ -------- -------- -------------- ------ --------- 
  7839 KING           10                  5000 17-NOV-81 
  7566 JONES          20           7839   2975 02-APR-81 
  7698 BLAKE          30           7839   2850 01-MAY-81 

In this example, the extra condition (DEPT_ID != PRIOR DEPT_ID) added to the CONNECT BY clause restricts the output to only those employees whose managers belong to a different department then their own.

8.4.6 Listing the Top Few Levels of a Hierarchy

Another common task in dealing with hierarchical data is listing the top few levels of a hierarchy tree. For example, we may want to list top management employees in an organization. Let's assume that the top two levels in our organization chart constitute top management. We can then use the LEVEL pseudocolumn to identify those employees, as in the following example:

SELECT EMP_ID, LNAME, DEPT_ID, MANAGER_EMP_ID, SALARY, HIRE_DATE 
FROM EMPLOYEE
WHERE LEVEL <= 2 
START WITH MANAGER_EMP_ID IS NULL
CONNECT BY MANAGER_EMP_ID = PRIOR EMP_ID;

EMP_ID LNAME        DEPT_ID MANAGER_EMP_ID SALARY HIRE_DATE 
------ --------- ---------- -------------- ------ --------- 
  7839 KING              10                  5000 17-NOV-81 
  7566 JONES             20           7839   2975 02-APR-81 
  7698 BLAKE             30           7839   2850 01-MAY-81 
  7782 CLARK             10           7839   2450 09-JUN-81 

In this example, the LEVEL <= 2 condition in the WHERE clause restricts the results to only those employees in the top two levels of the organization chart.

8.4.7 Aggregating a Hierarchy

Another challenging requirement on hierarchical data is to aggregate a hierarchy. For example, we may want to sum the salaries of all employees reporting to a specific employee. Or, we may want to consider each employee as a root, and for each employee report out the sum of the salaries of all subordinate employees.

The first problem is relatively simple. Earlier we described how to select a subtree headed by an employee. We can easily sum the salaries of all employees in such a subtree. For example:

SELECT SUM(SALARY) 
FROM EMPLOYEE
START WITH LNAME = 'JONES'
CONNECT BY MANAGER_EMP_ID = PRIOR EMP_ID;

SUM(SALARY)
-----------
      10875

The START WITH LNAME = 'JONES' clause generates the subtree headed by JONES, and the SUM(SALARY) expression sums the salary of employees in this subtree.

The second problem, a seemingly simple extension of the first, is relatively complex. We want to consider each employee as a root, and for each employee we want to sum the salaries of all employees in its subtree. In essence, we want to repeat the previous query for each employee in the table. The following SQL uses an inline view to achieve this:

SELECT LNAME, SALARY,
(SELECT SUM(SALARY) FROM EMPLOYEE T1
START WITH LNAME = T2.LNAME
CONNECT BY MANAGER_EMP_ID = PRIOR EMP_ID) SUM_SALARY
FROM EMPLOYEE T2;

LNAME                    SALARY SUM_SALARY
-------------------- ---------- ----------
SMITH                       800        800
ALLEN                      1600       1600
WARD                       1250       1250
JONES                      2975      10875
MARTIN                     1250       1250
BLAKE                      2850       9400
CLARK                      2450       3750
SCOTT                      3000       4100
KING                       5000      29025
TURNER                     1500       1500
ADAMS                      1100       1100
JAMES                       950        950
FORD                       3000       3800
MILLER                     1300       1300

14 rows selected.

In this example, the START WITH...CONNECT BY clause in the inline view generates a subtree for each employee. The inline view executes once for every row in the outer EMPLOYEE employee. For each row in the outer EMPLOYEE table, the inline view generates a subtree headed by this employee, and returns the sum of salaries for all the employees in this subtree to the main query.

The result set displays two numbers for each employee. The first number, SALARY, is the employee's own salary. The second number, SUM_SALARY, is the sum of the salaries of all employees under him (including himself). Often programmers resort to PL/SQL to solve this type of problem. However, this query, which combines the power of hierarchical queries with that of inline views, solves this problem in a much more concise and elegant way.

    Team LiB   Previous Section   Next Section