Team LiB   Previous Section   Next Section

8.2 Simple Hierarchy Operations

The processes for extracting some types of information from a table storing hierarchical data are relatively simple, and can be performed using the techniques that we have discussed so far in this book. Extracting more complex information requires using some new SQL constructs, which we'll discuss in the later section titled Section 8.3. In this section, we'll discuss the hierarchy operations that can be performed using what we've learned so far.

8.2.1 Finding the Root Node

Finding the root of a hierarchy tree is easy; we look for the one node with no parent. In the EMPLOYEE table we discussed earlier, the value for MANAGER_EMP_ID is NULL for the uppermost employee, and only for the uppermost employee. The following query searches for cases where MANAGER_EMP_ID is NULL, thereby returning the root node:

SELECT EMP_ID, LNAME, DEPT_ID, MANAGER_EMP_ID, SALARY, HIRE_DATE 
FROM EMPLOYEE 
WHERE MANAGER_EMP_ID IS NULL;

   EMP_ID LNAME        DEPT_ID MANAGER_EMP_ID    SALARY HIRE_DATE 
--------- ---------- --------- -------------- --------- --------- 
     7839 KING              10                     5000 17-NOV-81 

Because the MANAGER_EMP_ID column defines the hierarchy, it's important that it always contain correct data. While populating data in this table, we must make sure to specify a MANAGER_EMP_ID for every row other than the row for the uppermost employee. The uppermost employee doesn't report to anyone (doesn't have a manager), and hence MANAGER_EMP_ID is not applicable for him. If we leave out MANAGER_EMP_ID values for employees that do have managers, those employees will erroneously show up as root nodes.

8.2.2 Finding a Node's Immediate Parent

We may wish to link nodes to their immediate parents. For example, we might want to print a report showing each employee's manager. The name of each employee's manager can be derived by joining the EMPLOYEE table to itself. This type of join is a self join (discussed in Chapter 3). The following query returns the desired result:

SELECT E.LNAME "Employee", M.LNAME "Manager"
FROM EMPLOYEE E, EMPLOYEE M
WHERE E.MANAGER_EMP_ID = M.EMP_ID;

Employee   Manager
---------- ----------
SMITH      FORD
ALLEN      BLAKE
WARD       BLAKE
JONES      KING
MARTIN     BLAKE
BLAKE      KING
CLARK      KING
SCOTT      JONES
TURNER     BLAKE
ADAMS      SCOTT
JAMES      BLAKE
FORD       JONES
MILLER     CLARK

13 rows selected.

Note this query results in only 13 rows, although the EMPLOYEE table has 14 rows.

SELECT COUNT(*) FROM EMPLOYEE;

  COUNT(*)
----------
        14

The reason that only 13 rows are returned from the self join is simple. This query lists employees and their managers. But since the uppermost employee KING doesn't have any manager, that row is not produced in the output. If we want all the employees to be produced in the result, we need an outer join, as in the following example:

SELECT E.LNAME "Employee", M.LNAME "Manager"
FROM EMPLOYEE E, EMPLOYEE M
WHERE E.MANAGER_EMP_ID = M.EMP_ID (+);

Employee             Manager
-------------------- --------------------
SMITH                FORD
ALLEN                BLAKE
WARD                 BLAKE
JONES                KING
MARTIN               BLAKE
BLAKE                KING
CLARK                KING
SCOTT                JONES
KING
TURNER               BLAKE
ADAMS                SCOTT
JAMES                BLAKE
FORD                 JONES
MILLER               CLARK

14 rows selected.

Outer joins are discussed in detail in Chapter 3.

8.2.3 Finding Leaf Nodes

The opposite problem from finding the root node, which has no parent, is to find leaf nodes, which have no children. Employees who do not manage anyone are the leaf nodes in the hierarchy tree shown in Figure 8-1. At first glance, the following query seems like it should list all employees from the EMPLOYEE table who are not managers of any other employee:

SELECT * FROM EMPLOYEE
WHERE EMP_ID NOT IN (SELECT MANAGER_EMP_ID FROM EMPLOYEE);

However, when we execute this statement, we will see "No rows selected." Why? It is because the MANAGER_EMP_ID column contains a NULL value in one row (for the uppermost employee), and NULLs can't be compared to any data value. Therefore, to get the employees who don't manage anyone, we need to rewrite the query as follows:

SELECT EMP_ID, LNAME, DEPT_ID, MANAGER_EMP_ID, SALARY, HIRE_DATE 
FROM EMPLOYEE E
WHERE EMP_ID NOT IN 
(SELECT MANAGER_EMP_ID FROM EMPLOYEE 
WHERE MANAGER_EMP_ID IS NOT NULL);

   EMP_ID LNAME        DEPT_ID MANAGER_EMP_ID    SALARY HIRE_DATE 
--------- ---------- --------- -------------- --------- --------- 
     7369 SMITH             20           7902       800 17-DEC-80 
     7499 ALLEN             30           7698      1600 20-FEB-81 
     7521 WARD              30           7698      1250 22-FEB-81 
     7654 MARTIN            30           7698      1250 28-SEP-81 
     7844 TURNER            30           7698      1500 08-SEP-81 
     7876 ADAMS             20           7788      1100 23-MAY-87 
     7900 JAMES             30           7698       950 03-DEC-81 
     7934 MILLER            10           7782      1300 23-JAN-82 

8 rows selected.

In this example, the subquery returns the EMP_IDs of all the managers. The outer query then returns all the employees, except the ones returned by the subquery. This query can also be written as a correlated subquery using EXISTS instead of IN:

SELECT EMP_ID, LNAME, DEPT_ID, MANAGER_EMP_ID, SALARY, HIRE_DATE 
FROM EMPLOYEE E
WHERE NOT EXISTS 
(SELECT EMP_ID FROM EMPLOYEE E1 WHERE E.EMP_ID = E1.MANAGER_EMP_ID);

   EMP_ID LNAME        DEPT_ID MANAGER_EMP_ID    SALARY HIRE_DATE 
--------- ---------- --------- -------------- --------- --------- 
     7369 SMITH             20           7902       800 17-DEC-80 
     7499 ALLEN             30           7698      1600 20-FEB-81 
     7521 WARD              30           7698      1250 22-FEB-81 
     7654 MARTIN            30           7698      1250 28-SEP-81 
     7844 TURNER            30           7698      1500 08-SEP-81 
     7876 ADAMS             20           7788      1100 23-MAY-87 
     7900 JAMES             30           7698       950 03-DEC-81 
     7934 MILLER            10           7782      1300 23-JAN-82 

8 rows selected.

In this example, the correlated subquery checks each employee to see whether he is the manager of any other employee. If NOT, then that particular employee is included in the result set.

    Team LiB   Previous Section   Next Section