3.3 Self Joins
There are situations
in which one row of a table is related
to another row of the same table. The EMPLOYEE table is a good
example. The manager of one employee is also an employee. The rows
for both are in the same EMPLOYEE table. This relationship is
indicated in the MANAGER_EMP_ID column:
CREATE TABLE EMPLOYEE (
EMP_ID NUMBER (4) NOT NULL PRIMARY KEY,
FNAME VARCHAR2 (15),
LNAME VARCHAR2 (15),
DEPT_ID NUMBER (2),
MANAGER_EMP_ID NUMBER (4) REFERENCES EMPLOYEE(EMP_ID),
SALARY NUMBER (7,2),
HIRE_DATE DATE,
JOB_ID NUMBER (3));
To get information about an employee and his manager, you have to
join the EMPLOYEE table with itself. This is achieved by specifying
the EMPLOYEE table twice in the FROM clause and using two different
table
aliases,
thereby treating EMPLOYEE as if it were two separate tables. The
following example lists the name of each employee and his manager:
SELECT E.NAME EMPLOYEE, M.NAME MANAGER
FROM EMPLOYEE E, EMPLOYEE M
WHERE E.MANAGER_EMP_ID = M.EMP_ID;
EMPLOYEE MANAGER
---------- ----------
SMITH FORD
ALLEN BLAKE
WARD BLAKE
JONES KING
MARTIN BLAKE
BLAKE KING
CLARK KING
SCOTT JONES
TURNER BLAKE
ADAMS SCOTT
JAMES BLAKE
FORD JONES
MILLER CLARK
13 rows selected.
Notice the use of the EMPLOYEE table twice in the FROM clause with
two different aliases. Also notice the join condition that reads as:
"Where the employee's
MANAGER_EMP_ID is the same as his manager's
EMP_ID."
3.3.1 Self Outer Joins
Even though
the EMPLOYEE
table has 14 rows, the previous query returned only 13 rows. This is
because there is an employee without a MANAGER_EMP_ID. Oracle
excludes this row from the result set while performing the self inner
join. To include the employee(s) without a MANAGER_EMP_ID, you need
an outer join:
SELECT E.LNAME EMPLOYEE, M.LNAME MANAGER
FROM EMPLOYEE E, EMPLOYEE M
WHERE E.MANAGER_EMP_ID = M.EMP_ID (+);
EMPLOYEE MANAGER
---------- ----------
SMITH FORD
ALLEN BLAKE
WARD BLAKE
JONES KING
MARTIN BLAKE
BLAKE KING
CLARK KING
SCOTT JONES
KING
TURNER BLAKE
ADAMS SCOTT
JAMES BLAKE
FORD JONES
MILLER CLARK
14 rows selected.
Be careful when placing the (+) operator
in a join
condition. If you put the (+) on the wrong side, you will get an
absurd result set that makes no sense. In this case, the EMPLOYEE
table we need to make optional is the one from which we are drawing
manager names.
3.3.2 Self Non-Equi-Joins
The previous
example showed self-equi-joins.
However, there are situations when you need to perform
self-non-equi-joins. We will illustrate this by an example.
Let's assume that you are in charge of organizing
interdepartmental basket ball competition within your company. It is
your responsibility to draw the teams and schedule the competition.
You query the DEPARTMENT table and get the following result:
SELECT NAME FROM DEPARTMENT;
NAME
--------------
ACCOUNTING
RESEARCH
SALES
OPERATIONS
You find that there are four departments, and to make a fair
competition, you decide that each department plays against the other
three departments once, and at the end the department with the
maximum wins is declared the winner. You have been to an Oracle SQL
training class recently, and decide to apply the concept of self
join. You execute the following query:
SELECT D1.NAME TEAM1, D2.NAME TEAM2
FROM DEPARTMENT D1, DEPARTMENT D2;
TEAM1 TEAM2
-------------- --------------
ACCOUNTING ACCOUNTING
RESEARCH ACCOUNTING
SALES ACCOUNTING
OPERATIONS ACCOUNTING
ACCOUNTING RESEARCH
RESEARCH RESEARCH
SALES RESEARCH
OPERATIONS RESEARCH
ACCOUNTING SALES
RESEARCH SALES
SALES SALES
OPERATIONS SALES
ACCOUNTING OPERATIONS
RESEARCH OPERATIONS
SALES OPERATIONS
OPERATIONS OPERATIONS
16 rows selected.
Disappointing results. From your knowledge of high school
mathematics, you know that four teams each playing once with the
other three makes six combinations. However, your SQL query returned
16 rows. Now you realize that since you didn't
specify any join condition, you got a Cartesian product from your
query. You put in a join condition, and your query and results now
look as follows:
SELECT D1.NAME TEAM1, D2.NAME TEAM2
FROM DEPARTMENT D1, DEPARTMENT D2
WHERE D1.DEPT_ID = D2.DEPT_ID;
TEAM1 TEAM2
-------------- --------------
ACCOUNTING ACCOUNTING
RESEARCH RESEARCH
SALES SALES
OPERATIONS OPERATIONS
Oops! The equi-join returned a very unwanted result. A team
can't play against itself. You realize your mistake,
and this sparks the idea that you can use non-equi-joins in this
situation. You rewrite the query as a non-equi-join. You
don't want a team to play against itself, and
therefore replace the "=" operator
in the join condition with "!=".
Let's look at the results:
SELECT D1.NAME TEAM1, D2.NAME TEAM2
FROM DEPARTMENT D1, DEPARTMENT D2
WHERE D1.DEPT_ID != D2.DEPT_ID;
TEAM1 TEAM2
-------------- --------------
RESEARCH ACCOUNTING
SALES ACCOUNTING
OPERATIONS ACCOUNTING
ACCOUNTING RESEARCH
SALES RESEARCH
OPERATIONS RESEARCH
ACCOUNTING SALES
RESEARCH SALES
OPERATIONS SALES
ACCOUNTING OPERATIONS
RESEARCH OPERATIONS
SALES OPERATIONS
12 rows selected.
Still not done. In this result set, you have permutations such as
(RESEARCH, ACCOUNTING) and (ACCOUNTING, RESEARCH), and so on.
Therefore, each team plays against the others twice. You need to
remove these permutations, which you rightly consider to be
duplicates. You think about using DISTINCT. DISTINCT will not help here,
because the row (RESEARCH, ACCOUNTING) is different from the row
(ACCOUNTING, RESEARCH) from the viewpoint of DISTINCT; but not from
the viewpoint of your requirement. After some thought, you want to
try out an inequality operator other than
"!=". You decide to go with the
less-than (<) operator. Here are the results you get:
SELECT D1.NAME TEAM1, D2.NAME TEAM2
FROM DEPARTMENT D1, DEPARTMENT D2
WHERE D1.DEPT_ID < D2.DEPT_ID;
TEAM1 TEAM2
-------------- --------------
ACCOUNTING RESEARCH
ACCOUNTING SALES
RESEARCH SALES
ACCOUNTING OPERATIONS
RESEARCH OPERATIONS
SALES OPERATIONS
6 rows selected.
That's it! Now you have six combinations: each team
plays against the other three just once. Let's
examine why this version of the query works. Conceptually, when
Oracle executes this query, a Cartesian product is first formed with
16 rows. Then the less-than (<) operator
in the join condition restricts
the result set to those rows in which the DEPT_ID of Team 1 is less
than the DEPT_ID of Team 2. The less-than (<) operator eliminates
the duplicates, because for any given permutation of two departments
this condition is satisfied for only one. Using greater-than (>)
instead of less-than (<) will
also give you the required result, but the TEAM1 and TEAM2 values
will be reversed:
SELECT D1.NAME TEAM1, D2.NAME TEAM2
FROM DEPARTMENT D1, DEPARTMENT D2
WHERE D1.DEPT_ID > D2.DEPT_ID;
TEAM1 TEAM2
-------------- --------------
RESEARCH ACCOUNTING
SALES ACCOUNTING
OPERATIONS ACCOUNTING
SALES RESEARCH
OPERATIONS RESEARCH
OPERATIONS SALES
6 rows selected.
Don't be disheartened by the painful process you had
to go through to get this result. Sometimes you have to go through an
agonizing experience to get simple results such as these.
That's life. Now that you have the team combinations
right, go a bit further and assign a date for each match. Use
"tomorrow" as the starting date:
SELECT D1.NAME TEAM1, D2.NAME TEAM2, SYSDATE + ROWNUM MATCH_DATE
FROM DEPARTMENT D1, DEPARTMENT D2
WHERE D1.DEPT_ID < D2.DEPT_ID;
TEAM1 TEAM2 MATCH_DATE
-------------- -------------- ---------
ACCOUNTING RESEARCH 30-APR-01
ACCOUNTING SALES 01-MAY-01
RESEARCH SALES 02-MAY-01
ACCOUNTING OPERATIONS 03-MAY-01
RESEARCH OPERATIONS 04-MAY-01
SALES OPERATIONS 05-MAY-01
6 rows selected.
Now publish these results on the corporate intranet along with the
rules and regulations for the competition, and you are done.
|