14.1 Know When to Use Specific ConstructsDepending on the circumstances, certain SQL constructs are preferable to others. For example, use of the EXISTS predicate is often preferable to IN. The same is not true for NOT EXISTS versus NOT IN. The next sections discuss the usage of such constructs. 14.1.1 EXISTS Is Preferable to DISTINCTThe DISTINCT keyword used in a SELECT clause eliminates duplicate rows in the result set. To eliminate those duplicates, Oracle performs a sort, and that sort requires time and disk space. Therefore, avoid using DISTINCT if you can tolerate having duplicate rows returned by a query. If you can't tolerate the duplicate rows, or your application can't handle them, use EXISTS in place of DISTINCT. For example, assume you are trying to find the names of customers who have orders. Your query has to be based on two tables: CUSTOMER and CUST_ORDER. Using DISTINCT, your query would be written as follows: SELECT DISTINCT C.CUST_NBR, C.NAME FROM CUSTOMER C, CUST_ORDER O WHERE C.CUST_NBR = O.CUST_NBR; The corresponding execution plan for this query is as follows. Note the SORT operation, which is a result of DISTINCT being used. Query Plan ----------------------------------------- SELECT STATEMENT Cost = 3056 SORT UNIQUE MERGE JOIN INDEX FULL SCAN IND_ORD_CUST_NBR SORT JOIN TABLE ACCESS FULL CUSTOMER To use EXISTS, the query needs to be rewritten as follows: SELECT C.CUST_NBR, C.NAME FROM CUSTOMER C WHERE EXISTS (SELECT 1 FROM CUST_ORDER O WHERE C.CUST_NBR = O.CUST_NBR); Here is the execution plan for the EXISTS version of the query. Look at the cost of this query versus the earlier DISTINCT query, and notice the performance improvement. Query Plan --------------------------------------- SELECT STATEMENT Cost = 320 FILTER TABLE ACCESS FULL CUSTOMER INDEX RANGE SCAN IND_ORD_CUST_NBR The version of the query using EXISTS is less than one-ninth as costly as the version using DISTINCT. This is because the sort has been avoided. 14.1.2 EXISTS Versus INMany SQL books discuss the fact that NOT EXISTS performs better than NOT IN. We have found that with Oracle8i, the EXPLAIN PLAN generated by NOT EXISTS is exactly the same as that generated by NOT IN, and the performance for the two predicates is the same. However, the comparison between EXISTS and IN is a different story. We've found that EXISTS often performs better than IN. Let's look at an example that demonstrates this. The following query uses IN to delete the orders for customers in region 5: DELETE FROM CUST_ORDER WHERE CUST_NBR IN (SELECT CUST_NBR FROM CUSTOMER WHERE REGION_ID = 5); The execution plan for this query is as follows: Query Plan ------------------------------------ DELETE STATEMENT Cost = 3 DELETE CUST_ORDER HASH JOIN TABLE ACCESS FULL CUST_ORDER TABLE ACCESS FULL CUSTOMER Now, let's look at that same query, written using EXISTS: DELETE FROM CUST_ORDER WHERE EXISTS (SELECT CUST_NBR FROM CUSTOMER WHERE CUST_ORDER.CUST_NBR = CUSTOMER.CUST_NBR AND REGION_ID = 5); The execution plan for the EXISTS version of the query is: Query Plan -------------------------------------------- DELETE STATEMENT Cost = 1 DELETE CUST_ORDER FILTER TABLE ACCESS FULL CUST_ORDER TABLE ACCESS BY INDEX ROWID CUSTOMER INDEX UNIQUE SCAN CUSTOMER_PK Notice the cost difference between the two queries. The IN version of the query has a cost of 3, while the EXISTS version of the query has a cost of only 1. When the EXISTS clause is used, the execution plan is driven by the outer table, whereas when the IN clause is used, the execution plan is driven by the table in the subquery. The EXISTS query will almost always be faster than the IN query, except for cases when the table in the subquery has very few rows as compared to the outer table. 14.1.3 WHERE Versus HAVINGWe discussed the GROUP BY and HAVING clauses in Chapter 4. Sometimes, when writing a GROUP BY query, you have a condition that you can specify in either the WHERE clause or the HAVING clause. In situations where you have a choice, you'll always get better performance if you specify the condition in the WHERE clause. The reason is that it's less expensive to eliminate rows before they are summarized than it is to eliminate results after summarization. Let's look at an example illustrating the advantage of WHERE over HAVING. Here's a query with the HAVING clause that reports the number of orders in the year 2000: SELECT YEAR, COUNT(*) FROM ORDERS GROUP BY YEAR HAVING YEAR = 2000; YEAR COUNT(*) ---------- ---------- 2000 720 The execution plan for this query is as follows: Query Plan ------------------------------------------- SELECT STATEMENT Cost = 6 FILTER SORT GROUP BY INDEX FAST FULL SCAN ORDERS_PK Now, look at that same query, but with the year restriction in the WHERE clause: SELECT YEAR, COUNT(*) FROM ORDERS WHERE YEAR = 2000 GROUP BY YEAR; YEAR COUNT(*) ---------- ---------- 2000 720 The execution plan for this version of the query is: Query Plan ------------------------------------- SELECT STATEMENT Cost = 2 SORT GROUP BY NOSORT INDEX FAST FULL SCAN ORDERS_PK With the HAVING clause, the query performs the group operation first, and then filters the groups for the condition specified. The WHERE clause version of the query filters the rows before performing the group operation. The result of filtering with the WHERE clause is that there are fewer rows to summarize, and consequently the query performs better. However, you should note that not all types of filtering can be achieved using the WHERE clause. Sometimes, you may need to summarize the data first, and then filter the summarized data based upon the summarized values. In such situations, you have to filter using the HAVING clause, because only the HAVING clause can "see" summarized values. Moreover, there are situations when you may need to use the WHERE clause and the HAVING clause together in a query to filter the results the way you want. For details, see Chapter 4. 14.1.4 UNION Versus UNION ALLWe discussed UNION and UNION ALL in Chapter 6. UNION ALL combines the results of two SELECT statements. UNION combines the results of two SELECT statements, and then returns only distinct rows from the combination; duplicates are eliminated. It is, therefore, obvious that to remove the duplicates, UNION performs one extra step than UNION ALL. This extra step is a sort, which is costly in terms of performance. Therefore, whenever your application can handle duplicates or you are certain that no duplicates will result, consider using UNION ALL instead of UNION. Let's look an example to understand this issue better. The following query uses UNION to return a list of orders where the sale price exceeds $50.00 or where the customer is located in region 5: SELECT ORDER_NBR, CUST_NBR FROM CUST_ORDER WHERE SALE_PRICE > 50 UNION SELECT ORDER_NBR, CUST_NBR FROM CUST_ORDER WHERE CUST_NBR IN (SELECT CUST_NBR FROM CUSTOMER WHERE REGION_ID = 5); ORDER_NBR CUST_NBR ---------- ---------- 1000 1 1001 1 1002 5 1003 4 1004 4 1005 8 1006 1 1007 5 1008 5 1009 1 1011 1 1012 1 1015 5 1017 4 1019 4 1021 8 1023 1 1025 5 1027 5 1029 1 20 rows selected. The execution plan for this UNION query is: Query Plan ----------------------------------------------------------------------------------- SELECT STATEMENT Cost = 8 SORT UNIQUE UNION-ALL TABLE ACCESS FULL CUST_ORDER HASH JOIN TABLE ACCESS FULL CUSTOMER TABLE ACCESS FULL CUST_ORDER The following query uses UNION ALL instead of UNION to get the same information: SELECT ORDER_NBR, CUST_NBR FROM CUST_ORDER WHERE SALE_PRICE > 50 UNION ALL SELECT ORDER_NBR, CUST_NBR FROM CUST_ORDER WHERE CUST_NBR IN (SELECT CUST_NBR FROM CUSTOMER WHERE REGION_ID = 5); ORDER_NBR CUST_NBR ---------- ---------- 1001 1 1003 4 1005 8 1009 1 1012 1 1017 4 1021 8 1029 1 1001 1 1000 1 1002 5 1003 4 1004 4 1006 1 1007 5 1008 5 1009 1 1012 1 1011 1 1015 5 1017 4 1019 4 1023 1 1025 5 1027 5 1029 1 26 rows selected. Note the duplicate rows in the output. However, note also that UNION ALL performs better than UNION, as you can see from the following execution plan: Query Plan ----------------------------------------------------------------------------------- SELECT STATEMENT Cost = 4 UNION-ALL TABLE ACCESS FULL CUST_ORDER HASH JOIN TABLE ACCESS FULL CUSTOMER TABLE ACCESS FULL CUST_ORDER Compare this execution plan with its cost of 4 with the previous plan and its cost of 8. You can see that the extra operation (SORT UNIQUE) in the UNION makes it run slower than UNION ALL. |