5.2 Noncorrelated SubqueriesNoncorrelated subqueries allow each row from the containing SQL statement to be compared to a set of values. Divide noncorrelated subqueries into the following three categories, depending on the number of rows and columns returned in their result set:
Depending on the category, different sets of operators may be employed by the containing SQL statement to interact with the subquery. 5.2.1 Single-Row, Single-Column SubqueriesA subquery that returns a single row with a single column is treated like a scalar by the containing statement; not surprisingly, these types of subqueries are known as scalar subqueries. The subquery may appear on either side of a condition, and the usual comparison operators (=, <, >, !=, <=, >=) are employed. The following query illustrates the utility of single-row, single-column subqueries by finding all employees earning an above-average salary. The subquery returns the average salary, and the containing query then returns all employees who earn more than that amount. SELECT lname FROM employee WHERE salary > (SELECT AVG(salary) FROM EMPLOYEE); LNAME -------------------- Brown Smith Blake Isaacs Jacobs King Fox Anderson Nichols Iverson Peters Russell As this query demonstrates, it can be perfectly reasonable for a subquery to reference the same tables as the containing query. In fact, subqueries are frequently used to isolate a subset of records within a table. For example, many applications include maintenance routines that clean up operational data, such as exception or load logs. Every week, a script might delete all but the latest day's activity. For example: DELETE FROM load_log WHERE load_dt < (SELECT MAX(TRUNC(load_dt)) FROM load_log); Noncorrelated subqueries are also commonly found outside the WHERE clause, as illustrated by the following query, which identifies the salesperson responsible for the most orders: SELECT sales_emp_id, COUNT(*) FROM cust_order GROUP BY sales_emp_id HAVING COUNT(*) = (SELECT MAX(COUNT(*)) FROM cust_order GROUP BY sales_emp_id); SALES_EMP_ID COUNT(*) ------------ ---------- 30 121 This subquery calculates the number of orders attributable to each salesperson, and then applies the MAX function to return only the highest number of orders. The containing query performs the same aggregation as the subquery and then keeps only those salespeople whose total sales count matches the maximum value returned by the subquery. Interestingly, the containing query can return more than one row if multiple salespeople tie for the maximum sales count, while the subquery is guaranteed to return a single row and column. If it seems wasteful that the subquery and containing query both perform the same aggregation, it is; see Chapter 13 for more efficient ways to handle these types of queries. So far, we have seen scalar subqueries in the WHERE and HAVING clauses of SELECT statements, along with the WHERE clause of a DELETE statement. Before we delve deeper into the different types of subqueries, let's explore where else subqueries can and can't be utilized in SQL statements:
5.2.2 Multiple-Row SubqueriesNow that we know how to use single-row, single-column subqueries, let's explore how to use subqueries that return multiple rows. When a subquery returns more than one row, it is not possible to use only comparison operators, since a single value cannot be directly compared to a set of values. However, a single value can be compared to each value in a set. To accomplish this, the special keywords ANY and ALL may be used with comparison operators to determine if a value is equal to (or less than, greater than, etc.) any members of the set or all members of the set. Consider the following query: SELECT fname, lname FROM employee WHERE dept_id = 3 AND salary >= ALL (SELECT salary FROM employee WHERE dept_id = 3); FNAME LNAME -------------------- -------------------- Mark Russell The subquery returns the set of salaries for department 3, and the containing query checks each employee in the department to see if her salary is greater or equal to every salary returned by the subquery. Thus, this query retrieves the name of the highest paid person in department 3. While everyone except the lowest paid employee has a salary >= some of the salaries in the departement, only the highest paid employee has a salary >= all of the salaries in the department. If multiple employees tie for the highest salary in the department, multiple names will be returned. Another way to phrase the previous query is to find the employee whose salary is not less than any other salary in the department. We can do this using the ANY operator: SELECT fname, lname FROM employee WHERE dept_id = 3 AND NOT salary < ANY (SELECT salary FROM employee WHERE dept_id = 3); There are almost always multiple ways to phrase the same query. One of the challenges of writing SQL is striking the right balance between efficiency and readability. In this case, I might prefer using AND salary >= ALL over AND NOT salary < ANY because the first variation is easier to understand; however, the latter form might prove more efficient, since each evaluation of the subquery results requires from 1 to N comparisons when using ANY versus exactly N comparisons when using ALL.[1]
The next query uses the ANY operator to find all employees whose salary exceeds that of any top-level manager: SELECT fname, lname FROM employee WHERE manager_emp_id IS NOT NULL AND salary > ANY (SELECT salary FROM employee WHERE manager_emp_id IS NULL); FNAME LNAME -------------------- -------------------- Laura Peters Mark Russell The subquery returns the set of salaries for all top-level managers, and the containing query returns the names of non-top-level managers whose salary exceeds any of the salaries returned by the subquery. Any time this query returns one or more rows, rest assured that top-level management will vote themselves a pay increase. For the previous three queries, failure to include either the ANY or ALL operators will result in the following error: ORA-01427: single-row subquery returns more than one row The wording of this error message is a bit confusing. After all, how can a single-row subquery return multiple rows? What the error message is trying to convey is that a multiple-row subquery has been identified where only a single-row subquery is allowed. If we are not absolutely certain that our subquery will return exactly one row, we must include ANY or ALL to ensure our code doesn't fail in the future. Along with ANY and ALL, we may also use the IN operator for working with multi-row subqueries. Using IN with a subquery is functionally equivalent to using = ANY, and returns TRUE if a match is found in the set returned by the subquery. The following query uses IN to postpone shipment of all orders containing parts which are not currently in stock: UPDATE cust_order SET expected_ship_dt = TRUNC(SYSDATE) + 1 WHERE ship_dt IS NULL AND order_nbr IN (SELECT l.order_nbr FROM line_item l, part p WHERE l.part_nbr = p.part_nbr AND p.inventory_qty = 0); The subquery returns the set of orders requesting out-of-stock parts, and the containing UPDATE statement modifies the expected ship date of all orders in the set. We think you will agree that IN is more intuitive than = ANY, which is why IN is almost always used in such situations. Similarly, we can use NOT IN instead of using != ANY as demonstrated by the next query, which deletes all customers who haven't placed an order in the past five years: DELETE FROM customer WHERE cust_nbr NOT IN (SELECT cust_nbr FROM cust_order WHERE order_dt >= TRUNC(SYSDATE) -- (365 * 5)); The subquery returns the set of customers that have placed an order in the past five years, and the containing DELETE statement removes all customers that are not in the set returned by the subquery. Finding members of one set that do not exist in another set is referred to as an anti-join. As the name implies, an anti-join is the opposite of a join; rows from table A are returned if the specified data is not found in table B. The Oracle optimizer can employ multiple strategies for executing such queries, including a merge anti-join or a hash anti-join.[2]
5.2.3 Multiple-Column SubqueriesWhile all of the previous examples compare a single column from the containing SQL statement to the result set returned by the subquery, it is also possible to issue a subquery against multiple columns. Consider the following UPDATE statement, which rolls up data from an operational table into an aggregate table: UPDATE monthly_orders SET tot_orders = (SELECT COUNT(*) FROM cust_order WHERE order_dt >= TO_DATE('01-NOV-2001','DD-MON-YYYY') AND order_dt < TO_DATE('01-DEC-2001','DD-MON-YYYY') AND cancelled_dt IS NULL), max_order_amt = (SELECT MAX(sale_price) FROM cust_order WHERE order_dt >= TO_DATE('01-NOV-2001','DD-MON-YYYY') AND order_dt < TO_DATE('01-DEC-2001','DD-MON-YYYY') AND cancelled_dt IS NULL), min_order_amt = (SELECT MIN(sale_price) FROM cust_order WHERE order_dt >= TO_DATE('01-NOV-2001','DD-MON-YYYY') AND order_dt < TO_DATE('01-DEC-2001','DD-MON-YYYY') AND cancelled_dt IS NULL), tot_amt = (SELECT SUM(sale_price) FROM cust_order WHERE order_dt >= TO_DATE('01-NOV-2001','DD-MON-YYYY') AND order_dt < TO_DATE('01-DEC-2001','DD-MON-YYYY') AND cancelled_dt IS NULL) WHERE month = 11 and year = 2001; The UPDATE statement modifies four columns in the monthly_orders table, and values for each of the four columns are calculated by aggregating data in the cust_order table. Looking closely, we see that the WHERE clauses for all four subqueries are identical; only the aggregation type differs in the four queries. The next query demonstrates how all four columns can be populated with a single trip through the cust_order table: UPDATE monthly_orders
SET (tot_orders, max_order_amt, min_order_amt, tot_amt) =
(SELECT COUNT(*), MAX(sale_price), MIN(sale_price), SUM(sale_price)
FROM cust_order
WHERE order_dt >= TO_DATE('01-NOV-2001','DD-MON-YYYY')
AND order_dt < TO_DATE('01-DEC-2001','DD-MON-YYYY')
AND cancelled_dt IS NULL)
WHERE month = 11 and year = 2001;
The second statement achieves the same result more efficiently than the first by performing four aggregations during one trip through the cust_order table, rather than one aggregation during each of four separate trips. Whereas the previous example demonstrates the use of a multiple-column subquery in the SET clause of an UPDATE statement, such subqueries may also be utilized in the WHERE clause of a SELECT, UPDATE, or DELETE statement. The next statement deletes all items from open orders that include discontinued parts: DELETE FROM line_item WHERE (order_nbr, part_nbr) IN (SELECT c.order_nbr, p.part_nbr FROM cust_order c, line_item li, part p WHERE c.ship_dt IS NULL AND c.cancelled_dt IS NULL AND c.order_nbr = li.order_nbr AND li.part_nbr = p.part_nbr AND p.status = 'DISCONTINUED'); Note the use of the IN operator in the WHERE clause. Two columns are listed together in parentheses prior to the IN keyword. Values in these two columns are compared to the set of two values returned by each row of the subquery. If a match is found, the row is removed from the line_item table. |