5.4 Inline ViewsMost texts covering SQL define the FROM clause of a SELECT statement as containing a list of tables and/or views. Please abandon this definition and replace it with the following: the FROM clause contains a list of data sets. In this light, it is easy to see how the FROM clause can contain tables (permanent data sets), views (virtual data sets), and SELECT statements (temporary data sets). A SELECT statement in the FROM clause of a containing SELECT statement is referred to as an inline view:[4] it is one of the most powerful, underutilized features of Oracle SQL.
Here's a simple example: SELECT d.dept_id, d.name, emp_cnt.tot FROM department d, (SELECT dept_id, COUNT(*) tot FROM employee GROUP BY dept_id) emp_cnt WHERE d.dept_id = emp_cnt.dept_id; DEPT_ID NAME TOT ---------- -------------------- ---------- 1 Human Resources 1 2 Accounting 1 3 Sales 24 In this example, the FROM clause references the department table and an inline view called emp_cnt, which calculates the number of employees in each department. The two sets are joined using dept_id and the ID, name, and employee count are returned for each department. While this example is fairly simple, inline views allow us to do things in a single query that might otherwise require multiple select statements or a procedural language to accomplish. 5.4.1 Inline View BasicsBecause the result set from an inline view is referenced by other elements of the containing query, we must give our inline view a name and provide aliases for all ambiguous columns. In the previous example, the inline view was given the name "emp_cnt", and the alias "tot" was assigned to the COUNT(*) column. Similar to other types of subqueries, inline views may join multiple tables, call built-in and user-defined functions, specify optimizer hints, and include GROUP BY, HAVING, and CONNECT BY clauses. Unlike other types of subqueries, an inline view may also contain an ORDER BY clause, which opens several interesting possibilities (see Section 5.5 later in the chapter for an example using ORDER BY in a subquery). Inline views are particularly useful when we need to combine data at different levels of aggregation. In the previous example, we needed to retrieve all rows from the department table and include aggregated data from the employee table, so I chose to do the aggregation within an inline view and join the results to the department table. Anyone involved in report generation or data warehouse extraction, transformation, and load (ETL) applications has doubtless encountered situations where data from various levels of aggregation needs to be combined; with inline views, we should be able to produce the desired results in a single SQL statement rather than having to break the logic into multiple pieces or write code in a procedural language. When considering using an inline view, ask the following questions:
In general, using an inline view should enhance the readability and performance of the query, and it should generate a manageable data set that is of no value to other statements or sessions; otherwise, we may want to consider building a permanent or temporary table so that we can share the data between sessions and build additional indexes as needed. 5.4.2 Query ExecutionInline views are always executed prior to the containing query and, thus, may not reference columns from other tables or inline views from the same query. After execution, the containing query interacts with the inline view as if it were an unindexed, in-memory table. If inline views are nested, the innermost inline view is executed first, followed by the next-innermost inline view, and so on. Consider the following query: SELECT d.dept_id dept_id, d.name dept_name, dept_orders.tot_orders tot_orders FROM department d, (SELECT e.dept_id dept_id, SUM(emp_orders.tot_orders) tot_orders FROM employee e, (SELECT sales_emp_id, COUNT(*) tot_orders FROM cust_order WHERE order_dt >= TRUNC(SYSDATE) -- 365 AND cancelled_dt IS NULL GROUP BY sales_emp_id ) emp_orders WHERE e.emp_id = emp_orders.sales_emp_id GROUP BY e.dept_id ) dept_orders WHERE d.dept_id = dept_orders.dept_id; DEPT_ID DEPT_NAME TOT_ORDERS ---------- -------------------- ---------- 3 Sales 2760 If you're new to inline views, this query might be intimidating. Start with the innermost query, understand the result set generated by that query, and move outward to the next level. Since inline views must be noncorrelated, you can run each inline view's SELECT statement individually and look at the results.[5]
For the previous query, executing the emp_orders inline view generates the following result set: SELECT sales_emp_id, COUNT(*) tot_orders FROM cust_order WHERE order_dt >= TRUNC(SYSDATE) -- 365 AND cancelled_dt IS NULL GROUP BY sales_emp_id SALES_EMP_ID TOT_ORDERS ------------ ---------- 11 115 12 115 13 115 14 115 15 115 16 115 17 115 18 115 19 115 20 114 21 115 22 115 23 115 24 115 25 115 26 115 27 115 28 115 29 115 30 116 31 115 32 115 33 115 34 115 The emp_orders set contains all salespeople who booked orders in the last year, along with the total number of orders booked. The next level up is the dept_orders inline view, which joins the emp_orders data set to the employee table and aggregates the number of orders up to the department level. The resulting data set looks as follows: SELECT e.dept_id dept_id, SUM(emp_orders.tot_orders) tot_orders FROM employee e, (SELECT sales_emp_id, COUNT(*) tot_orders FROM cust_order WHERE order_dt >= TRUNC(SYSDATE) -- 365 AND cancelled_dt IS NULL GROUP BY sales_emp_id ) emp_orders WHERE e.emp_id = emp_orders.sales_emp_id GROUP BY e.dept_id DEPT_ID TOT_ORDERS ---------- ---------- 3 2185 4 575 Finally, the dept_orders set is joined to the department table, and the final result set is: DEPT_ID DEPT_NAME TOT_ORDERS ---------- -------------------- ---------- 3 Domestic Sales 2185 4 International Sales 575 After query execution completes, the emp_orders and dept_orders result sets are discarded. 5.4.3 Data Set FabricationAlong with querying existing tables, inline views may be used to fabricate special-purpose data sets that don't exist in the database. For example, we might want to aggregate orders over the last year by small, medium, and large orders, but the concept of order sizes may not have been defined in our database. We could build a table with three records to define the different sizes and their boundaries, but we only need this information for a single query, and we don't want to clutter our database with dozens of small, special-purpose tables. One solution is to use set operators like UNION[6] to construct a custom-built data set, as in:
SELECT 'SMALL' name, 0 lower_bound, 999 upper_bound from dual UNION ALL SELECT 'MEDIUM' name, 1000 lower_bound, 24999 upper_bound from dual UNION ALL SELECT 'LARGE' name, 25000 lower_bound, 9999999 upper_bound from dual; NAME LOWER_BOUND UPPER_BOUND ------ ----------- ----------- SMALL 0 999 MEDIUM 1000 24999 LARGE 25000 9999999 We can then wrap this query in an inline view and use it to do our aggregations: SELECT sizes.name order_size, SUM(co.sale_price) tot_dollars FROM cust_order co, (SELECT 'SMALL' name, 0 lower_bound, 999 upper_bound from dual UNION ALL SELECT 'MEDIUM' name, 1000 lower_bound, 24999 upper_bound from dual UNION ALL SELECT 'LARGE' name, 25000 lower_bound, 9999999 upper_bound from dual ) sizes WHERE co.cancelled_dt IS NULL AND co.order_dt >= TO_DATE('01-JAN-2001','DD-MON-YYYY') AND co.order_dt < TO_DATE('01-JAN-2002','DD-MON-YYYY') AND co.sale_price BETWEEN sizes.lower_bound AND sizes.upper_bound GROUP BY sizes.name; ORDER_ TOT_DOLLARS ------ ----------- LARGE 7136214 MEDIUM 32395018 SMALL 63676 One word of caution: when constructing a set of ranges, make sure there are no gaps through which data may slip. For example, an order totaling $999.50 would not appear in either the small or medium categories, since $999.50 is neither between $0 and $999 nor between $1,000 and $24,999. One solution is to overlap the region boundaries so that there is no gap through which data can slip. Note that we can no longer use BETWEEN with this approach. SELECT sizes.name order_size, SUM(co.sale_price) tot_dollars FROM cust_order co, (SELECT 'SMALL' name, 0 lower_bound, 1000 upper_bound from dual UNION ALL SELECT 'MEDIUM' name, 1000 lower_bound, 25000 upper_bound from dual UNION ALL SELECT 'LARGE' name, 25000 lower_bound, 9999999 upper_bound from dual ) sizes WHERE co.cancelled_dt IS NULL AND co.order_dt >= TO_DATE('01-JAN-2001', 'DD-MON-YYYY') AND co.order_dt < TO_DATE('01-JAN-2002', 'DD-MON-YYYY') AND co.sale_price >= sizes.lower_bound AND co.sale_price < sizes.upper_bound GROUP BY sizes.name; Now that we have neither an overlap or gap between our buckets, we can be sure that no data will be left out of the aggregations. Fabricated data sets can also be useful for determining what data is not stored in our database. For example, our manager might ask for a report listing the aggregate sales for each day of the year 2000, including days with no sales. While our cust_order table contains records for every day that had orders, there is no table in the database containing a record for every day of the year. In order to provide our manager with an answer, we will need to fabricate a driving table containing a record for every day in 2000, and then outer join it to the set of aggregated sales for the same period. Since a year contains either 365 or 366 days, we will build the set {0, 1, 2, ..., 399}, add each member of the set to the start date of 01/01/2000, and let Oracle throw away the rows that don't belong in 2000. To build the set {0, 1, 2, ..., 399}, we will create the sets {0, 1, 2, ..., 10}, {0, 10, 20, 30, ..., 90}, and {0, 100, 200, 300} and add members of the three sets across the Cartesian product: SELECT ones.x + tens.x + hundreds.x tot FROM (SELECT 0 x FROM dual UNION ALL SELECT 1 x FROM dual UNION ALL SELECT 2 x FROM dual UNION ALL SELECT 3 x FROM dual UNION ALL SELECT 4 x FROM dual UNION ALL SELECT 5 x FROM dual UNION ALL SELECT 6 x FROM dual UNION ALL SELECT 7 x FROM dual UNION ALL SELECT 8 x FROM dual UNION ALL SELECT 9 x FROM dual) ones, (SELECT 0 x FROM dual UNION ALL SELECT 10 x FROM dual UNION ALL SELECT 20 x FROM dual UNION ALL SELECT 30 x FROM dual UNION ALL SELECT 40 x FROM dual UNION ALL SELECT 50 x FROM dual UNION ALL SELECT 60 x FROM dual UNION ALL SELECT 70 x FROM dual UNION ALL SELECT 80 x FROM dual UNION ALL SELECT 90 x FROM dual) tens, (SELECT 0 x FROM dual UNION ALL SELECT 100 x FROM dual UNION ALL SELECT 200 x FROM dual UNION ALL SELECT 300 x FROM dual) hundreds Since this query has no WHERE clause, every combination of the rows in the ones, tens, and hundreds sets will be generated, and the sum of the three numbers in each row will produce the set {0, 1, 2, ..., 399}. The next query generates the set of days in 2000 by adding each number in the set to the base date and then discarding days that fall in 2001: SELECT days.dt FROM (SELECT TO_DATE('01-JAN-2000', 'DD-MON-YYYY') + ones.x + tens.x + hundreds.x dt FROM (SELECT 0 x FROM dual UNION ALL SELECT 1 x FROM dual UNION ALL SELECT 2 x FROM dual UNION ALL SELECT 3 x FROM dual UNION ALL SELECT 4 x FROM dual UNION ALL SELECT 5 x FROM dual UNION ALL SELECT 6 x FROM dual UNION ALL SELECT 7 x FROM dual UNION ALL SELECT 8 x FROM dual UNION ALL SELECT 9 x FROM dual) ones, (SELECT 0 x FROM dual UNION ALL SELECT 10 x FROM dual UNION ALL SELECT 20 x FROM dual UNION ALL SELECT 30 x FROM dual UNION ALL SELECT 40 x FROM dual UNION ALL SELECT 50 x FROM dual UNION ALL SELECT 60 x FROM dual UNION ALL SELECT 70 x FROM dual UNION ALL SELECT 80 x FROM dual UNION ALL SELECT 90 x FROM dual) tens, (SELECT 0 x FROM dual UNION ALL SELECT 100 x FROM dual UNION ALL SELECT 200 x FROM dual UNION ALL SELECT 300 x FROM dual) hundreds) days WHERE days.dt < TO_DATE('01-JAN-2001', 'DD-MON-YYYY'); Since 2000 happens to be a leap year, the result set will contain 366 rows, one for each day of 2000. This query can then be wrapped in another inline view and used as the driving table for generating the report. Whether you would actually want to use such a strategy in your code is up to you; the main purpose of this example is to help get the creative juices flowing. 5.4.4 Overcoming SQL RestrictionsThe use of certain features of Oracle SQL can impose restrictions on our SQL statements. When these features are isolated from the rest of the query inside an inline view, however, these restrictions can be sidestepped. In this section, we explore how inline views can overcome limitations with hierarchical and aggregation queries. 5.4.4.1 Hierarchical queriesHierarchical queries allow recursive relationships to be traversed. As an example of a recursive relationship, consider a table called "region" that holds data about sales territories. Regions are arranged in a hierarchy, and each record in the region table references the region in which it is contained, as illustrated by the following data: SELECT * FROM region; REGION_ID REGION_NAME SUPER_REGION_ID --------- --------------- --------------- 1 North America 2 Canada 1 3 United States 1 4 Mexico 1 5 New England 3 6 Mid-Atlantic 3 7 SouthEast US 3 8 SouthWest US 3 9 NorthWest US 3 10 Central US 3 11 Europe 12 France 11 13 Germany 11 14 Spain 11 Each record in the customer table references the smallest of its applicable regions. Given a particular region, it is possible to construct a query that traverses up or down the hierarchy by utilizing the START WITH and CONNECT BY clauses: SELECT region_id, name, super_region_id FROM region START WITH name = `North America' CONNECT BY PRIOR region_id = super_region_id; REGION_ID NAME SUPER_REGION_ID ---------- -------------------- --------------- 1 North America 2 Canada 1 3 United States 1 5 New England 3 6 Mid-Atlantic 3 7 SouthEast US 3 8 SouthWest US 3 9 NorthWest US 3 10 Central US 3 4 Mexico 1 The query just shown traverses the region hierarchy starting with the North America region and working down the tree. Looking carefully at the results, we see that the Canada, United States, and Mexico regions all point to the North America region via the super_region_id field. The remainder of the rows all point to the United States region. Thus, we have identified a three-level hierarchy with one node at the top, three nodes in the second level, and six nodes in the third level underneath the United States node. For a detailed look at hierarchical queries, see Chapter 8. Imagine that we have been asked to generate a report showing total sales in 2001 for each sub-region of North America. However, hierarchical queries have the restriction that the table being traversed cannot be joined to other tables within the same query, so it might seem impossible to generate the report from a single query. Using an inline view, however, we can isolate the hierarchical query on the region table from the customer and cust_order tables, as in: SELECT na_regions.name region_name, SUM(co.sale_price) total_sales FROM cust_order co, customer c, (SELECT region_id, name FROM region START WITH name = 'North America' CONNECT BY PRIOR region_id = super_region_id) na_regions WHERE co.cancelled_dt IS NULL AND co.order_dt >= TO_DATE('01-JAN-2001','DD-MON-YYYY') AND co.order_dt < TO_DATE('01-JAN-2002','DD-MON-YYYY') AND co.cust_nbr = c.cust_nbr AND c.region_id = na_regions.region_id GROUP BY na_regions.name; REGION_NAME TOTAL_SALES -------------------- ----------- Central US 6238901 Mid-Atlantic 6307766 New England 6585641 NorthWest US 6739374 SouthEast US 6868495 SouthWest US 6854731 Even though the na_regions set includes the North America and United States regions, customer records always point to the smallest applicable region, which is why these particular regions do not show up in the final result set. By placing the hierarchical query within an inline view, we are able to temporarily flatten the region hierarchy to suit the purposes of the query, which allows us to bypass the restriction on hierarchical queries without resorting to splitting the logic into multiple pieces. The next section will demonstrate a similar strategy for working with aggregation queries. 5.4.4.2 Aggregate queriesQueries that perform aggregations have the following restriction: all nonaggregate columns in the SELECT clause must be included in the GROUP BY clause. Consider the following query, which aggregates sales data by customer and salesperson, and then adds supporting data from the customer, region, employee, and department tables: SELECT c.name customer, r.name region, e.fname || ' ' || e.lname salesperson, d.name department, SUM(co.sale_price) total_sales FROM cust_order co, customer c, region r, employee e, department d WHERE co.cust_nbr = c.cust_nbr AND c.region_id = r.region_id AND co.sales_emp_id = e.emp_id AND e.dept_id = d.dept_id AND co.cancelled_dt IS NULL AND co.order_dt >= TO_DATE('01-JAN-2001','DD-MON-YYYY') GROUP BY c.name, r.name, e.fname || ' ' || e.lname, d.name; Since every nonaggregate in the SELECT clause must be included in the GROUP BY clause, we are forced to sort on five columns, since a sort is needed to generate the groupings. Because every customer is in one and only one region and every employee is in one and only one department, we really only need to sort on the customer and employee fields in order to produce the desired results. Thus, the Oracle engine is wasting its time sorting on the region and department names. By isolating the aggregation from the supporting tables, however, we can create a more efficient and more understandable query: SELECT c.name customer, r.name region,
e.fname || ' ' || e.lname salesperson, d.name department,
cust_emp_orders.total total_sales
FROM customer c, region r, employee e, department d,
(SELECT cust_nbr, sales_emp_id, SUM(sale_price) total
FROM cust_order
WHERE cancelled_dt IS NULL
AND order_dt >= TO_DATE('01-JAN-2001','DD-MON-YYYY')
GROUP BY cust_nbr, sales_emp_id) cust_emp_orders
WHERE cust_emp_orders.cust_nbr = c.cust_nbr
AND c.region_id = r.region_id
AND cust_emp_orders.sales_emp_id = e.emp_id
AND e.dept_id = d.dept_id;
Since the cust_order table includes the customer number and salesperson ID, we can perform the aggregation against these two columns without the need to include the other four tables. Not only are we sorting on fewer columns, we are sorting on numeric fields (customer number and employee ID) rather than potentially lengthy strings (customer name, region name, employee name, and department name). The containing query uses the cust_nbr and sales_emp_id columns from the inline view to join to the customer and employee tables, which in turn are used to join to the region and department tables. By performing the aggregation within an inline view, we have sidestepped the restriction that all nonaggregates be included in the GROUP BY clause. We have also shortened execution time by eliminating unnecessary sorts, and we have minimized the number of joins to the customer, region, employee, and department tables. Depending on the amount of data in the tables, these improvements could yield significant performance gains. 5.4.5 Inline Views in DML StatementsNow that we are comfortable with inline views, it's time to add another wrinkle: inline views may also be used in INSERT, UPDATE, and DELETE statements. In most cases, using an inline view in a DML statement improves readability but otherwise adds little value to statement execution. To illustrate, we'll begin with a fairly simple UPDATE statement and then show the equivalent statement using an inline view: UPDATE cust_order co SET co.expected_ship_dt = co.expected_ship_dt + 7 WHERE co.cancelled_dt IS NULL AND co.ship_dt IS NULL AND EXISTS (SELECT 1 FROM line_item li, part p WHERE li.order_nbr = co.order_nbr AND li.part_nbr = p.part_nbr AND p.inventory_qty = 0); This statement uses an EXISTS condition to locate orders that include out-of-stock parts. The next version uses an inline view called suspended_orders to identify the same set of orders: UPDATE (SELECT co.expected_ship_dt exp_ship_dt
FROM cust_order co
WHERE co.cancelled_dt IS NULL AND co.ship_dt IS NULL
AND EXISTS (SELECT 1 FROM line_item li, part p
WHERE li.order_nbr = co.order_nbr
AND li.part_nbr = p.part_nbr
AND p.inventory_qty = 0)) suspended_orders
SET suspended_orders.exp_ship_dt = suspended_orders.exp_ship_dt + 7;
In the first statement, the WHERE clause of the UPDATE statement determines the set of rows to be updated, whereas in the second statement, the result set returned by the SELECT statement determines the target rows. Otherwise, they are identical. In order for the inline view to add extra value to the statement, it must be able to do something that the simple update statement can not do: join multiple tables. The following version attempts to do just that by replacing the subquery with a three-table join: UPDATE (SELECT co.expected_ship_dt exp_ship_dt
FROM cust_order co, line_item li, part p
WHERE co.cancelled_dt IS NULL AND co.ship_dt IS NULL
AND co.order_nbr = li.order_nbr AND li.part_nbr = p.part_nbr
AND p.inventory_qty = 0) suspended_orders
SET suspended_orders.exp_ship_dt = suspended_orders.exp_ship_dt + 7;
However, statement execution results in the following error: ORA-01779: cannot modify a column which maps to a non key-preserved table As is often the case in life, we can't get something for nothing. In order to take advantage of the ability to join multiple tables within a DML statement, we must abide by the following rules:
While the previous update statement attempts to modify only one table (cust_order), the key (order_nbr) is not preserved in the result set, since an order has multiple line items. In other words, rows in the result set generated by the three-table join cannot be uniquely identified using just the order_nbr field, so it is not possible to update the cust_order table via this particular three table join. However, it is possible to update or delete from the line_item table using the same join, since the key of the line_item table matches the key of the result set returned from the inline view (order_nbr and part_nbr). The next statement deletes rows from the line_item table using an inline view nearly identical to the one that failed for the previous UPDATE attempt: DELETE FROM (SELECT li.order_nbr order_nbr, li.part_nbr part_nbr FROM cust_order co, line_item li, part p WHERE co.cancelled_dt IS NULL AND co.ship_dt IS NULL AND co.order_nbr = li.order_nbr AND li.part_nbr = p.part_nbr AND p.inventory_qty = 0) suspended_orders; The column(s) referenced in the SELECT clause of the inline view are actually irrelevant. Since the line_item table is the only key-preserved table of the three tables listed in the FROM clause, this is the table on which the DELETE statement operates. While utilizing an inline view in a DELETE statement can be more efficient, it's somewhat disturbing that it is not immediately obvious which table is the focus of the DELETE statement. A reasonable convention when writing such statements would be to always select the key columns from the target table. 5.4.6 Restricting Access Using WITH CHECK OPTIONAnother way in which inline views can add value to DML statements is by restricting both the rows and columns that may be modified. For example, most companies only allow members of Human Resources to see or modify salary information. By restricting the columns visible to the DML statement, we can effectively hide the salary column: UPDATE (SELECT emp_id, fname, lname, dept_id, manager_emp_id FROM employee) emp SET emp.manager_emp_id = 11 WHERE emp.dept_id = 4; While the previous statement executes cleanly, attempting to add the salary column to the SET clause would yield the following error: UPDATE (SELECT emp_id, fname, lname, dept_id, manager_emp_id
FROM employee) emp
SET emp.manager_emp_id = 11, emp.salary = 1000000000
WHERE emp.dept_id = 4;
ORA-00904: invalid column name
Of course, the person writing the UPDATE statement has full access to the table; the intent here is to protect against unauthorized modifications by the users. This might prove useful in an n-tier environment, where the interface layer interacts with a business-logic layer. While this mechanism is useful for restricting access to particular columns, it does not limit access to particular rows in the target table. In order to restrict the rows that may be modified using a DML statement, we can add a WHERE clause to the inline view and specify WITH CHECK OPTION. For example, we may want to restrict the users from modifying data for any employee in the HR department: UPDATE (SELECT emp_id, fname, lname, dept_id, manager_emp_id FROM employee WHERE dept_id != (SELECT dept_id FROM department WHERE name = 'Human Resources') WITH CHECK OPTION) emp SET emp.manager_emp_id = 11 WHERE emp.dept_id = 4; The addition of WITH CHECK OPTION to the inline view constrains the DML statement to comply with the WHERE clause of the inline view. An attempt to update or delete data for an employee in the HR department will not succeed but will not raise an exception (updates/deletes 0 rows). However, an attempt to add a new employee to the HR department will yield the following error: ORA-01402: view WITH CHECK OPTION where-clause violation Thus, the following statement will fail with ORA-01402 because it attempts to add an employee to the Human Resources department: INSERT INTO (SELECT emp_id, fname, lname, dept_id, manager_emp_id FROM employee WHERE dept_id != (SELECT dept_id FROM department WHERE name = 'Human Resources') WITH CHECK OPTION) emp SELECT 99, 'Charles', 'Brown', d.dept_id, NULL FROM department d WHERE d.name = 'Human Resources'; |