Team LiB   Previous Section   Next Section

3.8 Working with Runs

3.8.1 Problem

You want to find runs in your table. In our example, you want to find any increasing (arithmetically and nonarithmetically) sequences of purity values.

3.8.2 Solutions

Use the following query:

SELECT 
   p1.ContainerId SeqBeg, p2.ContainerId SeqEnd
FROM ProductionLine p1, ProductionLine p2
WHERE 
   (p1.ContainerId < p2.ContainerId) AND
   NOT EXISTS(SELECT * FROM ProductionLine p3, ProductionLine p4 
      WHERE ( 
      p3.Purity<=p4.Purity AND 
      p4.ContainerId=p3.ContainerId-1 AND
      p3.ContainerId BETWEEN p1.ContainerId+1 AND p2.ContainerId) 
      OR (p3.ContainerId=p1.ContainerId-1 AND p3.Purity<p1.Purity) 
      OR (p3.ContainerId=p2.ContainerId+1 AND p3.Purity>p2.Purity))

SeqBeg      SeqEnd      
----------- ----------- 
2           5
6           9

3.8.3 Discussion

This query uses a framework similar to that which you've seen many times before in this chapter. Unlike a sequence, a run is a continuously increasing, though not necessarily monotonically increasing, series of values. Unlike the previous recipe in which we were looking for monotonically increasing sequences, we do not have a constant difference between ContainerId and Purity values. Consequently, we need a fourth table, p4 in this instance, to check for rows in the middle of a candidate interval that do not comply with the run requirement. This p4 table comes into play in the subquery, where we join it to p3.

For every element between p1 and p2, p3 and its predecessor are compared to see if their values are increasing:

p3.Purity<=p4.Purity AND 
p4.ContainerId=p3.ContainerId-1 AND
p3.ContainerId BETWEEN p1.ContainerId+1 AND p2.ContainerId

The BETWEEN clause limits the scope to rows between the borders (p1 and p2) of the candidate run in question. The p1 border is increased by 1, which covers all pairs within the scope. Note that there is always one less pair than the number of rows.

In a manner similar to other queries for regions and sequences, the last two conditions in the subquery's WHERE clause ensure that the borders of the candidate run cannot be extended:

(p3.ContainerId=p1.ContainerId-1 AND p3.Purity<p1.Purity) OR
(p3.ContainerId=p2.ContainerId+1 AND p3.Purity>p2.Purity)

If a row can be returned to satisfy these conditions, then the run can be extended and should be rejected in favor of the larger run.

The common framework that this solution shares with earlier recipes allows you to take techniques presented earlier for regions and sequences and apply them to runs.

    Team LiB   Previous Section   Next Section