3.3 Finding Regions
3.3.1 Problem
Find a region in a list. In our example, you
must find all containers in the
production line that have a purity index of 100. These represent
normal production output. Furthermore, you want only cases where at
least two containers in succession have a purity of 100. An odd
container with a purity of 100 in the midst of a number of containers
with bad purity levels is not to be reported as normal production
output.
3.3.2 Solution
Look at the problem as a region finding problem and use the following
code:
SELECT DISTINCT p1.ContainerID
FROM ProductionLine p1, ProductionLine p2
WHERE
p1.Purity=100 AND p2.Purity=100 AND
abs(p1.ContainerId-p2.ContainerId)=1
ContainerID
-----------
1
2
10
11
12
3.3.3 Discussion
Obviously, if it weren't for the requirement to have
at least two containers in a row with normal purity before reporting
a container as normal, the result could be obtained by finding all
containers with a purity level of 100:
SELECT * FROM ProductionLine
WHERE Purity=100
To return the correct result, we have to use a technique to find
regions in the list. To find neighboring rows with the same value, we
need two copies of the same table. We name them p1 and p2:
SELECT p1.ContainerID
FROM ProductionLine p1, ProductionLine p2
Then, we filter out all rows that do not match the criterion of
having one neighbor of the same value. The trick here to finding
neighbors is calculating the distance between the p1.ContainerId and
p2.ContainerId. If the distance is 1, the two elements are neighbors.
If they have the same value, they should be included in the result:
SELECT p1.ContainerID
FROM ProductionLine p1, ProductionLine p2
WHERE
abs(p1.ContainerId-p2.ContainerId)=1
We then add another condition to the WHERE clause to restrict the
results further to only those cases where the two neighboring
containers have a purity of 100:
SELECT p1.ContainerID
FROM ProductionLine p1, ProductionLine p2
WHERE
p1.Purity=100 AND p2.Purity=100 AND
abs(p1.ContainerId-p2.ContainerId)=1
Finally, in the SELECT clause, we use DISTINCT to eliminate repeated
references to the same container:
SELECT DISTINCT p1.ContainerID
FROM ProductionLine p1, ProductionLine p2
WHERE
p1.Purity=100 AND p2.Purity=100 AND
abs(p1.ContainerId-p2.ContainerId)=1
You can try to run the query without the DISTINCT clause, and, as
you'll see using our sample data, it will return
container ID 11 twice. This is because the 11th row has two neighbors
with a purity of 100 (10 and 12) and, thus, is reported twice.
|