3.13 Comparing Two Rows in an Array
3.13.1 Problem
You want to check to see whether two
rows in an array are equal. In our
example, you want to check if two production lines in the
ProductionFacility table are equal.
3.13.2 Solution
To check rows in the table for equality, use the following code. The
result will be a list of rows in the array that are equivalent:
SELECT p1.Line p1_Line, 'is equal to', p2.Line p2_Line
FROM ProductionFacility p1, ProductionFacility p2
WHERE p1.Purity=p2.Purity AND p1.ContainerId=p2.ContainerId AND
p1.Line<p2.Line
GROUP BY p1.Line, p2.Line
HAVING
COUNT(*)=(SELECT COUNT(*) FROM ProductionFacility p3 WHERE p3.Line=p1.Line)
AND
COUNT(*)=(SELECT COUNT(*) FROM ProductionFacility p4 WHERE p4.Line=p2.Line)
p1_Line p2_Line
----------- ----------- -----------
0 is equal to 3
3.13.3 Discussion
This query ends up being quite expensive, using four table instances;
as a result, it is a good demonstration of how SQL is not very
efficient in working with arrays. However, expensive as it is, it
does allow you to get results using only one query.
The FROM clause creates a cross-join between the two instances of the
ProductionFacility. We name the two instances p1 and p2 for easier
reference. We define in the SELECT statement that the result will
report one line for each pair of rows that are equal. Since the
cross-join produces many rows, we use the GROUP BY statement to limit
the result to just one row of output per row in the array.
The WHERE clause specifies three conditions:
Purity levels must be equal.
Container IDs must be equal.
Production-line numbers from p1 must be less than those from p2.
If you work with multidimensional arrays, simply add additional
comparison clauses to the WHERE clause to compare parameters for
equality. To compare for full equality between two rows, you must
have one comparison expression for each dimension in your array. In
our example, the two comparison clauses involve the ContainerId and
Line columns. The comparison expression involving the Purity columns
is what we use to determine whether two array elements are equal. So
a match on ContainerId and Line defines two elements that need to be
compared, and the test of equality involves the Purity column.
The intermediate results at this point, without the GROUP BY clause,
are as follows:
SELECT p1.ContainerId, p1.Purity, p1.Line, p2.Line
FROM ProductionFacility p1, ProductionFacility p2
WHERE p1.Purity=p2.Purity AND p1.ContainerId=p2.ContainerId AND
p1.Line<p2.Line
ContainerId Purity Line Line
----------- ----------- ----------- -----------
3 100 0 1
1 100 0 3
2 100 0 3
3 100 0 3
3 100 1 3
Add in the GROUP BY clause and we get:
SELECT COUNT(*) ContainerCount, p1.Line, p2.Line
FROM ProductionFacility p1, ProductionFacility p2
WHERE p1.Purity=p2.Purity AND p1.ContainerId=p2.ContainerId AND
p1.Line<p2.Line
GROUP BY p1.Line, p2.Line
ContainerCount Line Line
-------------- ----------- -----------
1 0 1
3 0 3
1 1 3
The HAVING clause is the expensive one. It compares the number of
matched pairs from the WHERE clause to the number of columns in both
rows. The first subquery checks for the number of rows in p1, and the
second, for the number of rows in p2. The HAVING clause makes sure
that only lines of equal size are reported in the final result. In
our example, each production line has produced three containers.
Looking at the intermediate results shown here, you can see that the
only two production lines with a container count of three are lines 0
and 3. The HAVING clause ensures that those are reported as the final
output from the query.
|