5.15 Finding Continuous Periods
5.15.1 Problem
You want to find continuous periods from a
list of periods in the database. In our example, you want to list all
periods during which a contractor is booked and you wish to merge all
neighboring periods into one.
5.15.2 Solution
Use the following query to obtain a list of periods during which
contractors are booked:
SELECT
c1.ContractorId ContractorId,
CAST(c1.JobStart AS CHAR(12)) JobStart,
CAST(c2.JobEnd AS CHAR(12)) JobEnd,
DATEDIFF(d,c1.JobStart, c2.JobEnd)+1 Length
FROM
ContractorsSchedules c1,
ContractorsSchedules c2,
ContractorsSchedules c3
WHERE
c1.ContractorId=c2.ContractorId AND c1.JobStart <= c2.JobStart AND
c1.ContractorId=c3.ContractorId AND
c3.JobStart BETWEEN c1.JobStart AND c2.JobEnd AND
NOT EXISTS(
SELECT * FROM ContractorsSchedules c4
WHERE c4.ContractorId=c1.ContractorId AND
c2.JobEnd+1 BETWEEN c4.JobStart AND c4.JobEnd) AND
NOT EXISTS(
SELECT * FROM ContractorsSchedules c5
WHERE c5.ContractorId=c1.ContractorId AND
c1.JobStart-1 BETWEEN c5.JobStart AND c5.JobEnd)
GROUP BY c1.ContractorId, c1.JobStart, c2.JobEnd
HAVING
SUM(DATEDIFF(d,c3.JobStart,c3.JobEnd)) + COUNT(*)
=
DATEDIFF(d,c1.JobStart,c2.JobEnd) + 1
ORDER BY ContractorId, c1. JobStart
The results should appear as follows:
ContractorId JobStart JobEnd Length
------------ ------------ ------------ -----------
Alex Feb 1 2001 Feb 5 2001 5
Alex Feb 11 2001 Feb 20 2001 10
Alex Jan 1 2001 Jan 30 2001 30
Bob Feb 5 2001 Feb 15 2001 11
Bob Jan 5 2001 Jan 15 2001 11
5.15.3 Discussion
This problem may sound simple at first; however, there are some
aspects to it that make it a good example of periodic manipulation.
The key problem is in the handling of neighboring periods. If two
periods are neighbors, the beginning of one period will be one
temporal unit apart from the end of the other period. In our example,
the temporal unit is one day, so a contractor might stop work on one
project on a given day only to begin work on another project the very
next day. For the purposes of this report, we want to report an
extended time period that covers both projects. How do we accomplish
this?
Let's think about the definition of a period for a
moment. A period is defined as a duration with definite beginning and
end points. All contractor jobs begin on a specific date, and they
all end on a specific date. Let's apply the same
concept to a sequence of adjacent periods. If a contractor has
several projects that come in succession, there will always be one
project that comes first and one project that comes last. We are
interested in the JobStart date from the first project and the JobEnd
date from the last project.
The key to our query now is to identify correctly the first and last
projects in a series of adjacent projects. We can begin by joining
the ContractorsSchedules table to itself, thereby generating all
possible combinations of JobStart and JobEnd dates. In this recipe,
we'll refer to each such combination as a
candidate. The following set of characteristics
can then be used to narrow the results down to only those candidates
that represent the beginning and end of a series of adjacent projects
for a given contractor:
The candidate JobEnd date must not immediately precede the JobStart
date of another project.
The candidate JobStart date must not immediately follow the JobEnd
date of another project.
The JobStart date from the first project in the series must be less
than the JobEnd date from the last project in the series.
There must be no unassigned days (i.e., not within a project) within
a candidate date range.
We begin writing our solution query by joining ContractorsSchedules
to itself, thus generating all possible combinations of JobStart
dates from one project with JobEnd dates from a second project. We
join ContractorsSchedules to itself a third time to include all
possible combinations of intervening projects. The following SELECT
represents the beginnings of our query. We've added
columns to display the dates from c3 to make it easier for you to
follow our logic as we walk through the remainder of the query.
We've also modified the headings slightly to make it
clear from which alias (c1, c2, or c3) each date is drawn.
SELECT
c1.ContractorId ContractorId,
CAST(c1.JobStart AS CHAR(12)) JobStart1,
CAST(c2.JobEnd AS CHAR(12)) JobEnd2,
DATEDIFF(d,c1.JobStart, c2.JobEnd)+1 Length,
CAST(c3.JobStart as CHAR(12)) JobStart3,
CAST(c3.JobEnd as CHAR(12)) JobEnd3
FROM
ContractorsSchedules c1,
ContractorsSchedules c2,
ContractorsSchedules c3
WHERE
c1.ContractorId=c2.ContractorId AND c1.JobStart <= c2.JobStart AND
c1.ContractorId=c3.ContractorId AND
c3.JobStart BETWEEN c1.JobStart AND c2.JobEnd
ContractorId JobStart1 JobEnd2 Length JobStart3 JobEnd3
------------ ------------ ------------ ----------- ------------ ------------
Alex Jan 1 2001 Jan 10 2001 10 Jan 1 2001 Jan 10 2001
Alex Jan 1 2001 Jan 20 2001 20 Jan 1 2001 Jan 10 2001
Alex Jan 1 2001 Jan 30 2001 30 Jan 1 2001 Jan 10 2001
Alex Jan 1 2001 Feb 5 2001 36 Jan 1 2001 Jan 10 2001
Alex Jan 1 2001 Feb 20 2001 51 Jan 1 2001 Jan 10 2001
Alex Jan 1 2001 Jan 20 2001 20 Jan 11 2001 Jan 20 2001
Alex Jan 1 2001 Jan 30 2001 30 Jan 11 2001 Jan 20 2001
Alex Jan 1 2001 Feb 5 2001 36 Jan 11 2001 Jan 20 2001
Alex Jan 1 2001 Feb 20 2001 51 Jan 11 2001 Jan 20 2001
Alex Jan 1 2001 Jan 30 2001 30 Jan 21 2001 Jan 30 2001
Alex Jan 1 2001 Feb 5 2001 36 Jan 21 2001 Jan 30 2001
Alex Jan 1 2001 Feb 20 2001 51 Jan 21 2001 Jan 30 2001
Alex Jan 1 2001 Feb 5 2001 36 Feb 1 2001 Feb 5 2001
Alex Jan 1 2001 Feb 20 2001 51 Feb 1 2001 Feb 5 2001
Alex Jan 1 2001 Feb 20 2001 51 Feb 11 2001 Feb 20 2001
Alex Jan 11 2001 Jan 20 2001 10 Jan 11 2001 Jan 20 2001
Alex Jan 11 2001 Jan 30 2001 20 Jan 11 2001 Jan 20 2001
Alex Jan 11 2001 Feb 5 2001 26 Jan 11 2001 Jan 20 2001
Alex Jan 11 2001 Feb 20 2001 41 Jan 11 2001 Jan 20 2001
Alex Jan 11 2001 Jan 30 2001 20 Jan 21 2001 Jan 30 2001
Alex Jan 11 2001 Feb 5 2001 26 Jan 21 2001 Jan 30 2001
Alex Jan 11 2001 Feb 20 2001 41 Jan 21 2001 Jan 30 2001
Alex Jan 11 2001 Feb 5 2001 26 Feb 1 2001 Feb 5 2001
Alex Jan 11 2001 Feb 20 2001 41 Feb 1 2001 Feb 5 2001
Alex Jan 11 2001 Feb 20 2001 41 Feb 11 2001 Feb 20 2001
Alex Jan 21 2001 Jan 30 2001 10 Jan 21 2001 Jan 30 2001
Alex Jan 21 2001 Feb 5 2001 16 Jan 21 2001 Jan 30 2001
Alex Jan 21 2001 Feb 20 2001 31 Jan 21 2001 Jan 30 2001
Alex Jan 21 2001 Feb 5 2001 16 Feb 1 2001 Feb 5 2001
Alex Jan 21 2001 Feb 20 2001 31 Feb 1 2001 Feb 5 2001
Alex Jan 21 2001 Feb 20 2001 31 Feb 11 2001 Feb 20 2001
Alex Feb 1 2001 Feb 5 2001 5 Feb 1 2001 Feb 5 2001
Alex Feb 1 2001 Feb 20 2001 20 Feb 1 2001 Feb 5 2001
Alex Feb 1 2001 Feb 20 2001 20 Feb 11 2001 Feb 20 2001
Alex Feb 11 2001 Feb 20 2001 10 Feb 11 2001 Feb 20 2001
Bob Jan 5 2001 Jan 15 2001 11 Jan 5 2001 Jan 15 2001
Bob Jan 5 2001 Feb 15 2001 42 Jan 5 2001 Jan 15 2001
Bob Jan 5 2001 Feb 15 2001 42 Feb 5 2001 Feb 15 2001
Bob Feb 5 2001 Feb 15 2001 11 Feb 5 2001 Feb 15 2001
(39 row(s) affected)
In this query, the c1 alias represents the first project in a series
of projects. The c2 alias represents the last project in a series.
The c3 alias represents a possible project in between c1 and c2 (see
the seventh row of output). As it currently stands, this query simply
generates a list of candidate periods. We need to eliminate periods
that don't meet our criteria.
We can begin by eliminating any candidate periods that have
neighbors. The SELECT statement in the following NOT EXISTS clause
identifies candidates for which there is another project adjacent to
the candidate end date (c2.JobEnd):
NOT EXISTS(
SELECT * FROM ContractorsSchedules c4
WHERE c4.ContractorId=c1.ContractorId AND
c2.JobEnd+1 BETWEEN c4.JobStart AND c4.JobEnd)
Let's talk about this NOT EXISTS in detail.
c2.JobEnd represents our candidate end date. We need to find out
whether any other projects overlap that end date. Notice that we add
one day to the end date. We do that because we wish to use the
BETWEEN operator. If c2.JobEnd is one day prior to c4.JobStart,
BETWEEN will return false. We add 1 to c2.JobEnd so that it overlaps
any c4.JobStart date representing the next day. For example, if
c2.JobEnd is January 10 and there exists another job for the period
January 11-January 20, the addition of one day converts:
Jan 10 BETWEEN Jan 11 and Jan 20
to:
Jan 11 BETWEEN Jan 11 and Jan 20
This condition will be true, and the candidate period ending on
January 11 will be eliminated because the period can obviously be
extended through January 20.
But what about c4.JobEnd? If c2.JobEnd is January 20 and c4.JobEnd is
also January 20, adding one day to the c2.JobEnd date will bring it
out of range. For example:
Jan 20 BETWEEN Jan 11 and Jan 20
becomes:
Jan 21 BETWEEN Jan 11 and Jan 20
It's not a problem, because if two projects end on
the same date, that end date is still a valid ending date for a
series of projects. Later in the query, we'll use
the GROUP BY clause to eliminate any possible duplicate rows that
result from two projects ending (or beginning) on the same date.
Just as we eliminate candidates' date ranges that
can be extended going forward, we also need to eliminate those
candidates' date ranges that can be extended going
backwards. The following logic is used to do this:
NOT EXISTS(
SELECT * FROM ContractorsSchedules c5
WHERE c5.ContractorId=c1.ContractorId AND
c1.JobStart-1 BETWEEN c5.JobStart AND c5.JobEnd)
As you can see, this logic is similar to the previous NOT EXISTS
clause. The difference is that we are working with c1.JobStart, and
we subtract one day rather than add, because we are looking backwards
this time.
The following version of the query shows the results of eliminating
candidate periods that can be extended forwards or backwards:
SELECT
c1.ContractorId ContractorId,
CAST(c1.JobStart AS CHAR(12)) JobStart1,
CAST(c2.JobEnd AS CHAR(12)) JobEnd2,
DATEDIFF(d,c1.JobStart, c2.JobEnd)+1 Length,
CAST(c3.JobStart as CHAR(12)) JobStart3,
CAST(c3.JobEnd as CHAR(12)) JobEnd3
FROM
ContractorsSchedules c1,
ContractorsSchedules c2,
ContractorsSchedules c3
WHERE
c1.ContractorId=c2.ContractorId AND c1.JobStart <= c2.JobStart AND
c1.ContractorId=c3.ContractorId AND
c3.JobStart BETWEEN c1.JobStart AND c2.JobEnd AND
NOT EXISTS(
SELECT * FROM ContractorsSchedules c4
WHERE c4.ContractorId=c1.ContractorId AND
c2.JobEnd+1 BETWEEN c4.JobStart AND c4.JobEnd) AND
NOT EXISTS(
SELECT * FROM ContractorsSchedules c5
WHERE c5.ContractorId=c1.ContractorId AND
c1.JobStart-1 BETWEEN c5.JobStart AND c5.JobEnd)
ContractorId JobStart1 JobEnd2 Length JobStart3 JobEnd3
------------ ------------ ------------ ----------- ------------ ------------
Alex Jan 1 2001 Jan 30 2001 30 Jan 1 2001 Jan 10 2001
Alex Jan 1 2001 Jan 30 2001 30 Jan 11 2001 Jan 20 2001
Alex Jan 1 2001 Jan 30 2001 30 Jan 21 2001 Jan 30 2001
Alex Jan 1 2001 Feb 5 2001 36 Jan 1 2001 Jan 10 2001
Alex Jan 1 2001 Feb 5 2001 36 Jan 11 2001 Jan 20 2001
Alex Jan 1 2001 Feb 5 2001 36 Jan 21 2001 Jan 30 2001
Alex Jan 1 2001 Feb 5 2001 36 Feb 1 2001 Feb 5 2001
Alex Feb 1 2001 Feb 5 2001 5 Feb 1 2001 Feb 5 2001
Alex Jan 1 2001 Feb 20 2001 51 Jan 1 2001 Jan 10 2001
Alex Jan 1 2001 Feb 20 2001 51 Jan 11 2001 Jan 20 2001
Alex Jan 1 2001 Feb 20 2001 51 Jan 21 2001 Jan 30 2001
Alex Jan 1 2001 Feb 20 2001 51 Feb 1 2001 Feb 5 2001
Alex Jan 1 2001 Feb 20 2001 51 Feb 11 2001 Feb 20 2001
Alex Feb 1 2001 Feb 20 2001 20 Feb 1 2001 Feb 5 2001
Alex Feb 1 2001 Feb 20 2001 20 Feb 11 2001 Feb 20 2001
Alex Feb 11 2001 Feb 20 2001 10 Feb 11 2001 Feb 20 2001
Bob Jan 5 2001 Jan 15 2001 11 Jan 5 2001 Jan 15 2001
Bob Jan 5 2001 Feb 15 2001 42 Jan 5 2001 Jan 15 2001
Bob Jan 5 2001 Feb 15 2001 42 Feb 5 2001 Feb 15 2001
Bob Feb 5 2001 Feb 15 2001 11 Feb 5 2001 Feb 15 2001
Our WHERE clause is now complete, and we are left with a much shorter
list of possible candidate periods. We've eliminated
candidate periods that can be extended and, thus, are too short. Now,
we need to eliminate duplicate rows from the result set. We use the
following GROUP BY clause:
GROUP BY c1.ContractorId, c1.JobStart, c2.JobEnd
If you've been following along with our examples so
far, you'll realize that adding this GROUP BY clause
forces us to change our SELECT list. You'll see that
change in a moment, but first we need to eliminate one more set of
unwanted candidate periods.
We haven't checked for gaps yet within our candidate
periods. Each candidate period (c1.JobStart through c2.JobEnd)
represents a sequence of one or more projects. If more than one
project is involved, we need to ensure that there are no unassigned
days within the candidate period. This is where the HAVING clause
comes into play.
Remember the c3 alias? It represents all periods whose start dates
lay within any of our candidate periods. To ensure that there are no
gaps, we count the days between c1.JobStart and c2.JobEnd and compare
that result to the sum of the days for all the intervening projects:
HAVING
SUM(DATEDIFF(d,c3.JobStart,c3.JobEnd)) + COUNT(*)
=
DATEDIFF(d,c1.JobStart,c2.JobEnd) + 1
The SUM function sums the lengths of all included periods (within
c3). The first expression adds one day for each included period
(i.e., each c3 row). This is done because DATEDIFF returns the
difference between two dates and not the number of days involved. For
example: Jan 2-Jan 1 = 1 day, yet that period represents 2 days. We
lose one day from each c3 period that we look at, so the expression
adds one day for each such period, well, almost.
The second DATEDIFF function call returns the total number of days in
the candidate period. Here, too, we have to add one day to the
DATEDIFF result, to calculate the number of days from the date
difference. If the number of days in the candidate period is the same
as the number of days in all periods falling within the candidate
period, we know that there are no unassigned days, or gaps, and that
our candidate period truly represents a contiguous period of activity
for the contractor in question. This gets us to our complete query:
SELECT
c1.ContractorId ContractorId,
CAST(c1.JobStart AS CHAR(12)) JobStart,
CAST(c2.JobEnd AS CHAR(12)) JobEnd,
DATEDIFF(d,c1.JobStart, c2.JobEnd)+1 Length
FROM
ContractorsSchedules c1,
ContractorsSchedules c2,
ContractorsSchedules c3
WHERE
c1.ContractorId=c2.ContractorId AND c1.JobStart <= c2.JobStart AND
c1.ContractorId=c3.ContractorId AND
c3.JobStart BETWEEN c1.JobStart AND c2.JobEnd AND
NOT EXISTS(
SELECT * FROM ContractorsSchedules c4
WHERE c4.ContractorId=c1.ContractorId AND
c2.JobEnd+1 BETWEEN c4.JobStart AND c4.JobEnd) AND
NOT EXISTS(
SELECT * FROM ContractorsSchedules c5
WHERE c5.ContractorId=c1.ContractorId AND
c1.JobStart-1 BETWEEN c5.JobStart AND c5.JobEnd)
GROUP BY c1.ContractorId, c1.JobStart, c2.JobEnd
HAVING
SUM(DATEDIFF(d,c3.JobStart,c3.JobEnd)) + COUNT(*)
=
DATEDIFF(d,c1.JobStart,c2.JobEnd) + 1
ORDER BY ContractorId, c1.JobStart
The GROUP BY clause defines the groups over which the SUM function
operates. Each candidate period becomes one group. The HAVING clause
functions as we've just described. The ORDER BY
clause simply sorts the results by JobStart date for each contractor.
Note that our proposed solution to this recipe is applicable only in
situations where business rules prevent contractors from having
overlapping assignments. We've done that to simplify
the query. The idea of counting days in the HAVING clause flies out
the window when projects can overlap, because, in such a case, the
number of project days can exceed the number of calendar days. To
make our solution valid for cases where assignments are allowed to
overlap, rewrite the HAVING clause as follows:
HAVING NOT EXISTS(
SELECT c1.JobStart+i
FROM Pivot
WHERE i <= DATEDIFF(day,c1.JobStart,c2.JobEnd) AND
NOT EXISTS (SELECT * FROM ContractorsSchedules c6
WHERE c1.JobStart+i BETWEEN c6.JobStart AND c6.JobEnd
AND c6.ContractorId = c1.ContractorId))
In this HAVING clause, we have a subquery against the Pivot table
that returns one row for each date in the candidate period. We have
yet another subquery that filters out those rows that correspond to
scheduled projects. If all dates in the candidate period are
scheduled, the outer subquery will return no rows, and the candidate
period will be accepted. Otherwise, the candidate period will be
rejected. The use of the Pivot table and the nested subquery add to
the complexity of this solution.
|