Team LiB   Previous Section   Next Section

5.15 Finding Continuous Periods

5.15.1 Problem

You want to find continuous periods from a list of periods in the database. In our example, you want to list all periods during which a contractor is booked and you wish to merge all neighboring periods into one.

5.15.2 Solution

Use the following query to obtain a list of periods during which contractors are booked:

SELECT 
   c1.ContractorId ContractorId, 
   CAST(c1.JobStart AS CHAR(12)) JobStart,
   CAST(c2.JobEnd AS CHAR(12)) JobEnd,
   DATEDIFF(d,c1.JobStart, c2.JobEnd)+1 Length
FROM 
   ContractorsSchedules c1, 
   ContractorsSchedules c2, 
   ContractorsSchedules c3
WHERE 
   c1.ContractorId=c2.ContractorId AND c1.JobStart <= c2.JobStart AND 
   c1.ContractorId=c3.ContractorId AND 
      c3.JobStart BETWEEN c1.JobStart AND c2.JobEnd AND
   NOT EXISTS(
      SELECT * FROM ContractorsSchedules c4 
      WHERE c4.ContractorId=c1.ContractorId AND 
         c2.JobEnd+1 BETWEEN c4.JobStart AND c4.JobEnd) AND
   NOT EXISTS(
      SELECT * FROM ContractorsSchedules c5 
      WHERE c5.ContractorId=c1.ContractorId AND 
         c1.JobStart-1 BETWEEN c5.JobStart AND c5.JobEnd) 
GROUP BY c1.ContractorId, c1.JobStart, c2.JobEnd
HAVING    
   SUM(DATEDIFF(d,c3.JobStart,c3.JobEnd)) + COUNT(*)
   = 
   DATEDIFF(d,c1.JobStart,c2.JobEnd) + 1
ORDER BY ContractorId, c1. JobStart

The results should appear as follows:

ContractorId JobStart     JobEnd       Length      
------------ ------------ ------------ ----------- 
Alex         Feb  1 2001  Feb  5 2001  5
Alex         Feb 11 2001  Feb 20 2001  10
Alex         Jan  1 2001  Jan 30 2001  30
Bob          Feb  5 2001  Feb 15 2001  11
Bob          Jan  5 2001  Jan 15 2001  11

5.15.3 Discussion

This problem may sound simple at first; however, there are some aspects to it that make it a good example of periodic manipulation. The key problem is in the handling of neighboring periods. If two periods are neighbors, the beginning of one period will be one temporal unit apart from the end of the other period. In our example, the temporal unit is one day, so a contractor might stop work on one project on a given day only to begin work on another project the very next day. For the purposes of this report, we want to report an extended time period that covers both projects. How do we accomplish this?

Let's think about the definition of a period for a moment. A period is defined as a duration with definite beginning and end points. All contractor jobs begin on a specific date, and they all end on a specific date. Let's apply the same concept to a sequence of adjacent periods. If a contractor has several projects that come in succession, there will always be one project that comes first and one project that comes last. We are interested in the JobStart date from the first project and the JobEnd date from the last project.

The key to our query now is to identify correctly the first and last projects in a series of adjacent projects. We can begin by joining the ContractorsSchedules table to itself, thereby generating all possible combinations of JobStart and JobEnd dates. In this recipe, we'll refer to each such combination as a candidate. The following set of characteristics can then be used to narrow the results down to only those candidates that represent the beginning and end of a series of adjacent projects for a given contractor:

  • The candidate JobEnd date must not immediately precede the JobStart date of another project.

  • The candidate JobStart date must not immediately follow the JobEnd date of another project.

  • The JobStart date from the first project in the series must be less than the JobEnd date from the last project in the series.

  • There must be no unassigned days (i.e., not within a project) within a candidate date range.

We begin writing our solution query by joining ContractorsSchedules to itself, thus generating all possible combinations of JobStart dates from one project with JobEnd dates from a second project. We join ContractorsSchedules to itself a third time to include all possible combinations of intervening projects. The following SELECT represents the beginnings of our query. We've added columns to display the dates from c3 to make it easier for you to follow our logic as we walk through the remainder of the query. We've also modified the headings slightly to make it clear from which alias (c1, c2, or c3) each date is drawn.

SELECT 
   c1.ContractorId ContractorId, 
   CAST(c1.JobStart AS CHAR(12)) JobStart1,
   CAST(c2.JobEnd AS CHAR(12)) JobEnd2,
   DATEDIFF(d,c1.JobStart, c2.JobEnd)+1 Length,
   CAST(c3.JobStart as CHAR(12)) JobStart3,
   CAST(c3.JobEnd as CHAR(12)) JobEnd3
FROM 
   ContractorsSchedules c1, 
   ContractorsSchedules c2, 
   ContractorsSchedules c3
WHERE 
   c1.ContractorId=c2.ContractorId AND c1.JobStart <= c2.JobStart AND 
   c1.ContractorId=c3.ContractorId AND 
      c3.JobStart BETWEEN c1.JobStart AND c2.JobEnd

ContractorId JobStart1    JobEnd2      Length      JobStart3    JobEnd3      
------------ ------------ ------------ ----------- ------------ ------------ 
Alex         Jan  1 2001  Jan 10 2001  10          Jan  1 2001  Jan 10 2001 
Alex         Jan  1 2001  Jan 20 2001  20          Jan  1 2001  Jan 10 2001 
Alex         Jan  1 2001  Jan 30 2001  30          Jan  1 2001  Jan 10 2001 
Alex         Jan  1 2001  Feb  5 2001  36          Jan  1 2001  Jan 10 2001 
Alex         Jan  1 2001  Feb 20 2001  51          Jan  1 2001  Jan 10 2001 
Alex         Jan  1 2001  Jan 20 2001  20          Jan 11 2001  Jan 20 2001 
Alex         Jan  1 2001  Jan 30 2001  30          Jan 11 2001  Jan 20 2001 
Alex         Jan  1 2001  Feb  5 2001  36          Jan 11 2001  Jan 20 2001 
Alex         Jan  1 2001  Feb 20 2001  51          Jan 11 2001  Jan 20 2001 
Alex         Jan  1 2001  Jan 30 2001  30          Jan 21 2001  Jan 30 2001 
Alex         Jan  1 2001  Feb  5 2001  36          Jan 21 2001  Jan 30 2001 
Alex         Jan  1 2001  Feb 20 2001  51          Jan 21 2001  Jan 30 2001 
Alex         Jan  1 2001  Feb  5 2001  36          Feb  1 2001  Feb  5 2001 
Alex         Jan  1 2001  Feb 20 2001  51          Feb  1 2001  Feb  5 2001 
Alex         Jan  1 2001  Feb 20 2001  51          Feb 11 2001  Feb 20 2001 
Alex         Jan 11 2001  Jan 20 2001  10          Jan 11 2001  Jan 20 2001 
Alex         Jan 11 2001  Jan 30 2001  20          Jan 11 2001  Jan 20 2001 
Alex         Jan 11 2001  Feb  5 2001  26          Jan 11 2001  Jan 20 2001 
Alex         Jan 11 2001  Feb 20 2001  41          Jan 11 2001  Jan 20 2001 
Alex         Jan 11 2001  Jan 30 2001  20          Jan 21 2001  Jan 30 2001 
Alex         Jan 11 2001  Feb  5 2001  26          Jan 21 2001  Jan 30 2001 
Alex         Jan 11 2001  Feb 20 2001  41          Jan 21 2001  Jan 30 2001 
Alex         Jan 11 2001  Feb  5 2001  26          Feb  1 2001  Feb  5 2001 
Alex         Jan 11 2001  Feb 20 2001  41          Feb  1 2001  Feb  5 2001 
Alex         Jan 11 2001  Feb 20 2001  41          Feb 11 2001  Feb 20 2001 
Alex         Jan 21 2001  Jan 30 2001  10          Jan 21 2001  Jan 30 2001 
Alex         Jan 21 2001  Feb  5 2001  16          Jan 21 2001  Jan 30 2001 
Alex         Jan 21 2001  Feb 20 2001  31          Jan 21 2001  Jan 30 2001 
Alex         Jan 21 2001  Feb  5 2001  16          Feb  1 2001  Feb  5 2001 
Alex         Jan 21 2001  Feb 20 2001  31          Feb  1 2001  Feb  5 2001 
Alex         Jan 21 2001  Feb 20 2001  31          Feb 11 2001  Feb 20 2001 
Alex         Feb  1 2001  Feb  5 2001  5           Feb  1 2001  Feb  5 2001 
Alex         Feb  1 2001  Feb 20 2001  20          Feb  1 2001  Feb  5 2001 
Alex         Feb  1 2001  Feb 20 2001  20          Feb 11 2001  Feb 20 2001 
Alex         Feb 11 2001  Feb 20 2001  10          Feb 11 2001  Feb 20 2001 
Bob          Jan  5 2001  Jan 15 2001  11          Jan  5 2001  Jan 15 2001 
Bob          Jan  5 2001  Feb 15 2001  42          Jan  5 2001  Jan 15 2001 
Bob          Jan  5 2001  Feb 15 2001  42          Feb  5 2001  Feb 15 2001 
Bob          Feb  5 2001  Feb 15 2001  11          Feb  5 2001  Feb 15 2001 

(39 row(s) affected)

In this query, the c1 alias represents the first project in a series of projects. The c2 alias represents the last project in a series. The c3 alias represents a possible project in between c1 and c2 (see the seventh row of output). As it currently stands, this query simply generates a list of candidate periods. We need to eliminate periods that don't meet our criteria.

We can begin by eliminating any candidate periods that have neighbors. The SELECT statement in the following NOT EXISTS clause identifies candidates for which there is another project adjacent to the candidate end date (c2.JobEnd):

NOT EXISTS(
   SELECT * FROM ContractorsSchedules c4 
   WHERE c4.ContractorId=c1.ContractorId AND 
     c2.JobEnd+1 BETWEEN c4.JobStart AND c4.JobEnd)

Let's talk about this NOT EXISTS in detail. c2.JobEnd represents our candidate end date. We need to find out whether any other projects overlap that end date. Notice that we add one day to the end date. We do that because we wish to use the BETWEEN operator. If c2.JobEnd is one day prior to c4.JobStart, BETWEEN will return false. We add 1 to c2.JobEnd so that it overlaps any c4.JobStart date representing the next day. For example, if c2.JobEnd is January 10 and there exists another job for the period January 11-January 20, the addition of one day converts:

Jan 10 BETWEEN Jan 11 and Jan 20

to:

Jan 11 BETWEEN Jan 11 and Jan 20

This condition will be true, and the candidate period ending on January 11 will be eliminated because the period can obviously be extended through January 20.

But what about c4.JobEnd? If c2.JobEnd is January 20 and c4.JobEnd is also January 20, adding one day to the c2.JobEnd date will bring it out of range. For example:

Jan 20 BETWEEN Jan 11 and Jan 20

becomes:

Jan 21 BETWEEN Jan 11 and Jan 20

It's not a problem, because if two projects end on the same date, that end date is still a valid ending date for a series of projects. Later in the query, we'll use the GROUP BY clause to eliminate any possible duplicate rows that result from two projects ending (or beginning) on the same date.

Just as we eliminate candidates' date ranges that can be extended going forward, we also need to eliminate those candidates' date ranges that can be extended going backwards. The following logic is used to do this:

NOT EXISTS(
  SELECT * FROM ContractorsSchedules c5 
  WHERE c5.ContractorId=c1.ContractorId AND 
     c1.JobStart-1 BETWEEN c5.JobStart AND c5.JobEnd) 

As you can see, this logic is similar to the previous NOT EXISTS clause. The difference is that we are working with c1.JobStart, and we subtract one day rather than add, because we are looking backwards this time.

The following version of the query shows the results of eliminating candidate periods that can be extended forwards or backwards:

SELECT 
   c1.ContractorId ContractorId, 
   CAST(c1.JobStart AS CHAR(12)) JobStart1,
   CAST(c2.JobEnd AS CHAR(12)) JobEnd2,
   DATEDIFF(d,c1.JobStart, c2.JobEnd)+1 Length,
   CAST(c3.JobStart as CHAR(12)) JobStart3,
   CAST(c3.JobEnd as CHAR(12)) JobEnd3
FROM 
   ContractorsSchedules c1, 
   ContractorsSchedules c2, 
   ContractorsSchedules c3
WHERE 
   c1.ContractorId=c2.ContractorId AND c1.JobStart <= c2.JobStart AND 
   c1.ContractorId=c3.ContractorId AND 
      c3.JobStart BETWEEN c1.JobStart AND c2.JobEnd AND
   NOT EXISTS(
      SELECT * FROM ContractorsSchedules c4 
      WHERE c4.ContractorId=c1.ContractorId AND 
         c2.JobEnd+1 BETWEEN c4.JobStart AND c4.JobEnd) AND
   NOT EXISTS(
      SELECT * FROM ContractorsSchedules c5 
      WHERE c5.ContractorId=c1.ContractorId AND 
         c1.JobStart-1 BETWEEN c5.JobStart AND c5.JobEnd) 

ContractorId JobStart1    JobEnd2      Length      JobStart3    JobEnd3      
------------ ------------ ------------ ----------- ------------ ------------ 
Alex         Jan  1 2001  Jan 30 2001  30          Jan  1 2001  Jan 10 2001 
Alex         Jan  1 2001  Jan 30 2001  30          Jan 11 2001  Jan 20 2001 
Alex         Jan  1 2001  Jan 30 2001  30          Jan 21 2001  Jan 30 2001 
Alex         Jan  1 2001  Feb  5 2001  36          Jan  1 2001  Jan 10 2001 
Alex         Jan  1 2001  Feb  5 2001  36          Jan 11 2001  Jan 20 2001 
Alex         Jan  1 2001  Feb  5 2001  36          Jan 21 2001  Jan 30 2001 
Alex         Jan  1 2001  Feb  5 2001  36          Feb  1 2001  Feb  5 2001 
Alex         Feb  1 2001  Feb  5 2001  5           Feb  1 2001  Feb  5 2001 
Alex         Jan  1 2001  Feb 20 2001  51          Jan  1 2001  Jan 10 2001 
Alex         Jan  1 2001  Feb 20 2001  51          Jan 11 2001  Jan 20 2001 
Alex         Jan  1 2001  Feb 20 2001  51          Jan 21 2001  Jan 30 2001 
Alex         Jan  1 2001  Feb 20 2001  51          Feb  1 2001  Feb  5 2001 
Alex         Jan  1 2001  Feb 20 2001  51          Feb 11 2001  Feb 20 2001 
Alex         Feb  1 2001  Feb 20 2001  20          Feb  1 2001  Feb  5 2001 
Alex         Feb  1 2001  Feb 20 2001  20          Feb 11 2001  Feb 20 2001 
Alex         Feb 11 2001  Feb 20 2001  10          Feb 11 2001  Feb 20 2001 
Bob          Jan  5 2001  Jan 15 2001  11          Jan  5 2001  Jan 15 2001 
Bob          Jan  5 2001  Feb 15 2001  42          Jan  5 2001  Jan 15 2001 
Bob          Jan  5 2001  Feb 15 2001  42          Feb  5 2001  Feb 15 2001 
Bob          Feb  5 2001  Feb 15 2001  11          Feb  5 2001  Feb 15 2001

Our WHERE clause is now complete, and we are left with a much shorter list of possible candidate periods. We've eliminated candidate periods that can be extended and, thus, are too short. Now, we need to eliminate duplicate rows from the result set. We use the following GROUP BY clause:

GROUP BY c1.ContractorId, c1.JobStart, c2.JobEnd

If you've been following along with our examples so far, you'll realize that adding this GROUP BY clause forces us to change our SELECT list. You'll see that change in a moment, but first we need to eliminate one more set of unwanted candidate periods.

We haven't checked for gaps yet within our candidate periods. Each candidate period (c1.JobStart through c2.JobEnd) represents a sequence of one or more projects. If more than one project is involved, we need to ensure that there are no unassigned days within the candidate period. This is where the HAVING clause comes into play.

Remember the c3 alias? It represents all periods whose start dates lay within any of our candidate periods. To ensure that there are no gaps, we count the days between c1.JobStart and c2.JobEnd and compare that result to the sum of the days for all the intervening projects:

HAVING
   SUM(DATEDIFF(d,c3.JobStart,c3.JobEnd)) + COUNT(*)
   = 
   DATEDIFF(d,c1.JobStart,c2.JobEnd) + 1

The SUM function sums the lengths of all included periods (within c3). The first expression adds one day for each included period (i.e., each c3 row). This is done because DATEDIFF returns the difference between two dates and not the number of days involved. For example: Jan 2-Jan 1 = 1 day, yet that period represents 2 days. We lose one day from each c3 period that we look at, so the expression adds one day for each such period, well, almost.

The second DATEDIFF function call returns the total number of days in the candidate period. Here, too, we have to add one day to the DATEDIFF result, to calculate the number of days from the date difference. If the number of days in the candidate period is the same as the number of days in all periods falling within the candidate period, we know that there are no unassigned days, or gaps, and that our candidate period truly represents a contiguous period of activity for the contractor in question. This gets us to our complete query:

SELECT 
   c1.ContractorId ContractorId, 
   CAST(c1.JobStart AS CHAR(12)) JobStart,
   CAST(c2.JobEnd AS CHAR(12)) JobEnd,
   DATEDIFF(d,c1.JobStart, c2.JobEnd)+1 Length
FROM 
   ContractorsSchedules c1, 
   ContractorsSchedules c2, 
   ContractorsSchedules c3
WHERE 
   c1.ContractorId=c2.ContractorId AND c1.JobStart <= c2.JobStart AND 
   c1.ContractorId=c3.ContractorId AND 
      c3.JobStart BETWEEN c1.JobStart AND c2.JobEnd AND
   NOT EXISTS(
      SELECT * FROM ContractorsSchedules c4 
      WHERE c4.ContractorId=c1.ContractorId AND 
         c2.JobEnd+1 BETWEEN c4.JobStart AND c4.JobEnd) AND
   NOT EXISTS(
      SELECT * FROM ContractorsSchedules c5 
      WHERE c5.ContractorId=c1.ContractorId AND 
         c1.JobStart-1 BETWEEN c5.JobStart AND c5.JobEnd) 
GROUP BY c1.ContractorId, c1.JobStart, c2.JobEnd
HAVING    
   SUM(DATEDIFF(d,c3.JobStart,c3.JobEnd)) + COUNT(*)
   = 
   DATEDIFF(d,c1.JobStart,c2.JobEnd) + 1
ORDER BY ContractorId, c1.JobStart

The GROUP BY clause defines the groups over which the SUM function operates. Each candidate period becomes one group. The HAVING clause functions as we've just described. The ORDER BY clause simply sorts the results by JobStart date for each contractor.

Note that our proposed solution to this recipe is applicable only in situations where business rules prevent contractors from having overlapping assignments. We've done that to simplify the query. The idea of counting days in the HAVING clause flies out the window when projects can overlap, because, in such a case, the number of project days can exceed the number of calendar days. To make our solution valid for cases where assignments are allowed to overlap, rewrite the HAVING clause as follows:

HAVING NOT EXISTS(
   SELECT c1.JobStart+i
   FROM Pivot
   WHERE i <= DATEDIFF(day,c1.JobStart,c2.JobEnd) AND
   NOT EXISTS (SELECT * FROM ContractorsSchedules c6
               WHERE c1.JobStart+i BETWEEN c6.JobStart AND c6.JobEnd
               AND c6.ContractorId = c1.ContractorId))

In this HAVING clause, we have a subquery against the Pivot table that returns one row for each date in the candidate period. We have yet another subquery that filters out those rows that correspond to scheduled projects. If all dates in the candidate period are scheduled, the outer subquery will return no rows, and the candidate period will be accepted. Otherwise, the candidate period will be rejected. The use of the Pivot table and the nested subquery add to the complexity of this solution.

    Team LiB   Previous Section   Next Section