Team LiB   Previous Section   Next Section

5.1 Introduction

Representing temporal data in a database is probably one of the most unnatural concepts for a human to comprehend. The reason is that we use temporal information with very vague and undefined terms. For example, when we say that something should be done "until tomorrow," do we mean exactly until tomorrow midnight? Or, rather until tomorrow within normal working hours? Does "until tomorrow" mean that something is done through the last instant of today, or does it really mean that we must progress over the threshold to tomorrow? Similar questions puzzle database designers when working with temporal data. Humans often use temporal information within an implicit context, which can sometimes be very difficult to represent in program code.

In the next few sections, we'll talk about some concepts that you need to understand when working with temporal data. We'll also talk about the datatypes that SQL Server provides for use with temporal data.

5.1.1 Granularity

The concept of granularity is important when dealing with temporal data. Granularity refers to the smallest unit of time that you wish to deal with and that you wish to store. As you know, the data represented in a database is just an approximation of the real world. While time in our lives is linear, the time information stored in a database is discrete and accurate to a given granularity. If your granularity is a day, you'll store only dates. If your granularity is a second, you will store values that are accurate to the second.

You should be aware of two types of granularities when working with temporal data. The first type of granularity is commonly referred to as precision. Precision is explicit and is defined by the datatype you use to store temporal data. Precision refers to the finest granularity that a datatype can handle. For example, the precision of SQL Server's DATETIME type is 3.3 milliseconds. Every DATETIME value really represents a specific millisecond point in time. For example:

INSERT INTO ContractorsSchedules
  (JobID, ContractorID, JobStart)
  VALUES ('RF10023','Jenny','2001-10-18')

(1 row(s) affected)

SELECT JobID, ContractorID, JobStart
FROM ContractorsSchedules
WHERE ContractorID='Jenny';

JobID      ContractorID JobStart                    
---------- ------------ --------------------------- 
RF10023    Jenny        2001-10-18 00:00:00.000

In this example, only a date is inserted into the database. Look, however, at the date as it is retrieved. You'll see that it now contains a time component resolved down to the millisecond.

There is a trap in SQL that you must watch for when working with dates. It's easy to be drawn into the assumption that when you insert a date into a database, you have stored a value that encompasses the full day. In fact, due to the manner in which temporal data is physically stored, you are storing just a single point in the day. Misunderstanding this issue can be a source of great confusion.

The second type of granularity is a bit more subtle. It's the granularity that you care about, as opposed to what is actually stored. For example, if you are interested in events accurate up to one second, your granularity is set to one second, regardless of the precision your datatype allows. You should be aware of, and define clear rules for, the granularity used in all temporal operations within a database.

Dealing with this second type of granularity can be challenging. You'll usually need additional programming to bridge the gap between the granularity you want at a business-rule level and the granularity (precision) of the datatypes that you are using. It's all too easy to compare two data values that happen to use different granularities, only to get an erroneous result.

Let's look at an example to demonstrate a common error made by SQL programmers. A programmer designed software for an online library reservation system in which a reader can make reservations for books. The programmer used the DATETIME datatype to store the dates:

CREATE TABLE LibraryReservations(
   BookId CHAR(10),
   UserId CHAR(10),
   ReservedFrom DATETIME,
   ReservedTo DATETIME
)

The programmer created an online interface that allows users to add new reservations or check for a book's availability. Users can specify only dates when they make a reservation, or they can specify times along with those dates. Say that one user reserves a book from November 5, 2001 until November 6, 2001. The system might record the reservation using an INSERT, such as the following:

INSERT INTO LibraryReservations
   (BookId, UserId, ReservedFrom, ReservedTo)
   VALUES ('XF101','Jeff','2001-11-5','2001-11-6')

Now, let's say that a second user checks to see whether the same book is available from November 6, 2001, 15:00 onward. To implement this check, the programmer uses the following query to see whether the desired time falls into any of the reservation intervals for the book:

SELECT BookId FROM LibraryReservations
WHERE BookId='XF101' AND 
   'Nov 6 2001 15:00' BETWEEN ReservedFrom AND ReservedTo

If you run through this scenario and execute this query, you'll get a result that is incorrect from a business perspective. Why? Because the programmer didn't understand and take into account SQL Server's handling of default time values. Let's look at the actual row inserted when the first user reserved the book:

SELECT *
FROM LibraryReservations
WHERE BookId='XF101' and UserId='Jeff'

BookId     UserId     ReservedFrom                ReservedTo                  
---------- ---------- --------------------------- --------------------------- 
XF101      Jeff       2001-11-05 00:00:00.000     2001-11-06 00:00:00.000

Notice the time values associated with each date? Do you see that 2001-11-06 15:00 falls outside the reservation range? The first user intended to keep the book for the entire day, but the programmer didn't take that into account. Instead, the programmer allowed the database to supply the default time value, which represents the beginning of the day, not the end.

The solution to the problem illustrated by the previous example is in two parts. First, you need to strongly enforce the business rules that you've defined for temporal data in your application. Second, you may need to write code that protects you against problems. We'll talk about this second aspect in some of our recipes.

5.1.2 Temporal Datatypes in Transact-SQL

Transact-SQL uses two datatypes to store temporal data: DATETIME and SMALLDATETIME. Both store date and time data. The differences are only in the precision and the range of values available and come about because of differences in the storage capacities of the two datatypes.

DATETIME is a higher granularity datatype. It uses 8 bytes for each instance. 4 of those bytes are used to represent the date, and the other 4 are used to represent milliseconds past midnight. The DATETIME datatype can store values from January 1, 1753 through December 31, 9999, which is more than enough for business purposes.

A lower granularity datatype is SMALLDATETIME. It uses only 4 bytes of storage, and it can only handle dates between January 1, 1900 and June 6, 2079. In addition, the granularity of SMALLDATETIME is only to the minute. You can't represent seconds and milliseconds using SMALLDATETIME like you can using DATETIME.

Contrary to what you might think, there is no native way in Transact-SQL to store date and time data separately — you have to store both types of information within the same two datatypes. As you might imagine, this can be a source of a great confusion if you don't use the types carefully.

A common misconception with respect to temporal datatypes is that the TIMESTAMP datatype represents a third type that can be used for temporal data. The fact that there is a CURRENT_TIMESTAMP function to return the current date and time only adds to the confusion. The name is misleading. TIMESTAMP is not a temporal datatype. TIMESTAMP is a binary datatype used to version stamp rows in a table with unique, 8-byte, binary numbers. You cannot use TIMESTAMP for temporal operations. To further confuse things, the TIMESTAMP datatype in Transact-SQL is not the same as the TIMESTAMP datatype defined in the ANSI SQL-92 standard. A SQL-92 TIMESTAMP is actually similar to the Transact-SQL DATETIME datatype.

5.1.3 Date and Time Input

With Transact-SQL you can use a variety of input formats for date and time values. However, if you aren't careful, this can become a major source of confusion and errors. We strongly recommend you use only one standardized format for all date/time values that are input to, and output from, SQL Server. This doesn't mean your users must be limited to that one format. Your frontend tools can use any presentation format you desire, but it'll be less confusing if those tools, in turn, use only one format to communicate with the database. It is probably best to use an international standard date format such as ISO 8601, which we are going to use in all our recipes.

ISO 8601 defines international standard date notation as YYYY-MM-DD HH:MM:SS.MMM. Such a date can be used in a Transact-SQL string and is converted automatically from a string to a date value without the need to use explicit conversion functions. You don't need to use the full notation. You can use any subset of leading parts; for example, use the notation YYYY-MM-DD if you are working only with dates, not times. It is convenient that SQL Server can convert an ISO 8601 date automatically since it avoids readability problems resulting from explicit use of CONVERT or CAST functions.

For example, if you want to compare a date parameter in a query with a constant, you can use the simplified ISO notation:

WHERE '2001-2-2 17:04' < JobStart

The server will cast the string into a date automatically. It's not necessary to use the CONVERT function:

WHERE CONVERT(DATETIME,'Feb 2 2001 5:04PM') < JobStart

As you can see, the version of the query that doesn't use the CONVERT function is the more readable of the two.

Another advantage of the ISO 8601 date format is that it ensures the internationality of constant values. For example, the date '2/3/2001' can't be correctly interpreted without some additional information. You need to know whether the European or U.S. convention for ordering the day and month applies. Without that information, it is not clear whether we are talking about March 2, 2001 or February 3, 2001.

5.1.4 Temporal Types

When working with date and time data, you'll find that there are three basic types of temporal data to be concerned with:

Instant

A single point in time

Duration

An interval or length of time

Period

An interval of time that begins at a specific instant

In literature, you'll find different names for these types, names such as event and intervals, but those names all indicate one of the three basic types listed here.

Don't confuse the temporal types we've listed here with datatypes you use in a particular SQL dialect. These temporal types represent real-world uses for time and date data; they transcend any specific database implementation.

5.1.4.1 Instants

An instant, also called an event, is a single point in time with reference to a specific granularity. If the granularity of a problem is in terms of days, an instant ends up being a specific date. If the granularity is in terms of hours, an instant is defined as a specific hour of a specific day. For example, a calendar maker is only interested in dates and would note that the millennium started on January 1, 2001.

In Transact-SQL, instants are represented with a value stored in one of the two available datatypes: DATETIME or SMALLDATETIME. For example:

...
ContractStart SMALLDATETIME,
...
InvoiceIssued DATETIME,
...

There are several functions applicable to instants:

  • GETDATE

  • CURRENT_TIMESTAMP

  • GETUTCDATE

  • DATENAME

  • DATEPART

  • DAY

  • MONTH

  • YEAR

The GETDATE and CURRENT_TIMESTAMP functions provide equivalent functionality. GETDATE exists for historical reasons, while CURRENT_TIMESTAMP follows the SQL standard. Both functions simply return the current instant at the time of their invocation. The resolution of these functions is in milleseconds.

The GETUTCDATE function is of a similar nature; however, it returns the current UTC (Coordinated Universal Time) time resolved to the millisecond. You should be aware that using this function requires great confidence that the base system is installed properly and that date modifications (such as switching to daylight saving time) are handled correctly. In general, we do not recommend using this function unless absolutely necessary.

The DATENAME function returns a named portion of a temporal value. For example, you could use DATENAME to return the name of the month in which a given date falls. Again, the validity of this function is subject to specific settings on the server, particularly the language setting. Consequently, using DATENAME can be confusing. In our recipes for this chapter, we give an alternative solution to the use of DATENAME that is a bit more stable and that is not affected by any settings on the server.

The remaining functions in the list — DATEPART, DAY, MONTH, and YEAR — are all variations on the same theme and return different parts of a temporal value. For example, the YEAR function returns the year in which a given date falls.

5.1.4.2 Duration

A duration, sometimes called an interval, is a distance measure of time. For example, when you say that that a meeting lasts for "one hour," the "one hour" represents a duration. If you add a duration to an event, you get a second event; if you add one hour to the instant that the meeting starts, you end up with the instant on which the meeting ends. We use durations every day when we try to describe how long something lasted or when something happened relative to an instant in time. For example, the world 100-meter sprint record is 9.79 seconds (Maurice Greene, IAAF World Championships in Athletics, 1999) and dough for biscuits has to rise for 1 hour before baking.

Interestingly, durations have a direction; they can be in the future or in the past. The most natural way to present a duration that extends into the past is to assign it a negative value. This might sound a bit strange at first; however, it has significant advantages when using the value in applications.

For example, you could specify that you are going to arrive 5 minutes prior to a meeting that has yet to be scheduled. You could represent that as a duration of -5 minutes. Until the meeting is actually scheduled, you know only that you have a duration of -5 minutes. When the meeting is finally scheduled, you'll have a starting point and you can then think of your duration as a period. There will be more on periods in the next section.

Storing durations is not as easy as it sounds. You cannot use temporal datatypes for storing duration information because temporal datatypes are designed to represent only as instant in time. Usually, to record a duration, you use INTEGER or some other numeric datatype. For example:

...
ScoreInSeconds INTEGER,
...

Notice in this example that the granularity, or units, is implied. The score (from a track meet, possibly) is an INTEGER that represents a number of seconds, but the fact that the unit is seconds is not part of the variable's declaration. When working with durations, you must be aware of the granularity and you must both store and retrieve the information consistently. If your granularity is one second, you should internally represent all durations in seconds and only convert them to more human-readable forms such as HH:MM:SS when needed.

The one Transact-SQL function related to durations is the DATEDIFF function. DATEDIFF takes two date values, or instants, and returns the duration between them in terms of a granularity that you specify.

The DATETIME datatype represents a special case when computing durations between dates: you can use the minus sign between two DATETIME values, and you'll get the duration in days. If you want to calculate the duration in terms of any other granularity, you still need to use the DATEDIFF function where you explicitly specify the unit that you want to use in your result.

Eternity

Eternity is perhaps the longest interval of them all. A representation for eternity is not defined in Transact-SQL. The best approximation we can get is to use the maximum or minimal values in whatever date/time datatype we are using. For DATETIME, these values are '1753-1-1 00:00' and '9999-12-13 23:59'. Similarly, for SMALLDATETME, the values are '1900-1-1 00:00' and '2079-12-31 23:59'. Thus, to represent future eternity as a DATETIME value, we would use '9999-12-13 23:59'.

5.1.4.3 Periods

A period is a duration that is located in time. For a period, you need an instant and a duration from that instant. A typical example of a period is when you say that a meeting started at 8:00 a.m. and lasted for two hours. Another way of expressing a period is to specify two endpoints, such as saying that a meeting started at 8:00 a.m. and ended at 10:00 a.m. Regardless of how you express the period, you end up with a duration beginning at a specific point in time.

There are three possibilities for representing periods in a relational database:

Fully normalized

Insert one row to represent the beginning of a period and another row to mark the end of a period.

Margins

Record both the beginning and ending instants in a single table row.

Margin and duration

Record the beginning instant together with the duration in a single table row.

The fully normalized presentation is to represent each period with two rows in a table:

CREATE TABLE ...
   ...
   Instant DATETIME, 
   ...

Two instants, contained in two separate rows, represent the beginning and end of each period. This representation has some advantages over the others. For example, if you want to record events within periods, you might want to use the fully normalized representation. However, the code required can be complex, because the information about a period is spread over two or more rows. Consequently, this approach is usually not used.

The margins and margin and duration representations are conceptually equivalent. The difference between the two is more artistic than technical. The representation with margins requires two instants, one marking the beginning of the period and the other marking the end of the period:

CREATE TABLE ...
   ...
   JobStart DATETIME,
   JobEnd DATETIME,
   ...

The representation of a period with a margin and a duration is similar, requiring just one instant (it can be the beginning or the end) together with a duration in a given granularity:

CREATE TABLE ...
   ...
   JobStart DATETIME,
   Lasting INTEGER,
   ...

When specifying a period using an instant and a duration, you can use the DATEADD function to calculate the second instant based on the known instant and duration values. A benefit of this approach is that the use of DATEADD forces you to state the granularity of your interval value explicitly, so your code becomes somewhat self-documenting. Given an instant and an interval, the DATEADD function returns a DATETIME value as a result.

You can use the DATEDIFF function to derive the interval between two instants.

    Team LiB   Previous Section   Next Section