5.1 Introduction
Representing temporal data in a database is probably one
of the most unnatural concepts for a human to comprehend. The reason
is that we use temporal information with very vague and undefined
terms. For example, when we say that something should be done
"until tomorrow," do we mean
exactly until tomorrow midnight? Or, rather until tomorrow within
normal working hours? Does "until
tomorrow" mean that something is done through the
last instant of today, or does it really mean that we must progress
over the threshold to tomorrow? Similar questions puzzle database
designers when working with temporal data. Humans often use temporal
information within an implicit context, which can sometimes be very
difficult to represent in program code.
In the next few sections, we'll talk about some
concepts that you need to understand when working with temporal data.
We'll also talk about the datatypes that SQL Server
provides for use with temporal data.
5.1.1 Granularity
The concept of granularity is important when
dealing with temporal
data. Granularity refers to the smallest
unit of time that you wish to deal with and that you wish to store.
As you know, the data represented in a database is just an
approximation of the real world. While time in our lives is linear,
the time information stored in a database is discrete and accurate to
a given granularity. If your granularity is a day,
you'll store only dates. If your granularity is a
second, you will store values that are accurate to the second.
You should be aware of two types of granularities when working with
temporal data. The first type of granularity is commonly
referred
to as precision. Precision is explicit and is
defined by the datatype you use
to store
temporal data. Precision refers to the finest granularity that a
datatype can handle. For example, the precision of SQL
Server's DATETIME type is
3.3
milliseconds. Every DATETIME value really represents a specific
millisecond point in time. For example:
INSERT INTO ContractorsSchedules
(JobID, ContractorID, JobStart)
VALUES ('RF10023','Jenny','2001-10-18')
(1 row(s) affected)
SELECT JobID, ContractorID, JobStart
FROM ContractorsSchedules
WHERE ContractorID='Jenny';
JobID ContractorID JobStart
---------- ------------ ---------------------------
RF10023 Jenny 2001-10-18 00:00:00.000
In this example, only a date is inserted into the database. Look,
however, at the date as it is retrieved. You'll see
that it now contains a time component resolved down to the
millisecond.
There is a trap in SQL that you must watch
for when working
with dates. It's easy to be drawn into the
assumption that when you insert a date into a database, you have
stored a value that encompasses the full day. In fact, due to the
manner in which temporal data is physically stored, you are storing
just a single point in the day. Misunderstanding this issue can be a
source of great confusion.
The second type of granularity is a bit more subtle.
It's the granularity that you care about, as opposed
to what is actually stored. For example, if you are interested in
events accurate up to one second, your granularity is set to one
second, regardless of the precision your datatype allows. You should
be aware of, and define clear rules for, the granularity used in all
temporal operations within a database.
Dealing with this second type of granularity can be challenging.
You'll usually need additional programming to bridge
the gap between the granularity you want at a business-rule level and
the granularity (precision) of the datatypes that you are using.
It's all too easy to compare two data values that
happen to use different granularities, only to get an erroneous
result.
Let's look at an example to demonstrate a common
error made by SQL programmers. A programmer designed software for an
online library reservation system in which a reader can make
reservations for books. The programmer used the DATETIME datatype to
store the dates:
CREATE TABLE LibraryReservations(
BookId CHAR(10),
UserId CHAR(10),
ReservedFrom DATETIME,
ReservedTo DATETIME
)
The programmer created an online interface that allows users to add
new reservations or check for a book's availability.
Users can specify only dates when they make a reservation, or they
can specify times along with those dates. Say that one user reserves
a book from November 5, 2001 until November 6, 2001. The system might
record the reservation using an INSERT, such as the following:
INSERT INTO LibraryReservations
(BookId, UserId, ReservedFrom, ReservedTo)
VALUES ('XF101','Jeff','2001-11-5','2001-11-6')
Now, let's say that a second user checks to see
whether the same book is available from November 6, 2001, 15:00
onward. To implement this check, the programmer uses the following
query to see whether the desired time falls into any of the
reservation intervals for the book:
SELECT BookId FROM LibraryReservations
WHERE BookId='XF101' AND
'Nov 6 2001 15:00' BETWEEN ReservedFrom AND ReservedTo
If you run through this scenario and execute this query,
you'll get a result that is incorrect from a
business perspective. Why? Because the programmer
didn't understand and take into account SQL
Server's handling of default time values.
Let's look at the actual row inserted when the first
user reserved the book:
SELECT *
FROM LibraryReservations
WHERE BookId='XF101' and UserId='Jeff'
BookId UserId ReservedFrom ReservedTo
---------- ---------- --------------------------- ---------------------------
XF101 Jeff 2001-11-05 00:00:00.000 2001-11-06 00:00:00.000
Notice the time values associated with each date? Do you see that
2001-11-06 15:00 falls outside the reservation
range? The first user intended to keep the book for the entire day,
but the programmer didn't take that into account.
Instead, the programmer allowed the database to supply the default
time value, which represents the beginning of
the day, not the end.
The solution to the problem illustrated by the previous example is in
two parts. First, you need to strongly enforce the business rules
that you've defined for temporal data in your
application. Second, you may need to write code that protects you
against problems. We'll talk about this second
aspect in some of our recipes.
5.1.2 Temporal Datatypes in Transact-SQL
Transact-SQL uses two datatypes to
store
temporal data: DATETIME and SMALLDATETIME. Both store date and time
data. The differences are only in the precision and the range of
values available and come about because of differences in the storage
capacities of the two datatypes.
DATETIME is a higher
granularity
datatype. It uses 8 bytes for each instance. 4 of those bytes are
used to represent the date, and the other 4 are used to represent
milliseconds past midnight. The DATETIME datatype can store values
from January 1, 1753 through December 31, 9999, which is more than
enough for business purposes.
A lower granularity datatype is SMALLDATETIME. It uses only 4 bytes
of storage, and it can only handle dates between January 1, 1900 and
June 6, 2079. In addition, the granularity of SMALLDATETIME is only
to the minute. You can't represent seconds and
milliseconds using SMALLDATETIME like you can using DATETIME.
Contrary to what you might think, there is no native way in
Transact-SQL to store date and time data separately — you have to
store both types of information within the same two datatypes. As you
might imagine, this can be a source of a great confusion if you
don't use the types carefully.
A common misconception with respect
to temporal
datatypes is that the TIMESTAMP datatype represents a third type that
can be used for temporal data. The fact that there is a
CURRENT_TIMESTAMP function to
return
the current date and time only adds to the confusion. The name is
misleading. TIMESTAMP is not a temporal
datatype. TIMESTAMP is a binary datatype used to version stamp rows
in a table with unique, 8-byte, binary numbers. You cannot use
TIMESTAMP for temporal operations. To further confuse things, the
TIMESTAMP datatype in Transact-SQL is not the same as the TIMESTAMP
datatype defined in the ANSI SQL-92 standard. A SQL-92 TIMESTAMP is
actually similar to the Transact-SQL DATETIME datatype.
5.1.3 Date and Time Input
With Transact-SQL you can use a variety of input formats for date and time
values. However, if you aren't careful, this can
become a major source of confusion and errors. We strongly recommend
you use only one standardized format for all date/time values that
are input to, and output from, SQL Server. This
doesn't mean your users must be limited to that one
format. Your frontend tools can use any presentation format you
desire, but it'll be less confusing if those tools,
in turn, use only one format to communicate with the database. It is
probably best to use an international standard date format such as
ISO 8601, which we are going to use in all our recipes.
ISO 8601 defines international standard
date notation as YYYY-MM-DD
HH:MM:SS.MMM. Such a date can be used in a Transact-SQL string and is
converted automatically from a string to a date value without the
need to use explicit conversion functions. You don't
need to use the full notation. You can use any subset of leading
parts; for example, use the notation YYYY-MM-DD if you are working
only with dates, not times. It is convenient that SQL Server can
convert an ISO 8601 date automatically since it avoids readability
problems resulting from explicit
use
of CONVERT or CAST functions.
For example, if you want to compare a date parameter in a query with
a constant, you can use the simplified ISO notation:
WHERE '2001-2-2 17:04' < JobStart
The server will cast the string into a date automatically.
It's not necessary to use the CONVERT function:
WHERE CONVERT(DATETIME,'Feb 2 2001 5:04PM') < JobStart
As you can see, the version of the query that
doesn't use the CONVERT function is the more
readable of the two.
Another advantage of the ISO 8601 date format is that it ensures the
internationality of constant values. For example, the date
'2/3/2001' can't be correctly
interpreted without some additional information. You need to know
whether the European or U.S. convention for ordering the day and
month applies. Without that information, it is not clear whether we
are talking about March 2, 2001 or February 3, 2001.
5.1.4 Temporal Types
When working with date and time data, you'll
find that there
are three basic types of temporal data to be concerned with:
- Instant
-
A single point in time
-
Duration
-
An interval or length of time
- Period
-
An interval of time that begins at a specific
instant
In literature, you'll find different names for these
types, names such as event and
intervals, but those names all indicate one of
the three basic types listed here.
Don't confuse the temporal types
we've listed here with datatypes you use in a
particular SQL dialect. These temporal types represent real-world
uses for time and date data; they transcend any specific database
implementation.
5.1.4.1 Instants
An instant, also called an event, is a single
point in time with reference to a specific granularity. If the
granularity of a problem is in terms of days, an instant ends up
being a specific date. If the granularity is in terms of hours, an
instant is defined as a specific hour of a specific day. For example,
a calendar maker is only interested in dates and would note that the
millennium started on January 1, 2001.
In Transact-SQL, instants are represented
with a
value stored in one of the two available datatypes: DATETIME or
SMALLDATETIME. For example:
...
ContractStart SMALLDATETIME,
...
InvoiceIssued DATETIME,
...
There are several functions applicable to
instants:
GETDATE
CURRENT_TIMESTAMP
GETUTCDATE
DATENAME
DATEPART
DAY
MONTH
YEAR
The GETDATE and CURRENT_TIMESTAMP functions provide equivalent
functionality. GETDATE exists for
historical
reasons, while CURRENT_TIMESTAMP follows the SQL standard. Both
functions simply return the current instant at the time of their
invocation. The resolution of these functions is in milleseconds.
The GETUTCDATE function is of a
similar
nature; however, it returns the current UTC (Coordinated Universal
Time) time resolved to the millisecond. You should be aware that
using this function requires great confidence that the base system is
installed properly and that date modifications (such as switching to
daylight saving time) are handled correctly. In general, we do not
recommend using this function unless absolutely necessary.
The DATENAME function
returns
a named portion of a temporal value. For example, you could use
DATENAME to return the name of the month in which a given date falls.
Again, the validity of this function is subject to specific settings
on the server, particularly the language setting. Consequently, using
DATENAME can be confusing. In our recipes for this chapter, we give
an alternative solution to the use of DATENAME that is a bit more
stable and that is not affected by any settings on the server.
The remaining functions in the list — DATEPART, DAY, MONTH, and
YEAR — are all variations on the same theme and return different
parts of a temporal value. For example, the YEAR function returns the
year in which a given date falls.
5.1.4.2 Duration
A duration, sometimes
called an
interval, is a distance measure of time. For example, when you say
that that a meeting lasts for "one
hour," the "one
hour" represents a duration. If you add a duration
to an event, you get a second event; if you add one hour to the
instant that the meeting starts, you end up with the instant on which
the meeting ends. We use durations every day when we try to describe
how long something lasted or when something happened relative to an
instant in time. For example, the world 100-meter sprint record is
9.79 seconds (Maurice Greene, IAAF World Championships in Athletics,
1999) and dough for biscuits has to rise for 1 hour before baking.
Interestingly, durations have a direction; they can be in the future
or in the past. The most natural way to present a duration that
extends into the past is to assign it a negative value. This might
sound a bit strange at first; however, it has significant advantages
when using the value in applications.
For example, you could specify that you are going to arrive 5 minutes
prior to a meeting that has yet to be scheduled. You could represent
that as a duration of -5 minutes. Until the meeting is actually
scheduled, you know only that you have a duration of -5 minutes. When
the meeting is finally scheduled, you'll have a
starting point and you can then think of your duration as a period.
There will be more on periods in the next section.
Storing durations is not as easy
as
it sounds. You cannot use temporal datatypes for storing duration
information because temporal datatypes are designed to represent only
as instant in time. Usually, to record a duration, you use INTEGER
or some other numeric datatype. For
example:
...
ScoreInSeconds INTEGER,
...
Notice in this example that the granularity, or units, is implied.
The score (from a track meet, possibly) is an INTEGER that represents
a number of seconds, but the fact that the unit is seconds is not
part of the variable's declaration. When working
with durations, you must be aware of the granularity and you must
both store and retrieve the information consistently. If your
granularity is one second, you should internally represent all
durations in seconds and only convert them to more human-readable
forms such as HH:MM:SS when needed.
The one Transact-SQL function related to durations is
the
DATEDIFF function. DATEDIFF takes two date values, or instants, and
returns the duration between them in terms of a granularity that you
specify.
The DATETIME datatype represents
a
special case when computing durations between dates: you can use the
minus sign between two DATETIME values, and you'll
get the duration in days. If you want to calculate the duration in
terms of any other granularity, you still need to use the DATEDIFF
function where you explicitly specify the unit that you want to use
in your result.
Eternity is perhaps the longest interval of them all. A
representation for eternity is not defined in Transact-SQL. The best
approximation we can get is to use the maximum or minimal values in
whatever date/time datatype we are using. For DATETIME, these values
are '1753-1-1 00:00' and
'9999-12-13 23:59'. Similarly,
for SMALLDATETME, the values are '1900-1-1
00:00' and '2079-12-31 23:59'.
Thus, to represent future eternity as a DATETIME value, we would use
'9999-12-13 23:59'.
|
5.1.4.3 Periods
A period is a duration that
is located in
time. For a period, you need an instant and a duration from that
instant. A typical example of a period is when you say that a meeting
started at 8:00 a.m. and lasted for two hours. Another way of
expressing a period is to specify two endpoints, such as saying that
a meeting started at 8:00 a.m. and ended at 10:00 a.m. Regardless of
how you express the period, you end up with a duration beginning at a
specific point in time.
There are three possibilities for representing periods in a
relational database:
- Fully normalized
-
Insert one row to represent the
beginning of a period and another
row to mark the end of a period.
- Margins
-
Record both the beginning and ending
instants in a single table row.
- Margin and duration
-
Record the beginning instant together
with the duration in a single
table row.
The fully normalized presentation is to represent each period with
two rows in a table:
CREATE TABLE ...
...
Instant DATETIME,
...
Two instants, contained in two separate rows, represent the beginning
and end of each period. This representation has some advantages over
the others. For example, if you want to record events within periods,
you might want to use the fully normalized representation. However,
the code required can be complex, because the information about a
period is spread over two or more rows. Consequently, this approach
is usually not used.
The margins and margin and
duration representations are conceptually equivalent. The
difference between the two is more artistic than technical. The
representation with margins requires two instants, one marking the
beginning of the period and the other marking the end of the period:
CREATE TABLE ...
...
JobStart DATETIME,
JobEnd DATETIME,
...
The representation of a period with a margin and a duration is
similar, requiring just one instant (it can be the beginning or the
end) together with a duration in a given granularity:
CREATE TABLE ...
...
JobStart DATETIME,
Lasting INTEGER,
...
When specifying a period using an instant and a duration, you can use
the DATEADD function to calculate the second instant based on the
known instant and duration values. A benefit of this approach is that
the use of DATEADD forces you to state the granularity of your
interval value explicitly, so your code becomes somewhat
self-documenting. Given an instant and an interval, the DATEADD
function returns a DATETIME value as a
result.
|
You can use the DATEDIFF function to derive the interval between two
instants.
|
|
|