3.1 Types of Data Structures
In this chapter, we plan to discuss the following
three major
types of data structures:
Lists
Stacks and queues
Arrays and matrices
Lists, stacks, and queues are all linear data structures; the term
linear referring
to the fact that, conceptually,
you're dealing with a set of elements arranged in
the form of a line. The use of SQL is well-suited to such structures
because they map easily onto the relational table paradigm. Arrays
and matrices, on the other hand, are multidimensional data
structures. You can use SQL to work with arrays and matrices, but
it's better-suited for linear structures.
3.1.1 Lists
Lists are the most common linear
data
structure in computing. In contrast to other programming languages,
lists are the easiest type of data structure to implement in SQL. A
list is a sequence of elements with a known order. A list is
unlimited in size and can shrink or grow on demand. Elements can be
added to a list, removed from a list, and you can query to see
whether a given element is in a list.
As you can see, the properties of a list are very similar
to
the properties of a SQL table, with the exception that a list has an
order to it. That is why lists are so easy to implement. SQL tables
are also unlimited in size, can be queried, and allow for entries to
be added (inserted) and removed (deleted).
In this chapter we use a SQL table with an index to define a list:
CREATE TABLE List (
Id INTEGER,
...
)
A list in SQL is, therefore, a table with at least two attributes.
One attribute is an index identifying the order of each element in
the list. Any other attributes are data-carrying attributes, meaning
that they hold the actual data in the list.
When you implement a list in SQL, it's very helpful
if your
list index is arithmetically
increasing. By arithmetically, we mean that your list index should
increase by 1 as new elements are appended to the list. We are going
to assume in this chapter's recipes that our lists
have arithmetically increasing indices.
All basic list operations
are directly implementable in SQL.
INSERT is used to add a new element to a list, DELETE is used to
remove an element from a list, and SELECT is used to query for
elements by value or by index. These operations are so
straightforward that we won't show recipes for them
in this chapter.
Lists are of particular interest in SQL, since they allow for pattern
matching and for cumulative aggregation queries. The most important
pattern-matching queries are for finding regions, runs, and
sequences. Having said that, let's take a look at
what those terms mean.
3.1.1.1 Cumulative aggregation queries
A cumulative aggregation query is similar to a
running aggregate
query, with only one difference. A running
aggregate is the type
produced by the GROUP BY clause in
conjunction with an aggregate function, and it generates one
aggregate value for each group of elements. Cumulative aggregates, on
the other hand, include all elements from the beginning of the list.
For example, you might produce a report showing sales in January,
sales in January and February, sales in January through March, and so
forth. While running aggregates are used mainly in statistics,
cumulative aggregates are used mainly in combinatorial problems.
3.1.1.2 Regions
A region is a continuous
pattern
in a list in which all values of the elements are the same. A typical
usage of such a pattern is when you look for empty slots in a list.
Let's say that you have a list where elements can be
either empty or full slots. If you are looking for five sequential
empty slots, you are actually looking for a region of that size. The
most typical example is when you build a warehouse application and
you have a triple-size container. To store it, you need to find an
empty slot that could fit three containers of regular size.
Therefore, you are looking for a region of size three.
3.1.1.3 Runs
A run is a continuous pattern in which
values
are increasing monotonically. Each value is larger than the previous
one. For example, say that you have a list of consecutively taken
measurements for a volcano. In a given period of time, you may find
that the temperature of the volcano were consistently rising. The
measurements in that period of time, each representing a higher
temperature than the one previous, would be considered a run.
3.1.1.4 Sequences
A sequence is a run in which
values form
an arithmetic progression with a difference of 1. Each value is
exactly one increment larger than the previous value. With respect to
the volcano example that we used to describe a run, a series of
temperature measurements in which each measurement was exactly 1
degree higher than the previous would represent a sequence.
3.1.2 Stacks and Queues
When building a software system, there is often a requirement to
serialize data that is being processed. In such cases, the following
special linear structures are particularly useful:
Stack
Queue
Priority queue
3.1.2.1 Stacks
A stack is a special kind of list
in which all addition
and removal of elements takes place on one side. A stack is a
last in, first out (LIFO)
structure. The LIFO concept
is one that accountants and computer programmers are equally likely
to understand. Each element that is added to a stack is put on top of
all the others. Only the topmost item is accessible at any given
time, and items in a stack are removed from the top down.
The position of an element in the stack is
maintained
with its index. The top of the stack is the element with the highest
index. To find the highest index, a simple query is needed:
SELECT MAX(Id) FROM Stack
Typical operations on a stack are the following:
- PUSH
-
To add a new element onto a stack
- POP
-
To remove the top element from the stack
- TOP
-
To find the topmost element
3.1.2.2 Queues
A queue is a list to which items are
added at the top and
from which items are removed at the bottom. Thus, a queue implements
a first in,
first out (FIFO) mechanism. In all other aspects, a queue
is exactly the same as a stack. Queues are probably the most used
data structure of all because they provide a robust serialization
mechanism.
Typical operations on queues are the following:
- TOP
-
To find the top element
- ENQUEUE
-
To add a new element to the queue
- DEQUEUE
-
To remove an element from the queue
In a multiclient application, you can use a queue as a temporary
repository for data. For example, several users might collect data
through forms on their computers and store the data for each form in
a queue. On the server side, you might have one or more interface
applications that DEQUEUE and process the sets of data ENQUEUED by
the client applications. Serialization is achieved because all client
applications place their data into the queue, and all server-side
applications pull that data out in the same order in which it was
placed into the queue.
3.1.2.3 Priority queues
A priority queue is a special kind of a
queue in which the top
element — the element that is next in line to be
DEQUEUED — is not the oldest one. A priority queue uses a
priority property to find the top candidate, thus implementing a
first in, priority out (FIPO) structure. No
matter what the input order, items are DEQUEUED in priority order.
The easiest mechanism for implementing a priority queue is to order
the elements in the queue by the value of one of their properties.
For example, given a queue of scheduling data, you might prioritize
based on time, with the most imminent events taking priority over
those that are farther out into the future. As
you'll see, it is very easy in SQL to implement
priority queues once you have the basic mechanisms for stacks and
queues in place.
3.1.3 Arrays and Matrices
As opposed to linear data structures, multidimensional data
structures are not very natural to SQL. In this chapter, we discuss
two types of multidimensional data structure:
It is possible to implement such structures
in SQL, but their manipulation soon becomes inefficient, and it is
probably worthwhile considering an external implementation where
feasible. If you need to perform an operation on arrays or matrices
and if you are using MS SQL 2000, consider the TABLE datatype. It
provides a convenient way to work with arrays; however, it is limited
only to stored procedures and can be used only as temporary storage.
3.1.3.1 Arrays
An array is a multidimensional data structure
that, rather than
having just one value, can have many values associated with it. An
array can have one or more dimensions. A one-dimensional array
allows you to store a list of values. A
two-dimensional array allows you to store a grid of values,
essentially a list of lists. A three-dimensional array allows you to
store a cube of values, and so forth. SQL programmers typically must
make some compromises when working with arrays to achieve reasonable
performance. The most widely used implementation of an array is to
have one column in a table store the index of dimension in an array
and have other columns store values for the element in question. In
other words, each row in an array table stores the coordinates of one
array element together with its values.
One of the biggest dangers when working with arrays is to break an
array's structure. Therefore, you have to build some
mechanism into your implementation to ensure that your array
structures remain consistent. Otherwise, operations with arrays are
fairly simple. The addition of a new element is as simple as an
INSERT with new coordinates. Likewise, the removal of an element is
as simple as executing a DELETE.
3.1.3.2 Matrices
A matrix is a special case of array. Matrices
have only two
dimensions, must be finite, and their indices must arithmetically
increase (no gaps). We show operations on matrices to demonstrate
that SQL can be used for algebraic operations. However, it is
unlikely that you will have much use for matrices in business
applications. In the recipes, we show you how to perform arithmetic
operations on matrices, how to transpose matrices, how to print
matrices, and more.
When working with matrices, SQL can be useful if you store a large
number of small matrices in one table. An advantage over other
languages is that SQL can easily perform an operation on many
elements of a matrix. Other programming languages require you to use
a FOR loop or some other kind of looping mechanism to step through
all elements of a matrix to perform the operation.
|