14.1 Relational Databases and SQL
Although
one can certainly write an entire book on
relational databases, and another on SQL, the essentials of these
technologies are not hard to understand. A
database is a repository of data. A
relational database
organizes your data into tables. Consider the Northwind database
provided with Microsoft SQL Server 7, SQL Server 2000, and all
versions of Microsoft Access.
14.1.1 Tables, Records, and Columns
The Northwind
database describes a fictional company buying and selling food
products. The data for Northwind is divided into 13 tables, including
Customers, Employees, Orders, Order Details, Products, and so forth.
Every table in a relational database is organized into rows, where
each row represents a single record. The rows are organized into
columns. All the rows in a table have the same column structure. For
example, the Orders table has these columns:
OrderID, CustomerID,
EmployeeID, OrderDate, etc.
For any given order, you need to know the customer's
name, address, contact name, and so forth. You could store that
information with each order, but that would be very inefficient.
Instead, we use a second table called Customers, in which each row
represents a single customer. In the Customers table is a column for
the CustomerID. Each customer has a unique ID, and
that field is marked as the primary
key for that table. A primary key is the column
or combination of columns that uniquely identifies a record in a
given table.
ADO.NET is somewhat different from ADO. While learning how to
implement the new functionality found in ADO.NET, you are probably
going to keep asking yourself things like: "Where is
the MoveNext( ) method?" and
"How do I test for the end of
file?" In ADO.NET, record navigation works a bit
differently. DataTables don't have the same record
navigation methods as in ADO. The best way to get a grasp on how to
work with ADO.NET is to think of a DataTable as an array of rows.
Calling the MoveFirst( ) method in ADO.NET would
be the same as going to the first index of the array. Testing for the
end-of-file is the same as testing whether the current index matches
the array's upper bound. Want to set a bookmark for
a particular record? Just create a variable and assign it the index
of the current record—you don't
need a special BookMark property. Record navigation in ADO.NET is
just as easy as in ADO, perhaps even easier.
|
The Orders table uses the CustomerID as
a
foreign key. A foreign key
is a column (or combination of columns) that is a primary (or
otherwise unique) key from a different table. The Orders table uses
the CustomerID (the primary key used in the
Customers table) to identify which customer has placed the order. To
determine the address for the order, you can use the
CustomerID to look up the customer record in the
Customers table.
This use of foreign keys is particularly helpful in representing
one-to-many or many-to-one relationships between tables. By
separating information into tables that are linked by foreign keys,
you avoid having to repeat information in records. A single customer,
for example, can have multiple orders, but it is inefficient to place
the same customer information (name, phone number, credit limit, and
so on) in every order record. The process of removing redundant
information from your records and shifting it to separate tables is
called
normalization.
14.1.2 Normalization
Normalization not only makes your use of the database more efficient,
but also it reduces the likelihood of data corruption. If you kept
the customer's name both in the Customers table and
also in the Orders table, you would run the risk that a change in one
table might not be reflected in the other. Thus, if you changed the
customer's address in the Customers table, that
change might not be reflected in every row in the Orders table (and a
lot of work would be necessary to make sure that it was reflected).
By keeping only the CustomerID in Orders, you are
free to change the address in Customers, and the change is
automatically reflected for each order.
Just as C# programmers want the compiler to catch bugs at compile
time rather than at runtime, database programmers want the database
to help them avoid data corruption. The
compiler helps avoid bugs in C# by
enforcing the rules of the language (for example, you
can't use a variable you've not
defined). SQL Server and other modern relational databases avoid bugs
by enforcing constraints that you request. For example, the Customers
database marks the CustomerID as a primary key.
This creates a primary key constraint in the database, which ensures
that each CustomerID is unique. If you were to
enter a customer named Liberty Associates, Inc. with the
CustomerID of LIBE, and then tried to add Liberty
Mutual Funds with a CustomerID of LIBE, the
database would reject the second record because of the primary key
constraint.
14.1.3 Declarative Referential Integrity
Relational databases use Declarative
Referential
Integrity (DRI) to establish
constraints on the relationships among the various tables. For
example, you might declare a constraint on the Orders table that
dictates that no order can have a CustomerID
unless that CustomerID represents a valid record
in Customers. This helps avoid two types of mistakes. First, you
cannot enter a record with an invalid CustomerID.
Second, you cannot delete a Customer record if that
CustomerID is used in any order. The integrity of
your data and their relationships are thus protected.
14.1.4 SQL
The most popular language for querying and manipulating databases is
SQL, usually pronounced "sequel."
SQL is
a declarative language, as opposed to a procedural language, and it
can take a while to get used to working with a declarative language
when you are used to languages such as C#.
The heart of SQL is the
query. A query is a statement that returns a set
of records from the database.
For example, you might like to see all the
CompanyNames and CustomerIDs of
every record in the Customers table in which the
customer's address is in London. To do so, write:
Select CustomerID, CompanyName from Customers where city = 'London'
This returns the following six records as output:
CustomerID CompanyName
---------- ----------------------------------------
AROUT Around the Horn
BSBEV B's Beverages
CONSH Consolidated Holdings
EASTC Eastern Connection
NORTS North/South
SEVES Seven Seas Imports
SQL is capable of much more powerful queries. For example, suppose
the Northwinds manager would like to know what products were
purchased in July of 1996 by the customer "Vins et
alcools Chevalier." This turns out to be somewhat
complicated. The Order Details table knows the
ProductID for all the products in any given order.
The Orders table knows which CustomerIDs are
associated with an order. The Customers table knows the
CustomerID for a customer, and the Products table
knows the Product name for the ProductID. How do
you tie all this together? Here's the query:
select o.OrderID, productName
from [Order Details] od
join orders o on o.OrderID = od.OrderID
join products p on p.ProductID = od.ProductID
join customers c on o.CustomerID = c.CustomerID
where c.CompanyName = 'Vins et alcools Chevalier'
and orderDate >= '7/1/1996' and orderDate <= '7/31/1996'
This asks the database to get the OrderID and the
product name from the relevant tables. First, look at Order Details
(which we've called od for
short), then join that with the Orders table for every record in
which the OrderID in the Order Details table is
the same as the OrderID in the Orders table.
When you join two tables, you can say either "Get
every record that exists in either table" (this is
called an outer join), or
you can say, as I've done here,
"Get only those records that exist in both
tables" (called an inner
join). That is, an inner join states to get only
the records in Orders that match the records in Order Details by
having the same value in the OrderID field
(on o.Orderid
= od.Orderid).
|
SQL joins are inner joins by default. Writing join orders is the same
as writing inner join orders.
|
|
The SQL statement goes on to ask the database to create an inner join
with Products, getting every row in which the
ProductID in the Products table is the same as the
ProductID in the Order Details table.
Then create an inner join with customers for those rows where the
CustomerID is the same in both the Orders table
and the Customer table.
Finally, tell the database to constrain the results to only those
rows in which the CompanyName is the one you want,
and the dates are in July.
The collection of constraints finds only three records that match:
OrderID ProductName
----------- ----------------------------------------
10248 Queso Cabrales
10248 Singaporean Hokkien Fried Mee
10248 Mozzarella di Giovanni
This output shows that there was only one order (10248) in which the
customer had the right ID and in which the date of the order was July
1996. That order produced three records in the Order Details table,
and using the product IDs in these three records, you got the product
names from the Products table.
You can use SQL not only for searching for and retrieving data, but
also for creating, updating, and deleting tables, and generally
managing and manipulating both the content and the structure of the
database.
For a full explanation of SQL and tips on how to put it to best use,
I recommend Transact SQL
Programming, by Kline, Gould, and Zanevsky
(O'Reilly).
|