Team LiB   Previous Section   Next Section

7.4 Importing Data into a Live System

7.4.1 Problem

You need to import data from an external source, and your users need to have full access to the target tables even while the import is taking place. Furthermore, you need a mechanism for validating the imported data before it is stored in the production tables.

7.4.2 Solution

To avoid contaminating the operational system with bad data, we will design our import process around the use of a buffer table. This buffer table will have a structure similar to the system's operational table, but with extra columns to support error handling and the import process. Data will be loaded into the buffer table. A stored procedure will then be invoked to validate each row in the buffer table. Rows that pass validation tests will be loaded into the operational table. Rows that fail validation will be flagged as errors so that they can be corrected later.

7.4.2.1 Step 1: Create buffer tables

Create a buffer table for the table that is being loaded by the import process. The following BookstoreBuffer table is the buffer table for the Bookstore table in the Bookstore application that we will be loading in this recipe:

CREATE TABLE BookstoreBuffer (
   Id UNIQUEIDENTIFIER DEFAULT NEWID(  ),
   BookId INTEGER,
   Name CHAR(40),
   Quantity INTEGER,
   Price DECIMAL(10,2),
   Type CHAR(20),
   Status CHAR(1) DEFAULT 'I'
)

The Status column is used to hold the status of imported rows. The Id column forms the primary key, providing a unique identifier for each row that can later be used with the validation procedure. Table 7-2 lists the values that will be used for the Status column.

Table 7-2. Buffer table status values

Value

Status

Description

I

Imported

The row has been imported into the buffer table, but has not yet been validated and loaded into the operational table.

A

Added

The row has been validated, and the new information has been added to the operational table.

R

Rejected

The row has failed validation and has, consequently, been rejected.

All newly inserted rows should be marked with an 'I' to indicate that they have been imported, but not yet validated and added to the operational table. This can be done either through triggers or by the code that loads the data. The next section shows the procedure to validate and add the newly imported data to the Stock table.

7.4.2.2 Step 2: Build validation procedures

Create a stored procedure for the table being loaded that validates each newly imported row in the associated buffer table. Rows that pass validation should be used to update the operational table. Rows that fail validation should be flagged as rejections. The following ImportRow procedure reads from the BookstoreBuffer table and updates the Bookstore table:

CREATE PROCEDURE ImportRow @RowId UNIQUEIDENTIFIER 
AS BEGIN
   DECLARE @BookId INTEGER
   DECLARE @Name CHAR(40)
   DECLARE @Quantity INTEGER
   DECLARE @Price DECIMAL(10,2)
   DECLARE @Type CHAR(20)

   DECLARE @checkFlag integer

   SELECT @BookId=BookId,
          @Name=Name,
          @Quantity=Quantity,
          @Price=Price,
          @Type=Type
      FROM BookstoreBuffer 
      WHERE Id=@RowId

   /* Place validation code here. Set @checkFlag to 0
   for good records, and set it to 1 for bad records.
   */
   IF @quantity < 0
      SELECT @checkFlag=1
   ELSE 
      SELECT @checkFlag=0
   
   /* If the row passed validation, update the operational table.
   */
   IF @checkFlag=0 BEGIN
      UPDATE Bookstore 
         SET 
            Name=@Name,
            Quantity=@Quantity,
            Price=@Price,
            Type=@Type
         WHERE bookID=@bookId
      IF @@rowcount=0
         INSERT INTO Bookstore(BookId, Name, Quantity, Price, Type)
            VALUES(@BookId, @Name, @Quantity, @Price, @Type)
      UPDATE BookstoreBuffer SET Status='A' WHERE Id=@rowID
   END ELSE
      UPDATE BookstoreBuffer SET Status='R' WHERE Id=@rowID
END

Steps 1 and 2 obviously only need to be performed once. Steps 3 and 4 should be used whenever new data needs to be imported.

7.4.2.3 Step 3: Insert rows

Insert the rows to be imported into the buffer table. Do this using a custom program or use a generic import utility. Please note that the procedure shown in this recipe is designed to validate and update a single row at the time. The programmer writing the load program needs to generate a new row ID for each row using the NEWID( ) function, insert that row into the buffer table, and then run the validation procedure.

A simple extension of this technique is to write a validation procedure that processes all newly imported rows in the buffer table. You'll see an example of such a procedure in the next recipe.

7.4.2.4 Step 4: Check for rejected data

Check the buffer tables for rows that have a status value of R. These are rejected rows that could not be inserted into the system. Use a query such as the following:

SELECT * FROM BookstoreBuffer WHERE status='R' 

This query returns all the rows from the BookstoreBuffer table that were not successfully inserted into the Bookstore table. Once you've identified the problem data, you can correct it, rerun the ImportRow procedure for those rows, and check for errors again.

7.4.3 Discussion

The buffer-table framework shown in this recipe can be used to import any type of data from an external system. A significant advantage is that you can delegate the actual import into the database to outside programmers, and yet, you retain control over the data that ends up being inserted into the operational tables. The key to that, of course, is to give the outside programmers access only to the buffer table while you retain control over the procedure used to move data from the buffer table to the operational table.

Because the validation procedure, ImportRow in this recipe, only loads valid data into the operational table, you can run the import process without restricting user access to the target table. If there are referential integrity issues to deal with, you can code the validation procedure to deal with those as well.

A key issue to think about when implementing this type of solution is how you want to deal with rows that have been imported, validated, and loaded into the production table. The procedure shown in this recipe keeps those rows around, but marks them with a status of A to show that they were accepted. The advantage to doing this is that you end up with a complete history of imported data in the Bookstore table. The disadvantage is that that history will end up consuming a large amount of disk space over time.

If you don't need to keep a history of accepted rows, you can delete each one after it is processed. With respect to the ImportRow procedure, the following code could be used:

IF @checkFlag=0 
      DELETE BookstoreBuffer WHERE Id=@id

Deleting each row one-by-one may result in poor performance. DELETE statements are usually more expensive than UPDATES. Because of that, we recommend that you do one mass delete of all accepted rows after the entire load process has been completed.

    Team LiB   Previous Section   Next Section