7.4 Importing Data into a Live System7.4.1 ProblemYou need to import data from an external source, and your users need to have full access to the target tables even while the import is taking place. Furthermore, you need a mechanism for validating the imported data before it is stored in the production tables. 7.4.2 SolutionTo avoid contaminating the operational system with bad data, we will design our import process around the use of a buffer table. This buffer table will have a structure similar to the system's operational table, but with extra columns to support error handling and the import process. Data will be loaded into the buffer table. A stored procedure will then be invoked to validate each row in the buffer table. Rows that pass validation tests will be loaded into the operational table. Rows that fail validation will be flagged as errors so that they can be corrected later. 7.4.2.1 Step 1: Create buffer tablesCreate a buffer table for the table that is being loaded by the import process. The following BookstoreBuffer table is the buffer table for the Bookstore table in the Bookstore application that we will be loading in this recipe: CREATE TABLE BookstoreBuffer ( Id UNIQUEIDENTIFIER DEFAULT NEWID( ), BookId INTEGER, Name CHAR(40), Quantity INTEGER, Price DECIMAL(10,2), Type CHAR(20), Status CHAR(1) DEFAULT 'I' ) The Status column is used to hold the status of imported rows. The Id column forms the primary key, providing a unique identifier for each row that can later be used with the validation procedure. Table 7-2 lists the values that will be used for the Status column.
All newly inserted rows should be marked with an 'I' to indicate that they have been imported, but not yet validated and added to the operational table. This can be done either through triggers or by the code that loads the data. The next section shows the procedure to validate and add the newly imported data to the Stock table. 7.4.2.2 Step 2: Build validation proceduresCreate a stored procedure for the table being loaded that validates each newly imported row in the associated buffer table. Rows that pass validation should be used to update the operational table. Rows that fail validation should be flagged as rejections. The following ImportRow procedure reads from the BookstoreBuffer table and updates the Bookstore table: CREATE PROCEDURE ImportRow @RowId UNIQUEIDENTIFIER AS BEGIN DECLARE @BookId INTEGER DECLARE @Name CHAR(40) DECLARE @Quantity INTEGER DECLARE @Price DECIMAL(10,2) DECLARE @Type CHAR(20) DECLARE @checkFlag integer SELECT @BookId=BookId, @Name=Name, @Quantity=Quantity, @Price=Price, @Type=Type FROM BookstoreBuffer WHERE Id=@RowId /* Place validation code here. Set @checkFlag to 0 for good records, and set it to 1 for bad records. */ IF @quantity < 0 SELECT @checkFlag=1 ELSE SELECT @checkFlag=0 /* If the row passed validation, update the operational table. */ IF @checkFlag=0 BEGIN UPDATE Bookstore SET Name=@Name, Quantity=@Quantity, Price=@Price, Type=@Type WHERE bookID=@bookId IF @@rowcount=0 INSERT INTO Bookstore(BookId, Name, Quantity, Price, Type) VALUES(@BookId, @Name, @Quantity, @Price, @Type) UPDATE BookstoreBuffer SET Status='A' WHERE Id=@rowID END ELSE UPDATE BookstoreBuffer SET Status='R' WHERE Id=@rowID END Steps 1 and 2 obviously only need to be performed once. Steps 3 and 4 should be used whenever new data needs to be imported. 7.4.2.3 Step 3: Insert rowsInsert the rows to be imported into the buffer table. Do this using a custom program or use a generic import utility. Please note that the procedure shown in this recipe is designed to validate and update a single row at the time. The programmer writing the load program needs to generate a new row ID for each row using the NEWID( ) function, insert that row into the buffer table, and then run the validation procedure.
7.4.2.4 Step 4: Check for rejected dataCheck the buffer tables for rows that have a status value of R. These are rejected rows that could not be inserted into the system. Use a query such as the following: SELECT * FROM BookstoreBuffer WHERE status='R' This query returns all the rows from the BookstoreBuffer table that were not successfully inserted into the Bookstore table. Once you've identified the problem data, you can correct it, rerun the ImportRow procedure for those rows, and check for errors again. 7.4.3 DiscussionThe buffer-table framework shown in this recipe can be used to import any type of data from an external system. A significant advantage is that you can delegate the actual import into the database to outside programmers, and yet, you retain control over the data that ends up being inserted into the operational tables. The key to that, of course, is to give the outside programmers access only to the buffer table while you retain control over the procedure used to move data from the buffer table to the operational table. Because the validation procedure, ImportRow in this recipe, only loads valid data into the operational table, you can run the import process without restricting user access to the target table. If there are referential integrity issues to deal with, you can code the validation procedure to deal with those as well. A key issue to think about when implementing this type of solution is how you want to deal with rows that have been imported, validated, and loaded into the production table. The procedure shown in this recipe keeps those rows around, but marks them with a status of A to show that they were accepted. The advantage to doing this is that you end up with a complete history of imported data in the Bookstore table. The disadvantage is that that history will end up consuming a large amount of disk space over time. If you don't need to keep a history of accepted rows, you can delete each one after it is processed. With respect to the ImportRow procedure, the following code could be used: IF @checkFlag=0 DELETE BookstoreBuffer WHERE Id=@id Deleting each row one-by-one may result in poor performance. DELETE statements are usually more expensive than UPDATES. Because of that, we recommend that you do one mass delete of all accepted rows after the entire load process has been completed. |