Team LiB   Previous Section   Next Section

7.3 Importing External Data

7.3.1 Problem

You want to import data from an external text file. The import can be done in a batch process, because users do not need access to the destination table while the import takes place.

7.3.2 Solution

Use one of SQL Server's built-in import mechanisms, such as the BULK INSERT statement, to import the data from the ASCII file. Be careful with respect to validation and error handling. Follow these steps:

  1. Make sure that the data to be imported is in an ASCII file.

  2. Filter the data for obvious inconsistencies.

  3. Import the data into your database using the BCP utility or BULK INSERT statement.

  4. Check for errors.

7.3.2.1 Step 1: Make sure that the data to be imported is in an ASCII file

Almost all import mechanisms today support importing from ASCII text files. This is good, because text files are the easiest to deal with when something goes wrong. You can use any text editor to inspect and debug the data. If your external data source is another database, you can export data from that database into a text file and then load from that text file into your target database.

7.3.2.2 Step 2: Filter the data for obvious inconsistencies

Use Perl, awk, or VBScript to filter and preprocess the data to be loaded. Keep in mind the datatypes of the columns that you are loading. Each datatype has its own set of common problems. Table 7-1 lists some of the common problems that you will encounter when loading data from an external source.

Table 7-1. Common data-import problems

Datatype

Possible problems

NUMERICAL

Check the decimal and digit-grouping symbols. European and U.S. standards are exactly opposite of each other. Make sure that your systems understand the decimal symbol that you use in the ASCII files. It's best to omit or discard digit-grouping symbols if possible. They add no value to the import process.

CHAR

Check the quote symbol and its interpretation by the database server. By default, SQL Server imports all characters between field delimiters.

DATE

There are many possible problems with the way dates are formatted, so it's best to transform dates into the universal YYYY-MM-DD HH:MM format. This format is recognized as a standard by the ANSI committee and is the default format for SQL Server.

INTEGER

Be cognizant of the size of an INTEGER in your database server. Most servers use 4 bytes to store INTEGER datatypes. If the numeric values that you are importing exceed the range of an INTEGER type ( -2,147,483,648 through 2,147,483,647), you should import them as NUMERICAL. You can also use an extended INTEGER if that type is available on your system.

Other possible issues include how fields are separated in the data file and the characters used to mark the end of a row and the end of a file. However, these types of issues are easy to detect, and you can usually deal with them early in the development process.

7.3.2.3 Step 3: Import the data into your database

Import the data using one of SQL Server's import mechanisms. One of the easiest ways to import data from a text file is to use the BULK INSERT statement as shown in the following example:

BULK INSERT Northwind.dbo.Orders
   FROM 'C:\orders\orders.txt'
   WITH (FIELDTERMINATOR='|',TABLOCK)
7.3.2.4 Step 4: Check for errors

Checking for errors, and then dealing with them, is the hardest part of any import process. If your import system returns any error codes or other information that can be used to detect problems, be sure to take advantage of those features.

A quick trick that you can use to determine if a load was successful or not is also to take an initial count of the rows in the table being loaded. Do this before loading any data. Also count the lines in the ASCII file that you are loading. Import the data and count the rows in the table again. Calculate the difference between the row counts before and after the import. If the difference in row counts is the same as the line count from the ASCII file, you can assume that all rows have been inserted and that no rows have been rejected. Be careful with this technique. It's only valid when you can be certain no other users are modifying the data in the table that you are loading.

7.3.3 Discussion

The SQL language itself does not accommodate the importing of data into a database. In general, there are two approaches to the problem of doing an import. One approach is to use an external program that connects to the server as a client, reads the data from the ASCII file, and stores it into the database using INSERT statements. The other approach is for the database vendor to implement proprietary commands for importing or exporting data. SQL Server's BULK INSERT statement is an example of such a command. Proprietary commands such as these usually follow the general form of SQL statements, but they are not standardized across vendor platforms.

    Team LiB   Previous Section   Next Section