Whatever architecture you use, the focus of this book lies at the point where your application talks to the database. As a database programmer, you need to worry about how to get data from and send it to your database. As we mentioned earlier, the tool to do that is generally some sort of database API. Any API, however, requires a basic understanding of managing a connection, the transactions under that connection, and the processing of the data associated with those transactions.
The starting point of your database interaction is in making a connection. The details behind what exactly constitutes a connection vary from API to API. Nevertheless, making a connection is basically establishing some sort of link between your code and the database. The variation comes in the form of logical and physical connections. Under some APIs, a connection is physical; i.e., a network link is established. Other APIs, however, may not establish a physical link until long after you make a connection, to ensure that no network traffic takes place until you actually need the connection.
The details about whether a connection is logical or physical should not generally concern a database programmer. The important thing is that once a connection is established, you can use that connection to interact with the database.
Once you are done with your connection, you need to close it and free up any resources it may have used. In a long-running process such as an Internet daemon, a badly written application can eat up database resources until it locks up the system.
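As a concrete illustration, here is a minimal sketch of opening and closing a connection from Python, assuming the mysql.connector driver; the host, user, password, and database values are placeholders, and any DB-API 2.0 driver follows the same basic pattern.

import mysql.connector

# Establish the connection (whether the link is physical or logical is the
# driver's business, not yours).
conn = mysql.connector.connect(host="localhost", user="dbuser",
                               password="secret", database="test")
try:
    cursor = conn.cursor()
    cursor.execute("SELECT VERSION()")   # do some work with the connection
    print(cursor.fetchone()[0])
    cursor.close()
finally:
    conn.close()                         # always release the connection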
Part of cleaning up after yourself involves proper error handling. Some programming languages make it easy for you to remember to handle exceptional conditions (network failure, duplicate keys on insert, SQL syntax errors, etc.); but regardless of your language of choice, you must be aware of the error conditions that can arise from a given API call and act appropriately for each exceptional situation.
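The sketch below extends the connection example with per-condition error handling, using mysql.connector's DB-API exception classes; the ACCOUNT table and its columns are hypothetical names used only for illustration.

import mysql.connector
from mysql.connector import errors

conn = mysql.connector.connect(host="localhost", user="dbuser",
                               password="secret", database="test")
cursor = conn.cursor()
try:
    cursor.execute("INSERT INTO ACCOUNT (ACCOUNT_ID, BALANCE, LAST_UPDATE_TIME) "
                   "VALUES (5, 0.00, 0)")
    conn.commit()
except errors.IntegrityError:
    conn.rollback()     # duplicate key on insert: the account already exists
except errors.ProgrammingError:
    conn.rollback()     # SQL syntax error or missing table
except errors.Error:
    conn.rollback()     # anything else, such as a network failure
    raise
finally:
    cursor.close()
    conn.close()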
You talk to the database in the form of transactions.[1] A database transaction is one or more database statements that must be executed together, or not at all. A bank account transfer is a good example of a complex transaction. In short, an account transfer is actually two separate events: a debit of one account and a credit to another. Should the database crash after the debit but before the credit, the application should be able to back out of the debit. A database transaction enables a programmer to mark when a transaction begins, when it ends, and what should happen if one of the pieces of the transaction fails.
[1] Even if you are using a version of MySQL without support for transactions, each statement you send to the database can, in a sense, be thought of as an individual transaction. You simply have no option to abort or package multiple statements together in a complex transaction.
Until recently, MySQL had no support for transactions. In other words, when you executed an SQL statement under old versions of MySQL, it took effect immediately. This behavior is still the default for MySQL. Newer versions of MySQL, however, support transactions for certain tables in the database. Specifically, the table must use a transaction-safe table format. Currently, MySQL supports two transaction-safe table types: BDB (Berkeley DB) and InnoDB. Instructions for configuring MySQL to use these table types can be found on the MySQL web site.
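For example, a transaction-safe table might be created as in the following sketch, assuming the InnoDB handler is compiled into your server; the ACCOUNT definition is the hypothetical one used throughout this section's examples, and newer MySQL releases spell the table-type clause ENGINE=InnoDB rather than TYPE=InnoDB.

import mysql.connector

conn = mysql.connector.connect(host="localhost", user="dbuser",
                               password="secret", database="test")
cursor = conn.cursor()
cursor.execute(
    "CREATE TABLE ACCOUNT ("
    "  ACCOUNT_ID       INT NOT NULL PRIMARY KEY,"
    "  BALANCE          DECIMAL(9,2) NOT NULL,"
    "  LAST_UPDATE_TIME BIGINT NOT NULL"
    ") TYPE=InnoDB")        # newer servers: ENGINE=InnoDB
cursor.close()
conn.close()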
In Chapter 3, we described the MySQL syntax for managing transactions from the MySQL client command line. Managing transactions from within applications is often very different. In general, each API will provide a mechanism for beginning, committing, and rolling back transactions. If it does not, you can likely follow the command-line SQL syntax to get the desired effect.
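As a sketch of what this looks like in practice, the account transfer described earlier might be written as follows, again assuming the mysql.connector driver; the transfer function, table, and column names are hypothetical. With autocommit off (the driver's default), neither update becomes permanent until commit() is called, and rollback() backs out the debit if anything fails.

import mysql.connector

def transfer(conn, from_id, to_id, amount):
    """Debit one account and credit another as a single transaction."""
    cursor = conn.cursor()
    try:
        cursor.execute(
            "UPDATE ACCOUNT SET BALANCE = BALANCE - %s WHERE ACCOUNT_ID = %s",
            (amount, from_id))
        cursor.execute(
            "UPDATE ACCOUNT SET BALANCE = BALANCE + %s WHERE ACCOUNT_ID = %s",
            (amount, to_id))
        conn.commit()       # both statements take effect together ...
    except mysql.connector.Error:
        conn.rollback()     # ... or neither does (requires a transaction-safe
        raise               #     table type such as InnoDB)
    finally:
        cursor.close()

A call such as transfer(conn, 5, 7, 50.00) then either moves the money or leaves both accounts untouched.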
Managing transactions may seem simple, but there are many issues to consider when you use transactions in a multiuser environment. First of all, transactions come with a heavy price in performance; MySQL originally omitted support for them precisely because its goal was to provide a fast database engine. To understand how transactions affect performance, you need a basic understanding of transaction isolation levels.
A transaction isolation level determines what other clients can see of your changes while you are in the middle of a transaction. To understand transaction isolation levels, however, you first need to understand a few common terms:
Dirty read
A dirty read occurs when one transaction views the uncommitted changes of another transaction. If the original transaction rolls back its changes, the one that read the data is said to have "dirty" data.
Repeatable read
A repeatable read occurs when one transaction always reads the same data from the same query no matter how many times the query is made or how many changes other transactions make to the rows read by the first transaction. In other words, a transaction that mandates repeatable reads will not see the committed changes made by another transaction. An application needs to start a new transaction to see those changes.
Phantom read
A phantom read occurs when changes made by other transactions result in new rows that match your transaction's WHERE clause. Consider, for example, a situation in which you have a transaction that reads all accounts with a balance of less than $100. Your transaction performs two reads of that data. Between the two reads, another transaction adds a new account to the database with no balance. That account now matches your query. If your transaction isolation level allows phantom reads, you will see the new "phantom" row. If it does not allow phantom reads, you will see the same set of rows each time.
MySQL supports the following transaction isolation levels:
READ UNCOMMITTED
The transaction allows dirty, nonrepeatable, and phantom reads.
READ COMMITTED
The transaction does not allow dirty reads, but allows nonrepeatable and phantom reads.
REPEATABLE READ
The transaction allows committed, repeatable reads and phantom reads. Nonrepeatable reads are not allowed.
SERIALIZABLE
The transaction allows only committed, repeatable reads. Phantom reads are specifically not allowed.
As you climb the transaction isolation chain, from no transactions to serializable transactions, you decrease the performance of your application. You therefore need to balance your data integrity needs with your performance needs. In general, READ COMMITTED is as high as an application wants to go, except in a few cases.
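If your API does not expose a dedicated call for choosing an isolation level, you can usually just send the SQL statement yourself. The sketch below does so through mysql.connector (the connection parameters are placeholders); the same statement works from the mysql command line.

import mysql.connector

conn = mysql.connector.connect(host="localhost", user="dbuser",
                               password="secret", database="test")
cursor = conn.cursor()
# Applies to subsequent transactions on this connection only.
cursor.execute("SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED")
cursor.close()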
One mechanism for getting the performance of READ UNCOMMITTED and the data integrity of READ COMMITTED is to make a row's primary key the normal primary key plus a timestamp reflecting the time, in milliseconds, when the row was last updated. When an application performs an update on the underlying row in the database, it updates that timestamp but uses the old one in the WHERE clause:
UPDATE ACCOUNT
SET BALANCE = 5.00, LAST_UPDATE_TIME = 996432238000
WHERE ACCOUNT_ID = 5
  AND LAST_UPDATE_TIME = 996432191119
If this transaction was working with stale data, the WHERE clause will match no rows and the update will report zero affected rows. The application can then requery the database for the new data and retry.
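From the application side, the check might look like the following sketch, again assuming the mysql.connector driver and the hypothetical ACCOUNT table used above; the affected-row count tells you whether your data was stale.

import time

def update_balance(conn, account_id, new_balance, last_update_time):
    """Update a balance only if the row has not changed since it was read."""
    cursor = conn.cursor()
    cursor.execute(
        "UPDATE ACCOUNT SET BALANCE = %s, LAST_UPDATE_TIME = %s "
        "WHERE ACCOUNT_ID = %s AND LAST_UPDATE_TIME = %s",
        (new_balance, int(time.time() * 1000), account_id, last_update_time))
    stale = (cursor.rowcount == 0)   # no row matched: someone else updated it
    cursor.close()
    if stale:
        conn.rollback()
        return False                 # caller should requery and retry
    conn.commit()
    return True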