8.2 Choosing the Right Hardware

Sometimes the most expensive machine is not the one that provides the best performance. Your demands on the platform hardware are based on many aspects and affect many components. Let's discuss some of them.

This discussion relies on the specific definitions of various hardware and operating-system terms. Although you may be familiar with the terms below, we have explicitly provided definitions to make sure there is no ambiguity when we discuss the hardware strategies.

Cluster: A group of machines connected together to perform one big or many small computational tasks in a reasonable time. Clustering can also be used to provide failover, where if one machine fails, its processes are transferred to another without interruption of service. And you may be able to take one of the machines down for maintenance (or an upgrade) and keep your service running—the main server simply will not dispatch the requests to the machine that was taken down.
Load balancing: Say that users are given the name of one of your machines, but it cannot stand the heavy load. You can use a clustering approach to distribute the load over a number of machines (which gives you the advantages of clustering, too). The central server, which users access initially when they type the name of your service into their browsers, works as a dispatcher. It redirects requests to other machines, and sometimes the central server also collects the results and returns them to the users.
Network Interface Card (NIC): A hardware component that allows your machine to connect to the network. It sends and receives packets. NICs come in different speeds, varying from 10 MBps to 10 GBps and faster. The most widely used NIC type is the one that implements the Ethernet networking protocol.
Random Access Memory (RAM): The memory that you have in your computer (comes in units of 8 MB, 16 MB, 64 MB, 256 MB, etc.).
Redundant Array of Inexpensive Disks (RAID): An array of physical disks, usually treated by the operating system as one single disk, and often forced to appear that way by the hardware. The reason for using RAID is often simply to achieve a high data-transfer rate, but it may also be to get adequate disk capacity or high reliability. Redundancy means that the system is capable of continued operation even if a disk fails. There are various types of RAID arrays and several different approaches to implementing them. Some systems provide protection against failure of more than one drive and some ("hot-swappable") systems allow a drive to be replaced without even stopping the OS.

8.2.1 Machine Strength Demands According to Expected Site Traffic

If you are building a fan site and you want to amaze your friends with a mod_perl guestbook, any old 486 machine could do it. But if you are in a serious business, it is very important to build a scalable server. If your service is successful and becomes popular, the traffic could double every few days, and you should be ready to add more resources to keep up with the demand. While we can define the web server scalability more precisely, the important thing is to make sure that you can add more power to your web server(s) without investing much additional money in software development (you will need a little software effort to connect your servers, if you add more of them). This means that you should choose hardware and OSes that can talk to other machines and become part of a cluster.

On the other hand, if you prepare for a lot of traffic and buy a monster to do the work for you, what happens if your service doesn't prove to be as successful as you thought it would be? Then you've spent too much money, and meanwhile faster processors and other hardware components have been released, so you lose.

Wisdom and prophecy, that's all it takes. :)

8.2.2 A Single Strong Machine Versus Many Weaker Machines

Let's start with a claim that a four-year-old processor is still very powerful and can be put to good use. Now let's say that for a given amount of money you can probably buy either one new, very strong machine or about 10 older but very cheap machines. We claim that with 10 old machines connected into a cluster, by deploying load balancing, you will be able to serve about five times more requests than with a single new machine.

Why is that? Generally the performance improvement on a new machine is marginal, while the price is much higher. Ten machines will do faster disk I/O than one single machine, even if the new disk is quite a bit faster. Yes, you have more administration overhead, but there is a chance that you will have it anyway, for in a short time the new machine you have just bought might not be able to handle the load. Then you will have to purchase more equipment and think about how to implement load balancing and web server filesystem distribution anyway.

Why are we so convinced? Look at the busiest services on the Internet: search engines, webmail servers, and the like—most of them use a clustering approach. You may not always notice it, because they hide the real implementation details behind proxy servers, but they do.

8.2.3 Getting a Fast Internet Connection

You have the best hardware you can get, but the service is still crawling. What's wrong? Make sure you have a fast Internet connection—not necessarily as fast as your ISP claims it to be, but as fast as it should be. The ISP might have a very good connection to the Internet but put many clients on the same line. If these are heavy clients, your traffic will have to share the same line and your throughput will suffer. Think about a dedicated connection and make sure it is truly dedicated. Don't trust the ISP, check it!

Another issue is connection latency. Latency defines the number of milliseconds it takes for a packet to travel to its final destination. This issue is really important if you have to do interactive work (via ssh or a similar protocol) on some remote machine, since if the latency is big (400+ ms) it's really hard to work. It is less of an issue for web services, since it influences only the first packet. The rest of the packets arrive without any extra delay.

The idea of having a connection to "the Internet" is a little misleading. Many web hosting and colocation companies have large amounts of bandwidth but still have poor connectivity. The public exchanges, such as MAE-East and MAE-West, frequently become overloaded, yet many ISPs depend on these exchanges.

Private peering is a solution used by the larger backbone operators. No longer exchanging traffic among themselves at the public exchanges, each implements private interconnections with each of the others. Private peering means that providers can exchange traffic much quicker.

Also, if your web site is of global interest, check that the ISP has good global connectivity. If the web site is going to be visited mostly by people in a certain country or region, your server should probably be located there.

Bad connectivity can directly influence your machine's performance. Here is a story one of the developers told on the mod_perl mailing list:

What relationship has 10% packet loss on one upstream provider got to
do with machine memory ?

Yes.. a lot. For a nightmare week, the box was located downstream of a
provider who was struggling with some serious bandwidth problems of
his own... people were connecting to the site via this link, and
packet loss was such that retransmits and TCP stalls were keeping
httpd heavies around for much longer than normal.. instead of blasting
out the data at high or even modem speeds, they would be stuck at
1k/sec or stalled out...  people would press stop and refresh, httpds
would take 300 seconds to timeout on writes to no-one.. it was a
nightmare.  Those problems didn't go away till I moved the box to a
place closer to some decent backbones.

Note that with a proxy, this only keeps a lightweight httpd tied up,
assuming the page is small enough to fit in the buffers.  If you are a
busy internet site you always have some slow clients.  This is a
difficult thing to simulate in benchmark testing, though.

8.2.4 Tuning I/O Performance

If your service is I/O-bound (i.e., does a lot of read/write operations to disk) you need a very fast disk, especially when using a relational database. Don't spend the money on a fancy video card and monitor! A cheap card and a 14-inch monochrome monitor are perfectly adequate for a web server—you will probably access it by telnet or ssh most of the time anyway. Look for hard disks with the best price/performance ratio. Of course, ask around and avoid disks that have a reputation for headcrashes and other disasters.

Consider RAID or similar systems when you want to improve I/O's throughput (performance) and the reliability of the stored data, and of course if you have an enormous amount of data to store.

OK, you have a fast disk—so what's next? You need a fast disk controller. There may be a controller embedded on your computer's motherboard. If the controller is not fast enough, you should buy a faster one. Don't forget that it may be necessary to disable the original controller.

8.2.5 How Much Memory Is Enough?

How much RAM do you need? Nowadays, chances are that you will hear: "Memory is cheap, the more you buy the better." But how much is enough? The answer is pretty straightforward: you do not want your machine to swap! When the CPU needs to write something into memory, but memory is already full, it takes the least frequently used memory pages and swaps them out to disk. This means you have to bear the time penalty of writing the data to disk. If another process then references some of the data that happens to be on one of the pages that has just been swapped out, the CPU swaps it back in again, probably swapping out some other data that will be needed very shortly by some other process. Carried to the extreme, the CPU and disk start to thrash hopelessly in circles, without getting any real work done. The less RAM there is, the more often this scenario arises. Worse, you can exhaust swap space as well, and then your troubles really start.

How do you make a decision? You know the highest rate at which your server expects to serve pages and how long it takes on average to serve one. Now you can calculate how many server processes you need. If you know the maximum size to which your servers can grow, you know how much memory you need. If your OS supports memory sharing, you can make best use of this feature by preloading the modules and scripts at server startup, so you will need less memory than you have calculated.

Do not forget that other essential system processes need memory as well, so you should not only plan for the web server but also take into account the other players. Remember that requests can be queued, so you can afford to let your client wait for a few moments until a server is available to serve it. Most of the time your server will not have the maximum load, but you should be ready to bear the peaks. You need to reserve at least 20% of free memory for peak situations. Many sites have crashed a few moments after a big scoop about them was posted and an unexpected number of requests suddenly arrived. If you are about to announce something cool, be aware of the possible consequences.

8.2.6 Getting a Fault-Tolerant CPU

Make sure that the CPU is operating within its specifications. Many boxes are shipped with incorrect settings for CPU clock speed, power supply voltage, etc. Sometimes a if cooling fan is not fitted, it may be ineffective because a cable assembly fouls the fan blades. Like faulty RAM, an overheating processor can cause all kinds of strange and unpredictable things to happen. Some CPUs are known to have bugs that can be serious in certain circumstances. Try not to get one of them.

8.2.7 Detecting and Avoiding Bottlenecks

You might use the most expensive components but still get bad performance. Why? Let me introduce an annoying word: bottleneck.

A machine is an aggregate of many components. Almost any one of them may become a bottleneck. If you have a fast processor but a small amount of RAM, the RAM will probably be the bottleneck. The processor will be underutilized, and it will often be waiting for the kernel to swap the memory pages in and out, because memory is too small to hold the busiest pages.

If you have a lot of memory, a fast processor, and a fast disk, but a slow disk controller, the disk controller will be the bottleneck. The performance will still be bad, and you will have wasted money.

A slow NIC can cause a bottleneck as well and make the whole service run slowly. This is a most important component, since web servers are much more often network-bound than they are disk-bound (i.e., they have more network traffic than disk utilization).

8.2.8 Solving Hardware Requirement Conflicts

It may happen that the combination of software components you find yourself using gives rise to conflicting requirements for the optimization of tuning parameters. If you can separate the components onto different machines you may find that this approach (a kind of clustering) solves the problem, at much less cost than buying faster hardware, because you can tune the machines individually to suit the tasks they should perform.

For example, if you need to run a relational database engine and a mod_perl server, it can be wise to put the two on different machines, since an RDBMS needs a very fast disk while mod_perl processes need lots of memory. Placing the two on different machines makes it easy to optimize each machine separately and satisfy each software component's requirements in the best way.

[ Team LiB ]