8.2 Choosing the Right Hardware
Sometimes the most expensive machine
is not the one that provides the best performance. Your demands on
the platform hardware are based on many aspects and affect many
components. Let's discuss some of them.
This discussion relies on the specific definitions of various
hardware and operating-system terms. Although you may be familiar
with the terms below, we have explicitly provided definitions to make
sure there is no ambiguity when we discuss the hardware strategies.
- Cluster
A group of machines connected together to perform one big
or many small computational tasks in a reasonable time. Clustering
can also be used to provide failover, where if one machine fails, its
processes are transferred to another without interruption of service.
You may also be able to take one of the machines down for maintenance
(or an upgrade) and keep your service running: the main server
simply will not dispatch requests to the machine that was taken
down.
- Load balancing
Say that users are given the name of one of your machines, but
it cannot stand the heavy load. You can use a clustering approach to
distribute the load over a number of machines (which gives you the
advantages of clustering, too). The central server, which users
access initially when they type the name of your service into their
browsers, works as a dispatcher. It redirects requests to other
machines, and sometimes the central server also collects the results
and returns them to the users.
- Network Interface Card (NIC)
A hardware component that allows your machine to
connect to the network. It sends and receives packets. NICs come in
different speeds, varying from 10 Mbps to 10 Gbps and faster. The
most widely used NIC type is the one that implements the Ethernet
networking protocol.
- Random Access Memory (RAM)
The main working memory of your computer, which holds running
programs and their data (modules come in sizes such as 8 MB, 16 MB,
64 MB, 256 MB, etc.).
- Redundant Array of Inexpensive Disks (RAID)
An array of physical disks, usually
treated by the operating system as one single disk, and often forced
to appear that way by the hardware. The reason for using RAID is
often simply to achieve a high data-transfer rate, but it may also be
to get adequate disk capacity or high reliability.
Redundancy means that the system is capable of
continued operation even if a disk fails. There are various types of
RAID arrays and several different approaches to implementing them.
Some systems provide protection against failure of more than one
drive and some ("hot-swappable")
systems allow a drive to be replaced without even stopping the OS.
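To make the dispatcher idea from the cluster and load-balancing definitions concrete, here is a minimal sketch in Python. It is only an illustration under our own assumptions: the class name `Dispatcher`, the machine names, and the simple round-robin policy are hypothetical, and real load balancers use more elaborate strategies. The sketch shows the two behaviors described above: requests are spread over several machines, and a machine taken down for maintenance simply stops receiving requests.

```python
from itertools import cycle


class Dispatcher:
    """Round-robin dispatcher that skips machines taken out of service."""

    def __init__(self, machines):
        self.machines = list(machines)
        self.down = set()                  # machines currently under maintenance
        self._ring = cycle(self.machines)  # endless round-robin iterator

    def take_down(self, machine):
        """Stop dispatching requests to this machine."""
        self.down.add(machine)

    def bring_up(self, machine):
        """Resume dispatching requests to this machine."""
        self.down.discard(machine)

    def next_machine(self):
        """Return the next available machine in round-robin order."""
        # Look at each machine at most once per call.
        for _ in range(len(self.machines)):
            candidate = next(self._ring)
            if candidate not in self.down:
                return candidate
        raise RuntimeError("no machines available")


# Hypothetical three-machine cluster with one machine down for an upgrade.
dispatcher = Dispatcher(["web1", "web2", "web3"])
dispatcher.take_down("web2")
picks = [dispatcher.next_machine() for _ in range(4)]
print(picks)
```

While `web2` is down, requests alternate between the other two machines. Once `bring_up("web2")` is called, it rejoins the rotation without any change on the client side.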
8.2.1 Machine Strength Demands According to Expected Site Traffic
If you are building a fan site and you want to amaze your friends
with a mod_perl guestbook, any old 486 machine could do it. But if
you are in a serious business, it is very important to build a
scalable server. If your service is successful and becomes popular,
the traffic could double every few days, and you should be ready to
add more resources to keep up with the demand. While we can define
the web server scalability more precisely, the important thing is to
make sure that you can add more power to your web server(s) without
investing much additional money in software development (you will
need a little software effort to connect your servers, if you add
more of them). This means that you should choose hardware and OSes
that can talk to other machines and become part of a cluster.
On the other hand, if you prepare for a lot of traffic and buy a
monster to do the work for you, what happens if your service
doesn't prove to be as successful as you thought it
would be? Then you've spent too much money, and
meanwhile faster processors and other hardware components have been
released, so you lose.
Wisdom and prophecy, that's all it takes. :)
8.2.2 A Single Strong Machine Versus Many Weaker Machines
Let's start with a claim that a
four-year-old processor
is still very powerful and can be put to good use. Now
let's say that for a given amount of money you can
probably buy either one new, very strong machine or about 10 older
but very cheap machines. We claim that with 10 old machines connected
into a cluster, by deploying load balancing, you will be able to
serve about five times more requests than with a single new machine.
Why is that? Generally the performance improvement on a new machine
is marginal, while the price is much higher. Ten machines will do
faster disk I/O than one single machine, even if the new disk is
quite a bit faster. Yes, you have more administration overhead, but
there is a chance that you will have it anyway, for in a short time
the new machine you have just bought might not be able to handle the
load. Then you will have to purchase more equipment and think about
how to implement load balancing and web server filesystem
distribution anyway.
Why are we so convinced? Look at the busiest services on the
Internet: search engines, webmail servers, and the like—most of
them use a clustering approach. You may not always notice it, because
they hide the real implementation details behind proxy servers, but
they do.
8.2.3 Getting a Fast Internet Connection
You have the best hardware you can get, but the service is
still crawling. What's wrong? Make sure you have a
fast Internet connection—not necessarily as fast as your ISP
claims it to be, but as fast as it should be. The ISP might have a
very good connection to the Internet but put many clients on the same
line. If these are heavy clients, your traffic will have to share the
same line and your throughput will suffer. Think about a dedicated
connection and make sure it is truly dedicated.
Don't trust the ISP, check it!
Another issue is connection latency. Latency is the number of
milliseconds it takes for a packet to travel to its final
destination. This issue is really important if you have to do
interactive work (via ssh or a similar protocol)
on some remote machine, since if the latency is high (400+ ms)
it's really hard to work. It is less of an issue for
web services, since it mainly delays the first packet of a response.
The rest of the packets follow without much extra delay.
The idea of having a connection to "the
Internet" is a little misleading. Many web hosting
and colocation companies have large amounts of bandwidth but still
have poor connectivity. The public exchanges, such as MAE-East and
MAE-West, frequently become overloaded, yet many ISPs depend on these
exchanges.
Private peering is a solution used by the larger backbone operators.
No longer exchanging traffic among themselves at the public
exchanges, each implements private interconnections with each of the
others. Private peering means that providers can exchange traffic
much more quickly.
Also, if your web site is of global interest, check that the ISP has
good global connectivity. If the web site is going to be visited
mostly by people in a certain country or region, your server should
probably be located there.
Bad connectivity can directly influence your
machine's performance. Here is a story one of the
developers told on the mod_perl mailing list:
What relationship has 10% packet loss on one upstream provider got to
do with machine memory ?
Yes.. a lot. For a nightmare week, the box was located downstream of a
provider who was struggling with some serious bandwidth problems of
his own... people were connecting to the site via this link, and
packet loss was such that retransmits and TCP stalls were keeping
httpd heavies around for much longer than normal.. instead of blasting
out the data at high or even modem speeds, they would be stuck at
1k/sec or stalled out... people would press stop and refresh, httpds
would take 300 seconds to timeout on writes to no-one.. it was a
nightmare. Those problems didn't go away till I moved the box to a
place closer to some decent backbones.
Note that with a proxy, this only keeps a lightweight httpd tied up,
assuming the page is small enough to fit in the buffers. If you are a
busy internet site you always have some slow clients. This is a
difficult thing to simulate in benchmark testing, though.
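To see why packet loss on an upstream link turns into a memory problem, it helps to estimate how many server processes are busy at once. The following is a rough sketch in Python, not a measurement. The function name, the traffic figures, and the assumption that no buffering proxy sits in front of the server are all ours, chosen only for illustration. The idea is that a process is held for as long as it takes to generate the response and deliver it to the client, so the number of busy processes is roughly the request rate multiplied by that time.

```python
def busy_processes(requests_per_sec, response_kb, client_kb_per_sec, generate_sec):
    """Estimate how many server processes are busy at any moment.

    Each process is held for the time needed to generate the response plus
    the time needed to push it to the client (no buffering proxy assumed).
    """
    seconds_per_request = generate_sec + response_kb / client_kb_per_sec
    return requests_per_sec * seconds_per_request


# Hypothetical site: 10 requests/sec, 30 KB responses, 0.1 sec to generate each.
healthy = busy_processes(10, 30, client_kb_per_sec=100, generate_sec=0.1)
# Same site when clients are stuck at about 1 KB/sec, as in the story above.
stalled = busy_processes(10, 30, client_kb_per_sec=1, generate_sec=0.1)

print(f"healthy link: ~{healthy:.0f} busy processes")
print(f"lossy link:   ~{stalled:.0f} busy processes")
```

With these made-up figures, clients stalled at 1 KB/sec hold roughly 75 times as many processes as healthy clients, and each of those processes keeps its memory for the whole transfer.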
8.2.4 Tuning I/O Performance
If your service is I/O-bound (i.e., does a lot of read/write
operations to disk) you need a very fast disk, especially when
using a relational database.
Don't spend the money on a fancy video card and
monitor! A cheap card and a 14-inch monochrome monitor are perfectly
adequate for a web server—you will probably access it by
telnet or ssh most of the
time anyway. Look for hard disks with the best price/performance
ratio. Of course, ask around and avoid disks that have a reputation
for headcrashes and other disasters.
Consider RAID or similar systems when you want to improve
I/O throughput (performance) and the reliability
of the stored data, and of course if you have an enormous amount of
data to store.
OK, you have a fast disk—so what's next? You
need a fast disk controller. There may be a controller embedded on
your computer's motherboard. If the controller is
not fast enough, you should buy a faster one. Don't
forget that it may be necessary to disable the original controller.
8.2.5 How Much Memory Is Enough?
How much RAM do you need? Nowadays,
chances are that you will
hear: "Memory is cheap, the more you buy the
better." But how much is enough? The answer is
pretty straightforward: you do not want your machine to
swap! When a
process needs more memory but physical memory is already full,
the kernel takes the least recently used memory pages and swaps them out to
disk. This means you have to bear the time penalty of writing the
data to disk. If another process then references some of the data
that happens to be on one of the pages that has just been swapped
out, the kernel swaps it back in again, probably swapping out some other
data that will be needed very shortly by some other process. Carried
to the extreme, the system starts to thrash hopelessly in
circles, without getting any real work done. The less RAM there is,
the more often this scenario arises. Worse, you can exhaust swap
space as well, and then your troubles really start.
How do you make a decision? You know the highest rate at which your
server expects to serve pages and how long it takes on average to
serve one. Now you can calculate how many server processes you need.
If you know the maximum size to which your servers can grow, you know
how much memory you need. If your OS supports memory sharing, you can
make best use of this feature by preloading the modules and scripts
at server startup, so you will need less memory than you have
calculated.
Do not forget that other essential system processes need memory as
well, so you should not only plan for the web server but also take
into account the other players. Remember that requests can be queued,
so you can afford to let your client wait for a few moments until a
server is available to serve it. Most of the time your server will
not have the maximum load, but you should be ready to bear the peaks.
You should keep at least 20% of memory free for peak situations.
Many sites have crashed a few moments after a big scoop about them
was posted and an unexpected number of requests suddenly arrived. If
you are about to announce something cool, be aware of the possible
consequences.
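The sizing reasoning in this section can be written down as a short calculation. The Python sketch below is one possible reading of it, not a definitive formula. The function and parameter names are hypothetical, the example figures are invented, and we interpret the 20% reserve as a fraction of total RAM. It assumes that memory shared between server processes is counted once and that every process can grow to its maximum size.

```python
import math


def required_ram_mb(peak_requests_per_sec, avg_seconds_per_request,
                    max_process_mb, shared_mb_per_process=0.0,
                    other_processes_mb=0.0, headroom=0.20):
    """Return (server process count, total RAM in MB) for the peak load."""
    # Processes needed = peak request rate x average time to serve one request.
    processes = math.ceil(peak_requests_per_sec * avg_seconds_per_request)

    # Shared pages are counted once; only the unshared part is per process.
    unshared_mb = max_process_mb - shared_mb_per_process
    web_server_mb = shared_mb_per_process + processes * unshared_mb

    # Add the other essential system processes.
    needed_mb = web_server_mb + other_processes_mb

    # Keep a `headroom` fraction of total RAM free for peaks.
    total_mb = needed_mb / (1.0 - headroom)
    return processes, total_mb


# Invented example: 40 requests/sec at peak, 0.25 sec per request,
# 20 MB per process of which 8 MB is shared, 64 MB for everything else.
processes, total_mb = required_ram_mb(
    peak_requests_per_sec=40,
    avg_seconds_per_request=0.25,
    max_process_mb=20,
    shared_mb_per_process=8,
    other_processes_mb=64,
)
print(f"{processes} processes, about {total_mb:.0f} MB of RAM")
```

Setting `shared_mb_per_process` to zero shows how much more memory the same load would need on a system without memory sharing, or without preloading modules at server startup.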
8.2.6 Getting a Fault-Tolerant CPU
Make sure that the CPU is operating within its specifications. Many
boxes are shipped with incorrect settings for CPU clock speed, power
supply voltage, etc. Sometimes a cooling fan is not fitted, or it may
be ineffective because a cable assembly fouls the fan blades. Like
faulty RAM, an overheating processor can cause all kinds of strange
and unpredictable things to happen. Some CPUs are known to have bugs
that can be serious in certain circumstances. Try not to get one of
them.
8.2.7 Detecting and Avoiding Bottlenecks
You might use the most expensive
components
but still get bad performance. Why? Let us introduce an annoying
word: bottleneck.
A machine is an aggregate of many components. Almost any one of them
may become a bottleneck. If you have a fast processor but a small
amount of RAM, the RAM will probably be the bottleneck. The processor
will be underutilized, and it will often be waiting for the kernel to
swap the memory pages in and out, because memory is too small to hold
the busiest pages.
If you have a lot of memory, a fast processor, and a fast disk, but a
slow disk controller, the disk controller will be the bottleneck. The
performance will still be bad, and you will have wasted money.
A slow NIC can cause a bottleneck as well and make the whole service
run slowly. This is a most important component, since web servers are
much more often network-bound than they are disk-bound (i.e., they
have more network traffic than disk utilization).
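One way to reason about bottlenecks is to express what each component can sustain in the same unit, such as requests per second, and look for the smallest figure. The Python sketch below illustrates this under our own assumptions. The function names and all capacity numbers are hypothetical, and the NIC estimate ignores protocol overhead, so it is an upper bound rather than a prediction.

```python
def nic_requests_per_sec(nic_mbps, avg_response_kb):
    """Upper bound on responses per second that a NIC can push out."""
    # 1 Mbps = 1000 kilobits/sec = 125 kilobytes/sec (decimal units).
    kb_per_sec = nic_mbps * 125
    return kb_per_sec / avg_response_kb


def find_bottleneck(capacities):
    """Return (component, capacity) for the component with the lowest capacity."""
    name = min(capacities, key=capacities.get)
    return name, capacities[name]


# Invented capacities, all in requests per second.
capacities = {
    "cpu": 400.0,
    "disk": 250.0,
    "nic": nic_requests_per_sec(nic_mbps=10, avg_response_kb=25),
}

component, limit = find_bottleneck(capacities)
print(f"bottleneck: {component} at about {limit:.0f} requests/sec")
```

In this made-up configuration the fast processor and disk cannot help: the 10 Mbps NIC caps the whole service, so upgrading any other component would be wasted money.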
8.2.8 Solving Hardware Requirement Conflicts
It may happen that the combination of software components you
find yourself using gives rise to conflicting requirements for the
optimization of tuning parameters. If you can separate the components
onto different machines you may find that this approach (a kind of
clustering) solves the problem, at much less cost than buying faster
hardware, because you can tune the machines individually to suit the
tasks they should perform.
For example, if you need to run a relational database engine and a
mod_perl server, it can be wise to put the two on different machines,
since an RDBMS needs a very fast disk while mod_perl processes need
lots of memory. Placing the two on different machines makes it easy
to optimize each machine separately and satisfy each software
component's requirements in the best way.