12.12 When One Machine Is Not Enough for Your RDBMS Database and mod_perl

Imagine a scenario where you start your business as a small service providing a web site. After a while your business becomes very popular, and at some point you realize that it has outgrown the capacity of your machine. Therefore, you decide to upgrade your current machine with lots of memory, a cutting-edge, super-expensive CPU, and an ultra-fast hard disk. As a result, the load goes back to normal, but not for long. Demand for your services keeps on growing, and just a short time after you've upgraded your machine, once again it cannot cope with the load. Should you buy an even more powerful and very expensive machine, or start looking for another solution? Let's explore the possible solutions for this problem.

A typical web service consists of two main software components: the database server and the web server.

A typical user-server interaction goes like this. The user enters query parameters into an HTML form and submits it to the web server. The web server converts these parameters into a database query and sends it to the database server. It then accepts the results of the executed query, formats them into a nice HTML page, and sends the page back to the user's Internet browser or to whatever other application created the request (e.g., a mobile phone with WAP browsing capabilities). This process is depicted in Figure 12-9.

Figure 12-9. Typical user-server interaction

This schema is known as a three-tier architecture in the computing world. In a three-tier architecture, you split the components of your computing solution across different machines:

Tier 1: The client, i.e., the user's machine running a browser or another application that issues requests.
Tier 2: The web server (in our case, Apache with mod_perl), which turns user input into database queries and query results into HTML.
Tier 3: The database server, which executes the queries.
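To make the request flow concrete, here is a minimal Python sketch of the work done by the web server tier (under mod_perl this would of course be Perl code). It is an illustration under stated assumptions, not the book's implementation: the `products` table, the `name` form field, and the `handle_request` function are all hypothetical, and an in-memory SQLite database stands in for what would really be a separate database server.

```python
import html
import sqlite3
from urllib.parse import parse_qs


def handle_request(query_string: str, db: sqlite3.Connection) -> str:
    """Hypothetical handler: form parameters -> SQL query -> HTML page."""
    # 1. Accept the parameters the user submitted through the HTML form.
    params = parse_qs(query_string)
    name = params.get("name", [""])[0]

    # 2. Convert them into a database query and send it to the database.
    #    The "?" placeholder keeps user input out of the SQL text.
    rows = db.execute(
        "SELECT name, price FROM products WHERE instr(name, ?) > 0 ORDER BY name",
        (name,),
    ).fetchall()

    # 3. Format the results into an HTML page for the client's browser,
    #    escaping database content so it cannot inject markup.
    items = "\n".join(
        f"<li>{html.escape(title)}: ${price:.2f}</li>" for title, price in rows
    )
    return f"<html><body><ul>\n{items}\n</ul></body></html>"


if __name__ == "__main__":
    # Hypothetical sample data; a real deployment would connect to a remote RDBMS.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE products (name TEXT, price REAL)")
    db.executemany(
        "INSERT INTO products VALUES (?, ?)",
        [("blue widget", 9.5), ("red gadget", 4.25)],
    )
    print(handle_request("name=widget", db))
```

Moving the database onto its own machine changes only how the connection is opened; the handler's logic stays the same.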
We are interested only in the second and third tiers; we don't specify user machine requirements, since mod_perl is all about server-side programming. The only thing the client needs to be able to do is render the HTML generated in the response, which any simple browser will do.

12.12.1 Server Requirements

Let's first look at what kind of software the web and database servers are, what they need to run fast, and what implications they have for the rest of the system software. The three important machine components are the hard disk, the amount of RAM, and the CPU type.

Typically, the mod_perl server is mostly RAM-hungry, while the SQL database server mostly needs a very fast hard disk. Of course, if your mod_perl process reads a lot from the disk (which is fairly rare), you will need a fast disk too. And if your database server has to sort big tables and perform many large joins, it will need a lot of RAM too.

If we were to specify average, hypothetical requirements for each machine, this is what we'd get. An "ideal" mod_perl machine would have:
An "ideal" database server machine would have:
12.12.2 The Problem

With the database and the web server on the same machine, you have conflicting interests. During peak loads, Apache will spawn more processes and use RAM that the database server might have been using, or that the kernel was using on its behalf in the form of a cache. You will starve your database of resources at the very time when it needs them the most.

Disk I/O contention causes the biggest delays. Adding another disk won't cut I/O times, because the database is the only component that does significant I/O: mod_perl processes have all their code loaded in memory (we are talking about code that does pure Perl and SQL processing). Thus, the database is I/O- and CPU-bound (it's RAM-bound only if there are big joins to make), while mod_perl is mostly CPU- and memory-bound.

There is a problem, but it doesn't mean that you cannot run the database and the web servers on the same machine. Modern PC architecture offers a very high degree of parallelism, and the I/O hardware helps here: the machine can do many things while a SCSI subsystem is processing a command or the network hardware is writing a buffer over the wire.

If a process is not runnable (that is, it is blocked waiting for I/O or something else), it is not using significant CPU time. The only CPU time required to maintain a blocked process is the time it takes the operating system's scheduler to look at the process, decide that it is still not runnable, and move on to the next process in the list. This is hardly any time at all. If there are two processes, one blocked on I/O and the other CPU-bound, the blocked process gets 0% CPU time, the runnable process gets roughly 99.9%, and the kernel scheduler uses the rest.

12.12.3 The Solution

The solution is to add another machine, so that the database and the web server each run on their own dedicated machine.
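The claim that a blocked process costs almost no CPU time is easy to check empirically. The following Python sketch is only an approximation of the scenario in the text: `time.sleep()` stands in for a process blocked on disk or network I/O, and a busy loop stands in for a CPU-bound process. The helper names are hypothetical; the sketch compares the CPU time the process consumes (`time.process_time()`) with the elapsed wall-clock time in each case.

```python
import time


def cpu_and_wall(fn):
    """Run fn() and return (cpu_seconds, wall_seconds) consumed by this process."""
    cpu0, wall0 = time.process_time(), time.perf_counter()
    fn()
    return time.process_time() - cpu0, time.perf_counter() - wall0


def blocked(seconds: float = 0.5) -> None:
    # Stand-in for waiting on I/O: the kernel does not schedule the
    # process again until the wait is over.
    time.sleep(seconds)


def cpu_bound(seconds: float = 0.5) -> int:
    # Spin until the wall-clock deadline; the process stays runnable throughout.
    deadline = time.perf_counter() + seconds
    n = 0
    while time.perf_counter() < deadline:
        n += 1
    return n


if __name__ == "__main__":
    for label, fn in (("blocked", blocked), ("cpu-bound", cpu_bound)):
        cpu, wall = cpu_and_wall(fn)
        print(f"{label:9s} wall={wall:.2f}s cpu={cpu:.3f}s ({100 * cpu / wall:.1f}%)")
```

On an otherwise idle machine, expect the blocked case to report close to 0% CPU and the busy loop close to 100%; the exact figures depend on the operating system and the current load.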
This solution has the following advantages:
It also has the following disadvantages:
12.12.4 Three Machine Model

Since we are talking about using a dedicated machine for each server, you might consider adding a third machine to do the proxy work. This makes your setup even more flexible, as it lets you forward (ProxyPass) requests not just to one mod_perl box, but to many of them, so you can do load balancing if and when you need it.

Generally, the proxy machine can be very light if it serves little traffic itself and mainly forwards requests to the mod_perl processes. Of course, you can also use this machine to serve the static content; its hardware requirements will then depend on the number of objects you have to serve and the rate at which they are requested. Figure 12-10 illustrates the three machine model.

Figure 12-10. A proxy machine, machine(s) with mod_perl-enabled Apache, and the database server machine
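The routing decision such a proxy makes can be sketched in a few lines of Python. This is not how Apache's mod_proxy is implemented; it is a hypothetical illustration in which requests under some assumed static prefixes are served locally by the proxy machine, and everything else is handed to the mod_perl back ends in simple round-robin order. The `ProxyRouter` class, the prefixes, and the back-end URLs are placeholders.

```python
import itertools
from typing import Iterable, Tuple


class ProxyRouter:
    """Hypothetical front-end router for the three machine model."""

    def __init__(
        self,
        backends: Iterable[str],
        static_prefixes: Tuple[str, ...] = ("/images/", "/css/", "/js/"),
    ) -> None:
        self.backends = list(backends)
        if not self.backends:
            raise ValueError("at least one mod_perl backend is required")
        self.static_prefixes = static_prefixes
        # Round-robin: each dynamic request goes to the next backend in turn.
        self._cycle = itertools.cycle(self.backends)

    def route(self, path: str) -> str:
        """Return "local" for static files, otherwise the backend to forward to."""
        if path.startswith(self.static_prefixes):
            return "local"  # served directly by the light proxy machine
        return next(self._cycle)


if __name__ == "__main__":
    router = ProxyRouter(["http://app1:8000", "http://app2:8000"])  # placeholder hosts
    for path in ("/images/logo.gif", "/perl/search", "/perl/cart", "/perl/search"):
        print(path, "->", router.route(path))
```

Round-robin ignores how busy each back end actually is; the point of the sketch is only that adding mod_perl machines behind the proxy spreads the dynamic load without any change to the database machine.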