13.5 Tuning After Deployment
Tuning does not necessarily
end at the development stage. For many applications such as agent
applications, services, servlets and servers, multiuser applications,
enterprise systems, etc., there needs to be constant monitoring of
the application performance after deployment to ensure that no
degradation takes place. In this section, I discuss tuning the
deployed application. This is mainly relevant to enterprise systems
that are being administered. Shrink-wrapped or similar software is
normally tuned after deployment the same way as before it, using
standard profiling tools.
Monitoring
the application is the primary tuning activity after deployment. The
application should be built with hooks that enable tools to connect
to it and gather statistics and response
times. The application should be constantly
monitored, and all performance logs retained. Monitoring should
record as many parameters as possible throughout the system, but not
so many that the monitoring itself significantly degrades the running
application. Almost any act of measuring a system affects its
performance, but performance logs normally pay off enormously, and a
decrease in performance of a few percent should be acceptable.
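As a minimal sketch of such a hook, the following class accumulates
call counts and response times cheaply enough to leave enabled in
production. The class name ResponseTimeHook and its methods are
hypothetical, chosen only for illustration; a real system would also
need some way to expose these numbers to external tools.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/** Hypothetical monitoring hook: accumulates call counts and response times. */
final class ResponseTimeHook {
    private final AtomicLong calls = new AtomicLong();
    private final AtomicLong totalNanos = new AtomicLong();
    private final AtomicLong maxNanos = new AtomicLong();

    /** Runs the task, records how long it took, and returns its result. */
    <T> T timed(Supplier<T> task) {
        long start = System.nanoTime();
        try {
            return task.get();
        } finally {
            // Recorded even if the task throws, so failures are still timed.
            record(System.nanoTime() - start);
        }
    }

    /** Records one externally measured response time. */
    void record(long elapsedNanos) {
        calls.incrementAndGet();
        totalNanos.addAndGet(elapsedNanos);
        maxNanos.accumulateAndGet(elapsedNanos, Math::max);
    }

    long calls() { return calls.get(); }

    long maxNanos() { return maxNanos.get(); }

    /** Mean response time in nanoseconds, or 0 if nothing was recorded. */
    double meanNanos() {
        long n = calls.get();
        return n == 0 ? 0.0 : (double) totalNanos.get() / n;
    }
}
```

Wrapping a request handler is then a one-line change, for example
hook.timed(() -> handle(request)), where handle stands for whatever
the application already does. The counters use atomic updates rather
than locks to keep the measurement overhead small.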
Individual records in the performance logs should include at least
the following six categories:
Time (including offset time from a reference server)
User identifier
Transaction identifier
Application name, type, class, or group
Software component or subsystem
Hardware resource
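One possible shape for such a record is sketched below. The class
name PerfLogRecord, the field names, and the tab-separated output
format are all assumptions made for illustration; only the six
categories themselves come from the list above.

```java
/** Hypothetical performance log record covering the six categories. */
final class PerfLogRecord {
    final long timestampMillis;        // local clock time of the event
    final long referenceOffsetMillis;  // local clock minus reference server clock
    final String userId;
    final String transactionId;
    final String application;          // application name, type, class, or group
    final String component;            // software component or subsystem
    final String hardwareResource;

    PerfLogRecord(long timestampMillis, long referenceOffsetMillis,
                  String userId, String transactionId, String application,
                  String component, String hardwareResource) {
        this.timestampMillis = timestampMillis;
        this.referenceOffsetMillis = referenceOffsetMillis;
        this.userId = userId;
        this.transactionId = transactionId;
        this.application = application;
        this.component = component;
        this.hardwareResource = hardwareResource;
    }

    /** Event time expressed on the reference server's clock. */
    long referenceTimeMillis() {
        return timestampMillis - referenceOffsetMillis;
    }

    /** One tab-separated line, suitable for appending to a log file. */
    String toLogLine() {
        return String.join("\t",
                Long.toString(timestampMillis),
                Long.toString(referenceOffsetMillis),
                userId, transactionId, application, component,
                hardwareResource);
    }
}
```

Storing the offset alongside the local time lets logs from different
machines be merged onto one timeline later, even if their clocks
disagree.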
A standard set of performance logs should be used to give a
background system measurement and kept as a reference. Other logs can
be compared against that standard. Periodically, the standard should
be regenerated, as most enterprise applications change their
performance characteristics over time. Ideally, the standard logs can
be automatically compared against the current logs, and any
significant change in behavior is automatically identified and causes
an alert to be sent to the administrators. Trends away from the
standard should also trigger a notification; sometimes performance
degrades slowly but consistently because of a gradually depleting
resource.
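The two checks described here, a deviation from the standard and a
slow trend away from it, can be sketched roughly as follows. The
class BaselineCheck is hypothetical, and the use of a simple mean
comparison and a least-squares slope is my assumption about one
reasonable approach, not the only one; the tolerance would need to be
chosen for the system in question.

```java
/** Hypothetical comparison of current response times against a baseline. */
final class BaselineCheck {

    static double mean(double[] xs) {
        if (xs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double sum = 0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    /**
     * True if the current mean differs from the baseline mean by more
     * than the given fraction of the baseline (0.1 means 10 percent).
     */
    static boolean deviates(double[] baseline, double[] current, double tolerance) {
        double b = mean(baseline);
        return Math.abs(mean(current) - b) > tolerance * b;
    }

    /**
     * Least-squares slope of the samples against their index.
     * A persistently positive slope in response times suggests a
     * gradual degradation even while each sample is within tolerance.
     */
    static double slope(double[] ys) {
        int n = ys.length;
        double xMean = (n - 1) / 2.0;
        double yMean = mean(ys);
        double num = 0;
        double den = 0;
        for (int i = 0; i < n; i++) {
            num += (i - xMean) * (ys[i] - yMean);
            den += (i - xMean) * (i - xMean);
        }
        return den == 0 ? 0 : num / den;
    }
}
```

An alerting job might call deviates on each new batch of logs and
slope on, say, the daily means of the last few weeks, notifying the
administrators when either exceeds its limit.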
Administrators should note every single change to the
system: every patch, every upgrade, every
configuration change, etc. These changes are the source of most
performance problems in production. Patches are cheaper short-term
fixes than upgrades, but they usually add to the complexity of the
application and increase maintenance costs. Upgrades and rereleases
are more expensive in the short term, but cheaper overall.
Administrators should listen to users. Users are the most sensitive
barometer of application performance. However, you should
double-check users' assertions. A user may be wrong,
or might have hit a known system problem or temporary administrative
shutdown. Measure the performance yourself. Repeat the measurements
several times and record both the average and the variation. Ensure
that caching effects do not skew measurements of a reported problem.
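A minimal sketch of such a repeated measurement follows. The class
RepeatedMeasurement is hypothetical. It discards a number of warm-up
runs so that the timed runs all see warm caches; that is an
assumption about what you want to measure, and if the user's
complaint concerns a cold start you would instead need to clear the
caches before each run.

```java
/** Hypothetical helper for repeating a measurement and summarizing it. */
final class RepeatedMeasurement {

    /** Returns {mean, sample standard deviation} of the timed runs, in ms. */
    static double[] measure(Runnable operation, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            operation.run();  // untimed: lets caches fill before measuring
        }
        double[] millis = new double[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            operation.run();
            millis[i] = (System.nanoTime() - start) / 1e6;
        }
        return summarize(millis);
    }

    /** Returns {mean, sample standard deviation} of the samples. */
    static double[] summarize(double[] samples) {
        int n = samples.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        double sum = 0;
        for (double s : samples) {
            sum += s;
        }
        double mean = sum / n;
        double squares = 0;
        for (double s : samples) {
            squares += (s - mean) * (s - mean);
        }
        return new double[] { mean, Math.sqrt(squares / (n - 1)) };
    }
}
```

A large standard deviation relative to the mean is itself useful
information: it suggests that something intermittent is affecting the
operation, and that a single measurement would have been misleading.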
When looking for reasons why performance may have changed, consider
any recent changes such as an increase in the number of users, other
applications added to the system, code changes on the client or
server, hardware changes, etc. In addition to user response time
measurements, look at where the distributed code is executing, what
volumes of data are being used, and where the code is spending most
of its time.
Many factors can easily produce misleading or temporarily different
measurements of the application. Distributed garbage collection may
have kicked in, system clocks may have become unsynchronized,
background processes may have been triggered, and relative processor
power may have changed, causing obscure effects. Consider whether
anyone else is using the processors and, if so, what they are doing
and why.
You need to differentiate between:
Occasional sudden slowness, e.g., from background processes starting up
General slowness, perhaps reflecting that the application was not tuned for the current load, or that the systems or networks are saturated
A sudden slowdown that continues, often the result of a change to the system
Each of these characteristic changes in performance indicates a
different set of problems.
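As a rough illustration of telling these patterns apart
automatically, the sketch below classifies a time-ordered series of
response times against a baseline. The class SlowdownClassifier, its
pattern names, and the rule that a sample counts as slow when it
exceeds factor times the baseline are all assumptions of mine; this
is a deliberately crude heuristic, not a substitute for looking at
the logs.

```java
/** Hypothetical heuristic for labeling the shape of a slowdown. */
final class SlowdownClassifier {

    enum Pattern { NORMAL, OCCASIONAL_SPIKES, GENERAL_SLOWNESS, SUSTAINED_SLOWDOWN }

    /**
     * Classifies time-ordered response times. A sample is "slow" when
     * it exceeds factor * baseline.
     */
    static Pattern classify(double[] samples, double baseline, double factor) {
        int slow = 0;
        int firstSlow = -1;
        for (int i = 0; i < samples.length; i++) {
            if (samples[i] > factor * baseline) {
                slow++;
                if (firstSlow < 0) {
                    firstSlow = i;
                }
            }
        }
        if (slow == 0) {
            return Pattern.NORMAL;
        }
        if (slow == samples.length) {
            // Slow throughout the window: load or saturation is likely.
            return Pattern.GENERAL_SLOWNESS;
        }
        if (slow == samples.length - firstSlow) {
            // Fast up to some point, slow ever since: look for a change.
            return Pattern.SUSTAINED_SLOWDOWN;
        }
        // Slow samples interleaved with normal ones.
        return Pattern.OCCASIONAL_SPIKES;
    }
}
```

Note that a single slow sample at the very end of the window is
reported as a sustained slowdown, since the heuristic cannot yet tell
whether it will continue; in practice you would wait for more samples
before acting on it.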