13.3 Analysis

The analysis phase of development encompasses a variety of activities that determine what functionality you are going to build into your application. These activities include:
The analysis phase does not usually specify either the structure of the application or the technology (e.g., you might specify that the application uses a database, but probably not which database or even which type of database). The analysis phase specifies what the application will do (and might do), not how it is done, except in the most general terms. Here are major performance-tuning considerations during the analysis phase:
Performance goals should be an explicit part of the analysis and should form part of the specification. The analysis phase should include time to analyze the performance impact of the requirements. You can determine the general characteristics of the application by asking the following questions:
You can use the answers to these questions to build an abstract model of the application. Applying this abstract model to a generalized computer architecture lets you identify potential performance problems. For example, if the application is a multiplayer game played across a network, a simple model of the network, together with the objects (their numbers and sizes) that need to be distributed, the number of users and their expected distribution, and the likely patterns of play, provides the information you need to determine whether the specified application can run over the network. If, after including safety factors, the network can easily cope with the traffic, that section of the application is validated. If the game is unplayable once you assume a minimum bandwidth of 56 Kbps (a typical modem connection) and a latency (network communication response time) of 400 milliseconds, you need to reexamine the specification.

This type of analysis is part of software performance engineering. The general technique for performance tuning before any code can be tested (i.e., at the analysis and design phases) is to predict the application's performance based on the best available data.[5] The technique is covered in detail in High Performance Client/Server by Chris Loosley and Frank Douglas (John Wiley & Sons).
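As a rough illustration of this kind of prediction, the following sketch runs the game example's numbers against an assumed workload. The player count, update rate, message size, safety factor, and playability budget are hypothetical values chosen only for the illustration; the 56 Kbps and 400 millisecond figures come from the example above.

// Back-of-the-envelope feasibility check for the multiplayer-game example.
// The workload figures are illustrative assumptions, not values from the text.
public class NetworkFeasibility {
    public static void main(String[] args) {
        // Assumed workload (hypothetical values)
        int players          = 8;    // concurrent players in one game session
        int updatesPerSecond = 5;    // state updates each player generates per second
        int bytesPerUpdate   = 120;  // serialized update size, including protocol overhead

        // Constraints taken from the specification
        double linkBitsPerSecond = 56_000;  // 56K modem connection
        double latencySeconds    = 0.400;   // 400 ms network response time

        // Traffic arriving at a single client: updates from every other player
        double requiredBitsPerSecond =
                (players - 1) * updatesPerSecond * bytesPerUpdate * 8.0;

        // Time for one update to reach a player: latency plus transmission time
        double transmissionSeconds = (bytesPerUpdate * 8.0) / linkBitsPerSecond;
        double responseSeconds     = latencySeconds + transmissionSeconds;

        System.out.printf("Downstream traffic: %.0f bps (link: %.0f bps)%n",
                requiredBitsPerSecond, linkBitsPerSecond);
        System.out.printf("Response time per update: %.3f s%n", responseSeconds);

        // Apply a 30% safety factor and an assumed 250 ms playability budget
        if (requiredBitsPerSecond > linkBitsPerSecond * 0.7) {
            System.out.println("Traffic exceeds the link's safety margin -- respecify");
        }
        if (responseSeconds > 0.250) {
            System.out.println("Response time exceeds the playability budget -- respecify");
        }
    }
}

With these assumed numbers the bandwidth check passes (33,600 bps against a 39,200 bps margin), but the 400 millisecond latency alone blows the playability budget, which is exactly the kind of result that sends you back to the specification.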
One of the most significant aspects to examine at the analysis phase is the expected performance gain or drawback of distributed computing. Distributing sections of an application always carries some performance cost: network communication is always slower than interprocess communication on the same machine, and interprocess communication is always slower than component-to-component communication within the same process. Good design usually emphasizes decoupling components, but good performance often requires close coupling. These are not always conflicting requirements, but you do need to bear the potential conflict in mind.

For distributed applications, distributed components should be coupled in a way that minimizes the communication between them. The goal is to limit the number of messages sent back and forth between components, because too many network message transfers can have a detrimental effect on performance. Components engaged in extended conversations over a network spend most of their time sitting idle, waiting for responses; in this situation, network latency tends to dominate performance.

A simple example, showing the huge difference that distribution performance can make to even a standalone applet, indicates how important this aspect is. You might have thought that a standalone applet does not need much analysis of its distributed components. Table 13-1 shows two development paths that might be followed and illustrates how ignoring performance at the analysis stage can lead to performance problems later.
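To make the earlier coupling point concrete, the following sketch contrasts a chatty remote interface with a coarse-grained one. The account service and its methods are purely hypothetical and not taken from the text; the point is simply that the second interface delivers the same information in one network round trip instead of three.

import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.List;

// Chatty interface: a client reading the owner, balance, and recent
// transactions pays three network round trips.
interface ChattyAccount extends Remote {
    String getOwner() throws RemoteException;
    double getBalance() throws RemoteException;
    List<String> getRecentTransactions() throws RemoteException;
}

// Coarse-grained interface: the same information is shipped in a single
// message, so only one round trip (and one latency cost) is paid.
interface CoarseGrainedAccount extends Remote {
    AccountSnapshot getSnapshot() throws RemoteException;
}

// Value object serialized across the network in one message.
class AccountSnapshot implements java.io.Serializable {
    String owner;
    double balance;
    List<String> recentTransactions;
}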
Table 13-1 shows how important performance prediction can be. The analysis on the right saves a huge amount in development costs. Of course, if it is not identified at the analysis phase, this aspect of performance may be picked up in some later phase of development, but the further from the analysis phase it is identified, the more expensive it is to correct.

Another consideration at the analysis stage is the number of features being specified. Sometimes "nice to have" features are thrown into the requirements at the analysis phase. Features seem to have an inverse relationship to performance: the more features there are, the worse the performance, or the more effort is required to improve it. For good performance, it is always better to minimize the features in the requirements or, at the very least, to specify that the design should be extensible to incorporate certain nice-to-have features, rather than simply including those features in the requirements.

One other important aspect to focus on during the analysis phase is the application's use of shared resources. Try to identify all shared resources and the performance costs associated with forcing exclusive access to them. When that cost is shown to be excessive, you need to specify alternative mechanisms that allow the resource to be used efficiently. For example, if several parts of the application may be updating a file simultaneously, the updates may need to be synchronized to avoid corruption. If this potentially locks parts of the application for too long, an alternative such as journaling might be specified: the different parts of the application update separate, dedicated log files, and these logs are reconciled by another, asynchronous part of the application.
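A minimal sketch of that journaling approach, under assumed file names and a deliberately simplified reconciliation step, might look like this (a real design would also need to handle failures partway through reconciliation):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class JournalingExample {

    // Each part of the application appends to its own dedicated journal,
    // so the hot path never waits on a lock for the shared file.
    static void appendUpdate(Path journal, String record) throws IOException {
        Files.write(journal, List.of(record),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // A separate, asynchronous task reconciles the journals into the shared
    // file; only this task ever needs exclusive access to it.
    static void reconcile(Path sharedFile, Path... journals) throws IOException {
        for (Path journal : journals) {
            if (Files.exists(journal)) {
                Files.write(sharedFile, Files.readAllBytes(journal),
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                Files.delete(journal);  // journal fully applied
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path partA = Path.of("partA.journal");
        Path partB = Path.of("partB.journal");
        appendUpdate(partA, "update from part A");
        appendUpdate(partB, "update from part B");
        reconcile(Path.of("shared.dat"), partA, partB);
    }
}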