5.1 The Performance Effects of Strings
Let's first look at the advantages of the String implementation:
Compilation creates unique strings. At compile time, strings are resolved as far
as possible. This includes applying the concatenation operator and
converting other literals to strings. So "hi7" and
("hi"+7) both get resolved at compile time to the
same string, and are identical objects in the class string pool (see
Section 3.9.1.2). Compilers differ in their ability to
achieve this resolution. You can always check your compiler (e.g., by
decompiling some statements involving concatenation) and change it if
needed.
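As a quick check of the compile-time resolution described above, the following minimal sketch compares object identity rather than contents. The class and method names are illustrative only. It relies on the language rule that a concatenation of constants is itself a constant, while a concatenation involving a non-final variable is performed at run time and creates a new object.

```java
public class CompileTimeStrings {
    // Both operands are compile-time constants, so the compiler resolves
    // "hi" + 7 to the literal "hi7" and both sides refer to one object.
    static boolean constantsShareOneObject() {
        return "hi7" == ("hi" + 7);
    }

    // A non-final local variable is not a constant expression, so this
    // concatenation happens at run time and produces a distinct object
    // with equal contents.
    static boolean runtimeConcatIsDistinct() {
        String prefix = "hi";
        String built = prefix + 7;
        return built != "hi7" && built.equals("hi7");
    }

    public static void main(String[] args) {
        System.out.println("constants share one object: " + constantsShareOneObject());
        System.out.println("run-time concat is distinct: " + runtimeConcatIsDistinct());
    }
}
```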
Because String objects are immutable, a substring
operation doesn't need to copy the entire underlying
sequence of characters. Instead, a
substring can use the same
char array as the original string and simply refer
to a different start point and endpoint in the
char array. This means that substring operations
are efficient, being both fast and conserving of memory; the extra
object is just a wrapper on the same underlying
char array with different pointers into that
array.
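The sharing scheme can be illustrated with a small hypothetical class, CharSlice, which is not part of the JDK. It is only a sketch of the idea: an immutable wrapper holding a reference to a shared char array plus a start offset and a length. Note that this describes the String implementation of the JDK versions discussed here; later JDK releases changed substring to copy the characters, so the actual behavior depends on the JDK in use.

```java
public final class CharSlice {
    private final char[] value;  // shared between slices, never modified
    private final int offset;    // index of the first character in value
    private final int count;     // number of characters in this slice

    public CharSlice(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private CharSlice(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Creates a new wrapper on the same array; no characters are copied.
    public CharSlice subSlice(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new IndexOutOfBoundsException();
        }
        return new CharSlice(value, offset + begin, end - begin);
    }

    public char charAt(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException();
        }
        return value[offset + index];
    }

    public int length() {
        return count;
    }

    // Exposed only so the sharing can be observed in this sketch.
    boolean sharesArrayWith(CharSlice other) {
        return value == other.value;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}
```

Because the array is never modified after construction, any number of slices can safely refer to it.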
Strings have strong support for
internationalization. It would take a large effort to
reproduce the internationalization support in an alternative class.
The close relationship with
StringBuffers
allows Strings to reference the same
char array used by the
StringBuffer. This is a double-edged sword. For
typical practice, when you use a StringBuffer to
manipulate and append characters and data types, and then convert the
final result to a String, this works just fine.
The StringBuffer provides efficient mechanisms for
growing, inserting, appending, altering, and other types of
String manipulation. The resulting
String then efficiently references the same
char array with no extra character copying. This
is very fast and reduces the number of objects being used to a
minimum by avoiding intermediate objects. However, if the
StringBuffer object is subsequently altered, the
char array in that StringBuffer
is copied into a new char array that is now
referenced by the StringBuffer. The
String object retains the reference to the
previously shared char array. This means that
copying overhead can occur at unexpected points in the application.
Instead of the copying occurring at the toString()
method call, as might be expected, any subsequent
alteration of the StringBuffer causes a new
char array to be created and an array copy to be
performed. To make the copying overhead occur at predictable times,
you could explicitly execute some method that makes the copying
occur, such as StringBuffer.setLength(). This allows
StringBuffers to be reused with more predictable
performance.
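A minimal sketch of reusing a single StringBuffer follows. Calling setLength(0) before each reuse resets the buffer at a known point in the loop, which, on implementations that share the array as described above, is also where the copy would be triggered. The class and method names are illustrative, and whether the array is actually shared or copied depends on the JDK in use.

```java
public class BufferReuse {
    // Builds n labels with one reused StringBuffer instead of n buffers.
    static String[] buildLabels(int n) {
        StringBuffer buf = new StringBuffer(32);
        String[] labels = new String[n];
        for (int i = 0; i < n; i++) {
            // Explicitly alter the buffer before reuse, so that any
            // copying happens here rather than at the first append.
            buf.setLength(0);
            buf.append("item-").append(i);
            labels[i] = buf.toString();
        }
        return labels;
    }

    public static void main(String[] args) {
        String[] labels = buildLabels(3);
        for (int i = 0; i < labels.length; i++) {
            System.out.println(labels[i]);
        }
    }
}
```

Strings produced on earlier iterations are unaffected by later changes to the buffer, since a String never changes once created.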
The disadvantages of the String implementation are:
Not being able to subclass String means that it is
not possible to add behavior to String for your
own needs.
The previous point means that all access must be through the
restricted set of currently available String
methods, imposing extra overhead.
The only way to increase the number of methods allowing efficient
manipulation of String characters is to copy the
characters into your own array and manipulate them directly, in which
case String is imposing an extra step and extra
objects you may not need.
char arrays are faster to process directly.
The tight coupling with StringBuffer can lead to
unexpectedly high memory usage. When StringBuffer.toString()
creates a String,
the current underlying array holds the string, regardless of the size
of the array (i.e., the capacity of the
StringBuffer). For example, a
StringBuffer with a capacity of 10,000 characters
can build a string of 10 characters. However, that 10-character
String continues to use a
10,000-char array to store the 10 characters. If
the StringBuffer is now reused to create another
10-character string, the StringBuffer first
creates a new internal 10,000-char array to build
the string with; then the new String also uses
that 10,000-char array to store the 10 characters.
Obviously, this process can continue indefinitely, using vast amounts
of memory where not expected.
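The gap between capacity and content in the example above can be observed directly with length() and capacity(). The sketch below also shows one possible way to obtain a String backed by an array of exactly the needed size, by copying the characters out first. That workaround is an assumption of this sketch rather than something the text prescribes, and the class and method names are illustrative.

```java
public class BufferCapacity {
    // Returns {length, capacity} for a large buffer holding a short string.
    static int[] lengthAndCapacity() {
        StringBuffer buf = new StringBuffer(10000);
        buf.append("0123456789");
        return new int[] { buf.length(), buf.capacity() };
    }

    // Copies only the characters in use into an exactly sized array,
    // then builds the String from that array.
    static String compactString(StringBuffer buf) {
        char[] exact = new char[buf.length()];
        buf.getChars(0, buf.length(), exact, 0);
        return new String(exact);
    }

    public static void main(String[] args) {
        int[] sizes = lengthAndCapacity();
        System.out.println("length=" + sizes[0] + ", capacity=" + sizes[1]);
        System.out.println(compactString(new StringBuffer(10000).append("0123456789")));
    }
}
```

The explicit copy trades a small, predictable cost for a String that holds no more memory than its characters require.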
The advantages of Strings can be summed up as ease
of use, internationalization support, and compatibility with existing
interfaces. Most methods expect a String object
rather than a char array, and
String objects are returned by many methods. The
disadvantages of Strings boil down to
inflexibility. With extra work, most things you can do with
String objects can be done faster and with less
intermediate object-creation overhead by using your own set of
char array manipulation methods.
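As a small illustration of the kind of char array manipulation method meant here, the following sketch replaces characters in place. The class and method names are hypothetical. Unlike a String method, it creates no intermediate objects, at the cost of requiring the caller to work with a char array.

```java
public class CharArrayOps {
    // Replaces every occurrence of 'from' with 'to' directly in the array
    // and returns the number of replacements made. No objects are created.
    static int replaceInPlace(char[] chars, char from, char to) {
        int replaced = 0;
        for (int i = 0; i < chars.length; i++) {
            if (chars[i] == from) {
                chars[i] = to;
                replaced++;
            }
        }
        return replaced;
    }

    public static void main(String[] args) {
        char[] path = "a.b.c".toCharArray();
        int n = replaceInPlace(path, '.', '/');
        System.out.println(n + " replaced: " + new String(path));
    }
}
```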
For most performance tuning, you pinpoint a bottleneck and make localized
changes to objects and methods that speed up that bottleneck. But
String tuning often involves converting to
char arrays, whereas you rarely come across
public methods or interfaces that deal in
char arrays. This makes it difficult to switch
between Strings and char arrays
in any localized way. The consequence is that you either have to
switch back and forth between Strings and
char arrays, or you have to make extensive
modifications that can reach across many application boundaries. I
have no easy solution for this problem. String
tuning can get messy. Sun recognizes that Strings
are not the optimal solution in many cases and has added a
CharSequence interface in JDK 1.4 that
String and other classes implement. New methods
have been added that operate on CharSequence
objects rather than requiring Strings. For
example, the regular expression classes accept
CharSequence objects. This
doesn't necessarily help your particular bottleneck,
and CharSequences still access the
char elements through a charAt()
method, but it does at least increase the options
available for optimizing applications.
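A brief sketch of what CharSequence allows: a method declared against the interface accepts a String or a StringBuffer without conversion, and the regular expression classes accept the same types. The class and method names below are illustrative; Pattern.matches(String, CharSequence) is the standard java.util.regex call.

```java
import java.util.regex.Pattern;

public class CharSequenceDemo {
    // Works on any CharSequence; elements are still read through charAt().
    static int countDigits(CharSequence seq) {
        int digits = 0;
        for (int i = 0; i < seq.length(); i++) {
            if (Character.isDigit(seq.charAt(i))) {
                digits++;
            }
        }
        return digits;
    }

    // The regular expression classes take a CharSequence as input,
    // so a StringBuffer can be matched without calling toString().
    static boolean isAllDigits(CharSequence seq) {
        return Pattern.matches("\\d+", seq);
    }

    public static void main(String[] args) {
        System.out.println(countDigits("a1b22"));
        System.out.println(countDigits(new StringBuffer("x9")));
        System.out.println(isAllDigits(new StringBuffer("2024")));
    }
}
```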
It is difficult to handle String internationalization
capabilities using raw char arrays. But in many
cases, internationalized Strings form a specific
subset of String usage in an application, mainly
in the user interface, and that subset of Strings
rarely causes bottlenecks. You should differentiate between
Strings that need internationalization and those
that are simply processing characters, independent of language. These
latter Strings can be replaced with
char arrays for tuning.
Internationalization-dependent Strings are more
difficult to tune, and I provide some examples of tuning these later
in the chapter. Note also that internationalized
Strings can be treated as char
arrays for some types of processing without any problems; see Section 5.4.2 later
in this chapter.