3.9 Sun's Compiler and Runtime Optimizations

As you can see from the previous sections, knowing how the compiler alters your code as it generates bytecodes is important for performance tuning. Some compiler optimizations can be canceled out if you write your code so that the compiler cannot apply its optimizations. In this section, I cover what you need to know to get the most out of the compilation stage if you are using the JDK compiler (javac).

3.9.1 Optimizations You Get for Free

Several optimizations occur at the compilation stage without your needing to specify any compilation options. These optimizations are not necessarily required by specifications laid down in Java. Instead, they have become standard compiler optimizations. The JDK compiler always applies them, and consequently almost every other compiler applies them as well. You should always determine exactly what your specific compiler optimizes as standard, either from the documentation provided or by decompiling example code.

3.9.1.1 Literal constants are folded

This optimization is a concrete implementation of the ideas discussed in Section 3.8.2.5 earlier. In this implementation, multiple literal constants[9] in an expression are "folded" by the compiler. For example, in the following statement:
    int foo = 9*10;

the 9*10 is evaluated to 90 before compilation is completed. The result is as if the line read:

    int foo = 90;

This optimization allows you to make your code more readable without having to worry about runtime overhead.

3.9.1.2 String concatenation is sometimes folded

With the Java 2 compiler, string concatenations to literal constants are folded. The line:

    String foo = "hi Joe " + (9*10);

is compiled as if it read:

    String foo = "hi Joe 90";

This optimization is not applied by JDK compilers prior to JDK 1.2. Some non-Sun compilers apply this optimization and some don't. The optimization applies wherever the statement can be resolved into literal constants concatenated with a literal string using the + concatenation operator. It also applies to the concatenation of two strings; in this case, all compilers fold two (or more) strings, since that action is required by the Java specification.

3.9.1.3 Constant fields are inlined

Primitive constant fields (those primitive data type fields defined with the final modifier) are inlined within a class and across classes, regardless of whether the classes are compiled in the same pass. For example, if class A has a public static final field and class B has a reference to this field, the value from class A is inserted directly into class B, rather than a reference to the field in class A. Strictly speaking, this is not an optimization, as the Java specification requires constant fields to be inlined. Nevertheless, you can take advantage of it. For instance, if class A is defined as:

    public class A {
      public static final int VALUE = 33;
    }

and class B is defined as:

    public class B {
      static int VALUE2 = A.VALUE;
    }

then when class B is compiled, whether or not in a compilation pass of its own, it actually ends up as if it were defined as:

    public class B {
      static int VALUE2 = 33;
    }

with no reference left to class A.
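The foldings described above can be combined into one small runnable sketch (the class and variable names here are illustrative, not from this chapter):

```java
public class FoldingDemo {
    // Constant field: the compiler inlines 33 at every use site, here and across classes.
    public static final int VALUE = 33;

    public static void main(String[] args) {
        int foo = 9 * 10;                   // folded to 90 at compile time
        String msg = "hi Joe " + (9 * 10);  // folded to "hi Joe 90" by the Java 2 compiler
        System.out.println(foo);
        System.out.println(msg);
        System.out.println(VALUE);
    }
}
```

Decompiling the resulting class file (for example, with javap -c FoldingDemo) should show the folded values as constant-pool entries, with no multiplication or concatenation bytecodes left.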
3.9.1.4 Dead code branches are eliminated

Another type of optimization automatically applied at the compilation stage is cutting out code that can never be reached because of a test in an if statement that can be completely resolved at compile time. The discussion in the earlier Section 3.8.2.3 is relevant here. As an example, suppose classes A and B are defined (in separate files) as:

    public class A {
      public static final boolean DEBUG = false;
    }

    public class B {
      static int foo( ) {
        if (A.DEBUG)
          System.out.println("In B.foo( )");
        return 55;
      }
    }

Then when class B is compiled, whether or not on a compilation pass of its own, it actually ends up as if it were defined as:

    public class B {
      static int foo( ) {
        return 55;
      }
    }

No reference is left to class A, and no if statement is left. The consequence of this feature is that it allows conditional compilation. Other classes can set a DEBUG constant in their own class the same way, or they can use a shared constant value (as class B used A.DEBUG in the earlier definition).
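The conditional-compilation pattern just described can be sketched in a single file as follows (the Debug and Worker class names are illustrative):

```java
// Illustrative flag-holder class; flip TRACE to true and recompile to re-enable tracing.
class Debug {
    static final boolean TRACE = false;
}

public class Worker {
    static int compute() {
        if (Debug.TRACE)   // compile-time-false branch: stripped from the bytecode
            System.out.println("In Worker.compute()");
        return 55;
    }

    public static void main(String[] args) {
        System.out.println(compute());
    }
}
```

Because Debug.TRACE is a compile-time constant, the compiled Worker class contains neither the if statement nor any reference to Debug; the tracing costs nothing in the shipped code.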
You should use this pattern for debug and trace statements and for assertion preconditions, postconditions, and invariants. There is more detail on this technique in Section 6.1.4 in Chapter 6.

3.9.2 Optimizations Performed When Using the -O Option

The only standard compile-time option that can improve performance with the JDK compiler is the -O option. Note that -O (for Optimize) is a common option for compilers, and further optimizing options for other compilers often take the form -O1, -O2, etc. Check your compiler's documentation to find out what other options are available and what they do. Some compilers allow you to trade off between optimizing the compiled code for speed and minimizing its size.

The standard -O option does not currently apply many optimizations in the Sun JDK (up to JDK 1.4). Future versions may do more, though the trend has actually been for it to do less. Currently, the option makes the compiler eliminate optional tables in the class files, such as the line number and local variable tables. This gives only a small performance improvement by making class files smaller and therefore faster to load. You should definitely use this option if your class files are sent across a network.

The main performance improvement from the -O option used to come from the compiler inlining methods. When using the -O option with javac prior to SDK 1.3, the compiler considered inlining methods defined with any of the following modifiers: private, static, or final. Some methods, such as those defined as synchronized, are never inlined. If a method can be inlined, the compiler decides whether or not to inline it based on its own unpublished considerations. These considerations seem mainly to be the simplicity of the method: in JDK 1.2 the compiler inlined only fairly simple methods. For example, one-line methods with no side effects, such as accessing or updating a variable, are invariably inlined.
Methods that return just a constant are also inlined. Multiline methods are inlined if the compiler determines they are simple enough (e.g., a System.out.println("blah") followed by a return statement would get inlined). From 1.3, the -O option does not inline methods at all. Instead, inlining is left to the HotSpot compiler, which can inline speculatively and is far more aggressive. The sidebar "Why There Are Limits on Static Inlining" discusses one of the reasons why optimizations such as inlining have been pushed back to the HotSpot compiler.

Choosing simple methods to inline does have a rationale behind it. The larger the method being inlined, the more the code gets bloated with copies of the same code inserted in many places. This has runtime costs in extra code being loaded and extra space taken by the runtime system. A JIT VM would also have the extra cost of compiling more code. At some point, there is a decrease in performance from inlining too much code. In addition, some methods have side effects that can make them quite difficult to inline correctly. All this also applies to runtime JIT compilation.

The static compiler applies its methodology for selecting methods to inline irrespective of whether the target method is in a bottleneck: this is a machine-gun strategy of many little optimizations in the hope that some inlined calls may improve the bottlenecks. A performance tuner applying inlining works the other way around, first finding the bottlenecks, then selectively inlining methods inside them. This latter strategy can result in good speedups, especially in loop bottlenecks, because a loop can be sped up significantly by removing the overhead of a repeated method call. If the method to be inlined is complex, you can often factor out parts of the method so that those parts can be executed outside the loop, gaining even more speedup. HotSpot applies this latter rationale, inlining code only in bottlenecks.
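The before-and-after of hand-inlining a simple method inside a loop bottleneck can be sketched like this (the scale( ) method and the loop are illustrative):

```java
public class InlineDemo {
    // A one-line, side-effect-free method: the kind of method compilers inline.
    private static int scale(int i) {
        return i * 3;
    }

    // Original loop: pays the method-call overhead on every iteration.
    static long sumWithCalls(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++)
            sum += scale(i);
        return sum;
    }

    // Hand-inlined version: the method body is copied directly into the loop.
    static long sumInlined(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++)
            sum += i * 3;
        return sum;
    }

    public static void main(String[] args) {
        // Both versions compute the same total; the inlined loop avoids n method calls.
        System.out.println(sumWithCalls(1000000));
        System.out.println(sumInlined(1000000));
    }
}
```

How much this gains in practice depends on the VM: a HotSpot VM may well inline scale( ) itself once the loop is identified as hot, in which case the hand-inlined version buys little.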
I have not found any public document that specifies the actual decision-making process determining whether or not a method is inlined, whether by static compilation or by the HotSpot compiler. The only reference given is to Section 13.4.21 of the Java language specification, which specifies only that binary compatibility with preexisting binaries must be maintained. It does specify that the package must be guaranteed to be kept together for the compiler to allow inlining across classes. The specification also states that the final keyword does not imply that a method can be inlined, since the runtime system may have a differently implemented method. The HotSpot documentation does state that simple methods are inlined, but again no real details are provided.

Prior to JDK 1.2, the -O option used with the Sun compiler did inline methods across classes, even if they were not compiled in the same compilation pass. This behavior led to bugs.[10] From JDK 1.2, the -O option no longer inlines methods across classes, even if they are compiled in the same compilation pass.
Unfortunately, there is no way to specify directly which methods should be inlined; you have to rely on the compiler's internal workings. Possibly in the future, some compiler vendors will provide a mechanism that supports specifying which methods to inline, along with other preprocessor options. In the meantime, you can implement a preprocessor (or use an existing one) if you require tighter control. Opportunities for inlining often occur inside bottlenecks (especially in loops), as discussed previously. Selective inlining by hand can give an order-of-magnitude speedup for some bottlenecks, and no speedup at all for others. Relying on HotSpot to detect these kinds of situations is an option. The speedup obtained purely from inlining is usually only a small percentage: 5% is fairly common.

Some static optimizing compilers are very aggressive about inlining code. They apply techniques such as analyzing the entire program to alter and eliminate method calls, in order to identify methods that can be coerced into being statically bound. These identified methods are then inlined as much as possible according to the compiler's analysis. This technique has been shown to give a 50% speedup to some applications.

3.9.3 Performance Effects From Runtime Options

Some runtime options can help your application run faster. These include:
Some options are detrimental to the application performance. These include:
Some options can be both detrimental to performance and help make a faster application, depending on how they are used. These include:
Increasing the maximum heap size beyond the default usually improves performance for applications that can use the extra space. However, there is a tradeoff in higher space-management costs to the VM (object table access, garbage collections, etc.), and at some point there is no longer any benefit in increasing the maximum heap size. Indeed, increasing the heap size can cause garbage collections to take longer, since the collector needs to examine more objects and a larger space. Up to now, I have found no better method than trial and error for determining the optimal maximum heap size for any particular application. This is covered in more detail earlier in this chapter.

Beware of accidentally using VM options that are detrimental to performance. I once had a customer whose tests showed a sudden 40% decrease in performance. Their performance harness had a configuration file that set up how the VM was run, and this had accidentally been set to include the -prof option for the standard tests as well as for the profiling tests. That was the cause of the sudden performance decrease, but it was not discovered until time had been wasted checking software versions, system configurations, and other things.
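When running heap-size trials, it can help to confirm from inside the application which limits the VM is actually using. A minimal sketch (Runtime.maxMemory( ) requires JDK 1.4 or later; the class name is illustrative):

```java
public class HeapInfo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reports the heap ceiling (e.g., as set by -Xmx);
        // totalMemory() and freeMemory() show the current allocation and slack.
        System.out.println("max heap:   " + rt.maxMemory() + " bytes");
        System.out.println("total heap: " + rt.totalMemory() + " bytes");
        System.out.println("free heap:  " + rt.freeMemory() + " bytes");
    }
}
```

Running this as, for example, java -Xmx64m HeapInfo while varying the maximum heap setting confirms that each trial configuration actually took effect before you measure it.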