[ Team LiB ] |
2.6 Decompiling Explained

When you compile source code into an assembly, the compiler interprets your C# or Visual Basic .NET statements and creates a series of MSIL statements that will be executed by the .NET Framework. A decompiler is an application that analyses the MSIL statements in order to recreate the original Visual Basic .NET or C# statements written by the programmer.

Unfortunately, the .NET compilers include human-readable information from our source code in the MSIL, including the names of types, methods, and fields. A decompiler can use this information to create source code that is very similar to the original. Some information, such as comments and blocks of code excluded by conditional compiler statements, is not included in assemblies by the compiler and cannot be restored by a decompiler.

The nature of MSIL makes it easier to decompile .NET assemblies than native Windows applications, which are compiled into instructions that are targeted at a specific CPU, such as an Intel Pentium. Lower-level instructions are more difficult to reconstruct into code statements than the relatively abstract MSIL statements.

Decompilers are more widely available, and more widely used, than you might think. There are three main reasons why an assembly is decompiled:
The scope for intellectual property theft through decompilation has been lessened by the increased use of thinner clients to connect to network services (this includes the move towards XML web services); there is less complexity to the client application, and the sophisticated logic is deployed within a remote network. In contrast, the prevalence of network services increases the scope for application subversion. Any network service that grants trust to clients based on data that is included in an assembly is subject to subversion through decompilation. Analysis of a client application can provide a wealth of information on network protocols and security configuration, which can be used to manipulate the network components of an application against the wishes and expectations of the developer.
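The point about human-readable information can be checked without a decompiler, because the same metadata is available through reflection. The following is a minimal sketch, not a decompiler: the ServiceClient type, its SharedSecret constant, and the MetadataDump helper are hypothetical names invented for this illustration. It lists the field and method names of a type, and the value of any compile-time constant, exactly as they are stored in the assembly:

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical client type; the names and the constant are illustrative only.
public class ServiceClient {
    private const string SharedSecret = "not-so-secret";
    private int o_attempts;

    public bool Authenticate(string p_user) {
        o_attempts++;
        return p_user != null;
    }
}

public static class MetadataDump {
    // Returns one line per field and method declared by the type,
    // including private members and the values of constants.
    public static List<string> Describe(Type type) {
        const BindingFlags flags = BindingFlags.Public | BindingFlags.NonPublic
            | BindingFlags.Instance | BindingFlags.Static
            | BindingFlags.DeclaredOnly;
        List<string> lines = new List<string>();
        foreach (FieldInfo field in type.GetFields(flags)) {
            string line = "field " + field.Name;
            if (field.IsLiteral) {
                // Constants are stored in metadata together with their values.
                line += " = " + field.GetRawConstantValue();
            }
            lines.Add(line);
        }
        foreach (MethodInfo method in type.GetMethods(flags)) {
            lines.Add("method " + method.Name);
        }
        return lines;
    }

    public static void Main() {
        foreach (string line in Describe(typeof(ServiceClient))) {
            Console.WriteLine(line);
        }
    }
}
```

Running the sketch prints the private field names, the method name, and the constant together with its value. A network service that trusts a client because the client knows such a value is trusting information that anyone holding the assembly can read.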
2.6.1 Decompiling Assemblies

In this section, we demonstrate how much detail a decompiler exposes from an assembly. You will use the open source Anakrino/Exemplar decompiler to decompile the single-file assembly you created in the previous section; at the time of writing, the decompiler is available at http://www.saurik.com/net/exemplar/. The decompiled versions of the SumNumbers and SumArray classes are below; the decompiler we have selected generates only C# source code. We do not explain how to install or use the decompiler in this book. We present the decompiled output so that you can understand what kind of information can be obtained from an assembly:

# C#

    using System;

    public class SumNumbers {
        private int o_total;

        public SumNumbers( ) {
            o_total = 0;
        }

        public void AddNumber(int p_number) {
            o_total += p_number;
        }

        public int GetTotal( ) {
            return o_total;
        }
    }

The efficacy of a decompiler is measured by the accuracy of the source code that it generates: the better a decompiler is, the more the decompiled source code resembles the original statements. Our decompilation has produced a rendition of the SumNumbers class that is very close to the original; the names of the fields are preserved, and the structure and function of the class are clear:

# C#

    public class SumArray {
        public static int SumArrayOfIntegers(int[] p_arr) {
            SumNumbers sumNumbers = new SumNumbers( );
            int[] nums = p_arr;
            for (int k = 0; k < (int)nums.Length; k++) {
                int j = nums[k];
                sumNumbers.AddNumber(j);
            }
            return sumNumbers.GetTotal( );
        }
    }

The decompiled version of the SumArray class is less like the original but still clearly demonstrates the implementation. Our simple assembly is easily decompiled, and the workings of our data types are clearly exposed; logic that is more complex can cause difficulties for decompilers, but in general, an unprotected assembly will yield its secrets easily.
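One plausible reason the SumArray output differs from the original is loop lowering. The C# compiler turns a foreach over an array into an indexed loop over a temporary copy of the array reference, which is the shape seen in the decompiled output. The sketch below assumes the original method used foreach; that is an assumption, since the original source is not reproduced in this section. The LoopForms class and its method names are hypothetical:

```csharp
using System;

public static class LoopForms {
    // Assumed shape of the original source: a foreach over the array.
    public static int SumWithForeach(int[] p_arr) {
        int total = 0;
        foreach (int j in p_arr) {
            total += j;
        }
        return total;
    }

    // Shape emitted by the decompiler: a temporary array reference
    // and an explicit index variable.
    public static int SumWithIndexedFor(int[] p_arr) {
        int total = 0;
        int[] nums = p_arr;
        for (int k = 0; k < nums.Length; k++) {
            int j = nums[k];
            total += j;
        }
        return total;
    }

    public static void Main() {
        int[] data = new int[] { 1, 2, 3 };
        Console.WriteLine(SumWithForeach(data));    // prints 6
        Console.WriteLine(SumWithIndexedFor(data)); // prints 6
    }
}
```

Both methods compute the same result, which is why the decompiled version remains a faithful guide to the implementation even when it does not match the source statement for statement.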
2.6.2 Protecting Against Decompilation

If your assemblies contain no proprietary data, and no information that can be used to subvert your application, then you are in a position to distribute the assemblies freely; otherwise, you should consider protecting against decompilation with one of the techniques discussed below.
2.6.2.1 Obfuscation

Obfuscation is the technique of altering the MSIL statements so that the application executes in the same way, but the output of a decompiler is unreadable. Obfuscation is such an important technique that Microsoft has included a copy of a limited-functionality obfuscator in Visual Studio .NET 2003. Different obfuscators use different approaches to obscure decompiler output, but we summarize the more common types of obfuscation below:
Effective obfuscators combine these approaches and often apply proprietary techniques. There is a kind of "arms race" between the developers of obfuscators and the developers of decompilers, where each new feature added by an obfuscator is eventually compromised by a decompiler. The biggest problem with obfuscation is that it alters the MSIL within your assembly; when problems arise, you will find that the obfuscation process can seriously hamper the debugging process. As a general guideline, do not obfuscate your assemblies unless you have to, and always select an obfuscator from a reputable company that will be able to support you if you encounter problems.
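Identifier renaming is one widely used obfuscation approach, and its effect is easy to imitate by hand. The sketch below places the decompiled SumNumbers class next to a hand-renamed copy; the short names (a, b, c, d, e) and the RenamingDemo class are invented for this illustration and are not the output of any particular obfuscator:

```csharp
using System;

// Readable form, as recovered by the decompiler in Section 2.6.1.
public class SumNumbers {
    private int o_total;
    public SumNumbers() { o_total = 0; }
    public void AddNumber(int p_number) { o_total += p_number; }
    public int GetTotal() { return o_total; }
}

// Hand-written imitation of what a renaming obfuscator might leave behind:
// the same statements, but every name that carried meaning has been replaced.
public class a {
    private int b;
    public a() { b = 0; }
    public void c(int d) { b += d; }
    public int e() { return b; }
}

public static class RenamingDemo {
    public static void Main() {
        SumNumbers readable = new SumNumbers();
        a renamed = new a();
        foreach (int n in new int[] { 1, 2, 3 }) {
            readable.AddNumber(n);
            renamed.c(n);
        }
        Console.WriteLine(readable.GetTotal()); // prints 6
        Console.WriteLine(renamed.e());         // prints 6
    }
}
```

The two classes behave identically, but a stack trace or decompiler listing for the renamed class refers only to a, c, and e, which illustrates both why renaming frustrates analysis and why it can hamper debugging.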
2.6.2.2 Native compilation

As you will see in Chapter 4, the .NET Framework runtime compiles your MSIL statements into native commands for the CPU before the code is executed. An alternative to obfuscation is to perform this compilation yourself and to create native instructions that cannot be processed by an MSIL decompiler. Native compilation is a relatively new technique as applied to .NET assemblies, and the tools available at the time of writing are immature; the principal risk with native compilation is that the output can differ from that produced by the normal .NET compilation process, which can hamper the debugging process.