Saturday, October 29, 2011

Protecting Code


As the world shifts from compiled languages such as C, C++ and Pascal to scripting languages such as Python, Perl, PHP and JavaScript, the exposure of intellectual property (the source code) grows with it. Previously, “fat clients”, usually written in C or C++, were shipped as compiled machine-code executables. More modern applications written in Java consist of bytecode, which “is the intermediate representation of Java programs” (Peter Haggar, 2001) and retains far more of the original program structure than machine code does. The same applies to .NET applications, which can be disassembled using tools shipped with the .NET Framework SDK (such as ILDASM) and decompiled back into source code (Gabriel Torok and Bill Leach, 2003). With web technologies such as HTML, JavaScript and Cascading Style Sheets (CSS), where the source has to be downloaded to the client in order to be executed by the web browser, the end user has unrestricted access to the entire source code.
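To make the point about bytecode concrete, here is a minimal sketch in Python rather than Java or .NET (Python also compiles to bytecode, and its standard `dis` module plays a role loosely comparable to ILDASM). The function `license_is_valid` and the suffix it checks are hypothetical, invented only for this illustration.

```python
import dis


def license_is_valid(key):
    # Hypothetical check, invented for this example only.
    secret_suffix = "-2011"
    return key.endswith(secret_suffix)


# The compiled code object still carries the identifiers and literals
# that the author typed into the source file.
code = license_is_valid.__code__
recovered_locals = code.co_varnames    # parameter and local variable names
recovered_constants = code.co_consts   # literal values, including the suffix
recovered_attributes = code.co_names   # attribute and method names used

if __name__ == "__main__":
    print(recovered_locals, recovered_constants, recovered_attributes)
    dis.dis(license_is_valid)  # human-readable listing of the bytecode
```

Even without the source file, the names and the "secret" string can be read straight out of the compiled function, which is the kind of exposure described above.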
The ability to access source code can serve both legitimate and malicious purposes. For example, security tools use the ability to decompile Java applets and Flash to “[perform] static analysis to understand their behaviours” (Telecomworldwire, 2009). Moreover, software developers can use disassembly for debugging. On the other hand, the same ability can be used to reverse engineer an application, which directly impacts the ability to protect the intellectual property.
One obvious way to try to protect the source code, and thus the intellectual property it carries, is to use obfuscation (Gabriel Torok and Bill Leach, 2003; Peter Haggar, 2001; Tony Patton, 2008). Regardless of the language used to develop the application, obfuscation usually means:
  • replacement of variable names with non-meaningful character strings
  • replacement of constants with expressions
  • replacement of decimal values with hexadecimal, octal and binary representations
  • addition of dummy functions and loops
  • removal of comments
  • concatenation of all lines in the source code into one
In a way, the process of obfuscation changes the source code to make it difficult for the “reader” to understand the logic behind it. Obfuscation could be seen as “your kid sister encryption”: “cryptography that will stop your kid sister from reading your files” (Bruce Schneier, 1996). Of course, a persistent “reader” can invest enough time and resources to reproduce the source code (deobfuscate it) by applying the obfuscation principles in reverse.
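As a rough sketch of what "in reverse" can look like, the snippet below undoes one technique, the replacement of constants with expressions, by folding integer arithmetic back into plain decimal literals. It uses Python's standard `ast` module (`ast.unparse` requires Python 3.9 or later); the class `ConstantFolder` and the function `deobfuscate` are names chosen for this example, and the set of supported operators is deliberately small.

```python
import ast
import operator

# Integer operators this sketch knows how to fold (an intentionally short list).
_FOLDABLE = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.LShift: operator.lshift,
    ast.RShift: operator.rshift,
}


def _is_int_literal(node):
    # True only for plain integer literals (booleans are excluded).
    return isinstance(node, ast.Constant) and type(node.value) is int


class ConstantFolder(ast.NodeTransformer):
    """Replace arithmetic on integer literals with the computed literal."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold nested expressions first
        fold = _FOLDABLE.get(type(node.op))
        if fold and _is_int_literal(node.left) and _is_int_literal(node.right):
            try:
                value = fold(node.left.value, node.right.value)
            except (ValueError, OverflowError):
                return node  # e.g. negative shift count: leave untouched
            return ast.copy_location(ast.Constant(value=value), node)
        return node


def deobfuscate(source):
    """Return the source with foldable integer expressions simplified."""
    tree = ConstantFolder().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


if __name__ == "__main__":
    print(deobfuscate("rate = annual / (0xA + 0o2)"))
```

Other techniques are reversed in a similar mechanical way (reformatting joined lines, deleting unreachable code), although original variable names and comments cannot be recovered because the obfuscated file no longer contains them.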

Bibliography
