Saturday, October 29, 2011

Protecting Code

As the world shifts from compiled languages such as C, C++ and Pascal to scripting languages such as Python, Perl, PHP and JavaScript, the exposure of intellectual property (the source code) grows with it. Previously, “fat clients”, usually written in C and C++, were shipped as compiled machine code executables. More modern applications written in Java consist of bytecode, which “is the intermediate representation of Java programs” (Peter Haggar, 2001). The same applies to .NET applications, whose intermediate code can be disassembled using tools shipped with the .NET Framework SDK (such as ILDASM) and decompiled back into source code (Gabriel Torok and Bill Leach, 2003). With web technologies such as HTML, JavaScript and Cascading Style Sheets (CSS), where the source has to be downloaded to the client side in order to be executed by the web browser, the end user has unrestricted access to the entire source code.
The ability to access source code can serve both legitimate and malicious purposes. For example, security tools rely on the ability to decompile Java applets and Flash content: such a tool “performs static analysis to understand their behaviours” (Telecomworldwire, 2009). Moreover, software developers can use disassembly for debugging. On the other hand, the same ability can be used to reverse engineer the source code, which directly impacts the ability to protect the intellectual property.
One obvious way to try to protect the source code, and thus the intellectual property it carries, is obfuscation (Gabriel Torok and Bill Leach, 2003; Peter Haggar, 2001; Tony Patton, 2008). Regardless of the language used to develop the application, obfuscation usually means:
  • replacement of variable names with non-meaningful character strings
  • replacement of constants with expressions
  • replacement of decimal values with hexadecimal, octal and binary representations
  • addition of dummy functions and loops
  • removal of comments
  • concatenation of all lines of the source code
In a way, the process of obfuscation changes the source code to make it difficult for the “reader” to understand the logic behind it. Obfuscation could be seen as “kid sister” encryption - “cryptography that will stop your kid sister from reading your files” (Bruce Schneier, 1996). Of course, a persistent “reader” can invest enough time and resources to reproduce the source code (deobfuscate it) by applying the obfuscation principles in reverse.


Saturday, October 22, 2011

Adaptive Web Site Design

Paul De Bra (1999) identifies a number of issues related to adaptive web site design, including “the separation of a conceptual representation of an application domain from the content of the actual Web-site, the separation of content from adaptation issues, the structure and granularity of user models, the role of a user and application context”. This essay discusses the separation of conceptual representation and the role of the user in the application context, more than ten years after publication of the original article.
Modern web application development frameworks such as .NET, Spring Framework, JavaServer Faces, Apache Orchestra, Grails and Struts offer a clear separation between application representation and content. The separation is achieved by implementing the Model-View-Controller (MVC) architecture, where the “Model” layer is responsible for storing and managing access to the relevant pieces of data, the “View” layer is responsible for rendering and laying out the data, and the “Controller” layer is responsible for interaction with the end user (i.e. the Internet browser). The entire content no longer has to be “stored” statically in the HTML page; it can be generated dynamically based on input received from the user. Moreover, the HTML5 Web Storage API greatly increases the storage capacity compared to HTTP cookies, which allows a web application to store structured data on the client side (WHATWG, 2011). This could further facilitate user-centric web site design, for example storage of user preferences, data caching, etc.
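The division of responsibilities in MVC can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the class names, the in-memory article data and the `handle_request` method are hypothetical and do not correspond to the API of any of the frameworks named above.

```python
from html import escape


class ArticleModel:
    """Model: stores the data and manages access to it."""

    def __init__(self):
        # Hypothetical in-memory data standing in for a database.
        self._articles = {1: "Protecting Code", 2: "Adaptive Web Site Design"}

    def get_title(self, article_id):
        return self._articles.get(article_id)


class ArticleView:
    """View: turns data into markup; it holds no data of its own."""

    def render(self, title):
        if title is None:
            return "<h1>Not found</h1>"
        return f"<h1>{escape(title)}</h1>"


class ArticleController:
    """Controller: receives user input and coordinates model and view."""

    def __init__(self, model, view):
        self.model = model
        self.view = view

    def handle_request(self, article_id):
        return self.view.render(self.model.get_title(article_id))


controller = ArticleController(ArticleModel(), ArticleView())
page = controller.handle_request(2)
```

Because the page is produced from the model at request time, the same view can render different content for different input, which is what makes per-user adaptation possible in the first place.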
On the other hand, when discussing “the role of a user and application context” (Paul De Bra, 1999), the methodology and the technology are not as mature. Qiuyuan Jimmy Li ties the issue to the organization of the web application structure and notes that the majority of web sites do not adapt the content to the individual user. Rather, the web server “provides the same content that has been created beforehand to everyone who visits the site” (Qiuyuan Jimmy Li, 2007). As an alternative, he suggests a framework which accounts for users' cognitive style and adapts the information content for each individual user. Justin Brickell et al. (2006) take a slightly different approach and suggest mining site access logs to identify access patterns and user behavior such as scrolling, time spent on each page, etc. The collected information could be used for shortcutting - the “process of providing links to users’ eventual goals while skipping over the in-between pages” (Brickell et al., 2006).
In addition, it is important to highlight the security and privacy issues when discussing adaptive web site design. In order to provide customized content, a web application needs to acquire or collect personal data about individual users and their behavior patterns. For example, Google Gmail uses automated scanning and filtering technology to “show relevant ads” (Google, 2011). Some individuals could consider this an intrusion into their privacy, especially if the processed message contains sensitive information such as health records or financial information.
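The idea of shortcutting can be illustrated with a deliberately simplified sketch. It is not the algorithm of Brickell et al.; it assumes the access log has already been split into per-visit page sequences, treats the last page of each visit as the user's goal, and uses a hypothetical function name, threshold and set of URLs.

```python
from collections import Counter, defaultdict


def suggest_shortcuts(sessions, min_count=2):
    """Map each page to the session end page most often reached from it."""
    goals = defaultdict(Counter)
    for pages in sessions:
        if not pages:
            continue
        goal = pages[-1]  # assumption: the last page visited is the goal
        # Only pages at least two clicks before the goal are candidates;
        # the page right before the goal already links to it.
        for page in pages[:-2]:
            goals[page][goal] += 1
    shortcuts = {}
    for page, counter in goals.items():
        goal, count = counter.most_common(1)[0]
        if count >= min_count:
            shortcuts[page] = goal
    return shortcuts


sessions = [
    ["/home", "/products", "/products/laptops", "/products/laptops/x1"],
    ["/home", "/products", "/products/laptops", "/products/laptops/x1"],
    ["/home", "/about"],
]
shortcuts = suggest_shortcuts(sessions)
```

With this sample data, both `/home` and `/products` would receive a direct link to `/products/laptops/x1`, letting later visitors skip the in-between pages.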