Writing secure code is normally only taught in advanced, highly specialized courses. This is very unfortunate as it is because of this single reason that phenomena like worms, viruses and crackers can cause the damage they do.

Solving this deficiency in knowledge and skills is both highly complex and surprisingly simple. The reason is well known - it is called the Pareto principle. Simply stated, it postulates that for 20% effort put in to something you will reap 80% of the benefits. In our scenario, this postulates that with 20% effort you will be able to reduce the risk of writing insecure software by 80%. This is in truth not far off...

Most gross insecurities are a direct result of easily solved software deficiencies, or better known as software bugs. This article covers many important aspects regarding both obvious security flaws, as well as subtle issues. It is not a thorough coverage of the topic as one can write entire books on the subject. The intention is to make the reader aware of the more common 20% security risks, and touch base on some of the other 80%.

I have written a piece of code below that is typically used in web portals. This code is vulnerable to a well known security issue. See if you can find it...

class Auth {
  public boolean authenticateUser(String pUserName, String pPassword) {
    String vSQL = "select user_id from user where username = '"+pUserName+
                  "' AND password = '"+pPassword+"'";
    int vUserID = performSQLQuery(vSQL);  // executes the actual SQL query

    if (vUserID > 0)
      return true;
    else
      return false;
  }
}

Could you spot the problem if this code were to be called by a system that acquires the user name and password values directly from the user? Consider what would happen if the user typed in Piet for the pUserName parameter and 12345 for the pPassword parameter. The SQL query would evaluate to:

select user_id from user where username = 'Piet' AND password = '12345'

There is nothing wrong with that SQL query. The problem comes in when the user types in say 1' OR 1==1; -- for the pUserName parameter and anything else for the pPassword parameter. See what happens:

select user_id from user where username = '1' OR 1==1; --' AND password = 'ANYTHING'

which is equal to

select user_id from user where username = '1' OR 1==1;

If you understand SQL well, you will realise that that SQL query is completely different to that which I originally intended! In this specific scenario, the user has changed the SQL query to retrieve all rows where username is equals to 1, OR where 1 == 1. Because the last condition will always evaluate to true, the complete condition will always evaluate to true. And because the two -- characters were inserted, the modified SQL statement will ignore all the rest of the characters as they are now commented out. This helps the user construct a valid SQL statement by means of parameter contamination, or better known as SQL injection.

Hopefully you are shocked by this if you have not seen it before. The bottom line is that that innocent code (which is used widely throughout the world in many real systems) allows the user - without any special permissions or privileges - to execute arbitrary SQL statements by just typing a specially crafted SQL fragment into one of the input fields, right from the user interface. In this scenario, the user will have been granted access to the system without even knowing the correct username, nevertheless the correct password! Consider one more example, just to scare you to death. Say the user typed in for pUserName the value 1'; delete from user; --. The SQL query would evaluate to:

select user_id from user where username = '1'; delete from user; --'
AND password = 'ANYTHING'

which is equal to

select user_id from user where username = '1';
delete from user;

In this situation the user was not interested in being granted access to the system, rather he was interested in destroying the data in the user table. The first SELECT query would fail. However, a second SQL query was specified with the ; operator, which will then be executed and deleting all the users in the user table. Usually web servers connect to back end databases with one RDBMS userid, which has full access to the corresponding tables. Hence this will be destructive in most scenario's.

Fortunately for us this is one of those 20% situations in which it is rather easy to provide a patch for this code. The root of the problem is unchecked user input, which can be remedied with code as follows.

class Auth {
  public boolean authenticateUser(String pUserName, String pPassword) {
    String vSQL = "select user_id from user where username = ? AND password = ?;
    int vUserID = performSQLQuery(bindParameters(vSQL,pUserName,pPassword));

    if (vUserID > 0)
      return true;
    else
      return false;
  }

 private String bindParameters(String pSQL, String pUserName,String pPassword) {
    // Replacing the ? with the parameter can be done in many other ways.
    // This is ad-hoc code and not tested!
    return vSQL.replaceFirst("\?",
       escape(pUserName)).replaceFirst("\?",escape(pPassword));
  }

 private String escape(String pString) {
    StringBuffer vRes = new StringBuffer();
    for (int i=0;i<pString.length;i++) {
      if (pString.charAt(i) == '\'')
        vRes.append("'");
      vRes.append(pString.charAt(i));
    }

    return "\"" + vRes.toString() + "\"";
  }
}

This is not the most elegant solution, and will break for many reasons (such as the pUserName containing the ? character etc.) The code does show a fix however, and that is to make sure you escape the ' character with another one. Depending on the RDBMS flavour you are running, it might be necessary to escape other characters as well. This code will not be exploitable using the standard ' SQL Injection technique.

If you are very clever you would notice that the aforementioned "fix" is not really a fix at all. By plugging one hole we have allowed another vulnerability to creep in. What would that "fix" do with the following user input.

select user_id from user where username = '1\'; delete from user; --'
AND password = 'ANYTHING'

This would translate to

select user_id from user where username = '1\'';
delete from user;

Can you see that the escaping does not work in the example? When the user used another means of escaping the quote character by specifying \', it effectively rendered the extra quote we inserted useless as it is seen as part of the parameter. Therefore our escaping caused this vulnerability (which did not previously exist) to become exploitable. How do we fix this? I think at this stage it should be obvious that we will have to apply heuristics in order to achive a set level of confidence. This is because of vendor extensions and dialects. By using these techniques one will never have full confidence that it is secure. The only reliable, generic way around this is to use another method of passing parameters to the database. One such method is to make use of PreparedStatement's, as the RDBMS server will perform the parameter binding to the ? placeholder.

If this section caused you to realize that one cannot have confidence in a security fix just because it fixes the specific security exploit - it was worth my while writing it. Always test your code for all known vulnerabilities, and also use common sense in assessing the strength of a security fix.

Most people have heard about Buffer Overflows. This is a condition where data overrun the space allocated for it in memory. Without going into too much technical details, in essence the following code will be vulnerable to a buffer overflow.

int main(int argc, char* argv[]) {
  char* vBuffer = new char[10];
  char vDelete;

  printf("Type Y to delete ALL YOUR EMAIL, N otherwise: ");
  vDelete = getchar();

  printf("Type in your name: ");
  vBuffer = gets(vBuffer);

  if (vDelete == 'Y') {
    deleteEverything();
    printf("Deleted %s's data\n",vBuffer);
  }
  else
    printf("NOT deleted %s's data\n",vBuffer);
}

What happens if the user types in a name that is equal or longer to 10 characters? In C/C++, a string is always null terminated. That means the total length of the string as read in by the user, will be that length + 1 (for the \0 character). Assuming the user typed in a long string, the following will happen in memory.

Buffer Overrun
Buffer Overrun

Assume that the variable vBuffer is stored at memory location A000, as indicated in the diagram. As you can see there are 10 bytes allocated to the variable. Assume further that the memory location A00A points to the variable vDelete, which is a single byte in length. Lets assume the user's name is Shaun Kelly and that he does not want to delete all his email. Therefore the program's output will look like this:

 >Type Y to delete ALL YOUR EMAIL, N otherwise: N
 >Type in your name: SHAUN KELLY
 >Deleted SHAUN KELLY's data

The obvious question is: Why did the system delete his data if he said N for No? The answer is simple - the user's name was too long for the buffer allocated for it, and overflowed into the adjacent memory cells, in this case they happened to belong to vDelete. Starting from the beginning, the memory area is filled with random data (indicated by ?). In the second diagram, the user has specified that he does not want to delete all his mail by typing in a N, and this is stored in memory location A00A. In the third diagram, he typed in his full name. However, only the first 10 characters of his name fits in to the buffer allocated to vBuffer. The 11th character Y overflowed into memory location A00A - the start of the vDelete variable's memory location. This Y then overwrote the N that was previously stored there. If the code now tries to determine whether to delete all the mail, it would retrieve an incorrect value of Y for the vDelete variable.

There is a simple way to solve this issue - and that is to increase the buffer's capacity. Unfortunately this will only delay the inevitable buffer overrun for the time when a really long name is typed in. Therefore the secure way to resolve this is to use a secure API method, as the whole issue is caused by gets() not checking the buffer's boundaries. The code can be written securely as follows.

int main(int argc, char* argv[]) {
  char* vBuffer = new char[15];
  char vDelete;

  printf("Type Y to delete ALL YOUR EMAIL, N otherwise: ");
  vDelete = getchar();

  printf("Type in your name: ");
  vBuffer = fgets(vBuffer,sizeof(vBuffer)-1,stdin);

  if (vDelete == 'Y') {
    deleteEverything();
    printf("Deleted %s's data\n",vBuffer);
  }
  else
    printf("NOT deleted %s's data\n",vBuffer);
}

It is important to note that the memory locations of the declared variables are never guaranteed to be consecutive, or for that matter at any specific location in memory. This is especially true for variables that have been allocated memory from the global heap with the new operator or malloc()/calloc() methods. Therefore the behaviour of buffer overflows are highly unpredictable.

A more dangerous exploit of buffer overflows is when the IP CPU register is manipulated to point to so-called shellcode, which will then get executed. The idea is simple. Once you found a buffer overflow condition (in Unix, if the application segmentation fault's it is a candidate for a buffer overflow condition), try to manipulate the IP register by changing entries in the stack so that it returns to your shellcode. Once this has been accomplished, code of your choice can be executed with the privilege of the user the process is running under. Shellcode is assembly code carefully crafted and attached to the payload intended for the vulnerable buffer. In our case, should we have wanted to exploit the buffer overflow using shellcode, we would have appended our shellcode to the end of the long string. This is a complex process and the offsets need to be perfectly aligned with the IP register's contents. This can once again be avoided by avoiding the buffer overflow condition in the first place. Malicious code cannot get executed this way if it never makes it into memory...

Framework development is something most developers avoid as it is complex and best left to experts. Without going into too much detail, I want to touch base on the issue of enabling security options in a framework.

(Security conscious) Designers always have to balance the equation between usability on the one hand and security on the other. A good example is what happened to me with the development of the WebMVC framework. It has a BinaryJSPView class that allows you to specify a file to send to the client browser. This method takes the name of the file and the file's contents as model attributes, and will then submit this data to the client.

Assume your web frontend code passes the name of a file to download as a GET/POST HTTP request, and the corresponding handler retrieves the file and forwards the request to the BinaryJSPView object. I realised immediately that this is a very vulnerable situation as the following scenario shows. Assume the file to be downloaded is located at /opt/jakarta-tomcat/webapps/myapp/downloads/file.zip. Assume the web frontend specifies the file file.zip to be downloaded. It is extremely easy for a web user to change this so that the file to be downloaded points to say ../../../../../etc/passwd. Appended to the original path this gives /opt/jakarta-tomcat/webapps/myapp/downloads/../../../../../etc/passwd, which is equivalent to /etc/passwd. Therefore the user has access to the contents of all files with a known file name, and that the user the web server is running under has access to. Scary!

This is easily resolved by changing BinaryJSPView to check the filename given for any signs of the ../ substring. However, doing this disallows the legitimate use of that feature. Therefore we have a classic tradeoff between security and usability. At first I was tempted at solving this by adding a parameter called METHOD_PARANOID_FILE_PATH_CHECK to the BinaryJSPView class, and if set, perform a rigorous and paranoid check on the filename. Therefore if there is a legitimate use of the ../ characters in the filename, this parameter should not be set. In all other situations this parameter should be set for security reasons. With experience comes the great knowledge that people do not read manuals, therefore a term such as RTFM had been coined :). In addition developers are lazy, therefore nobody would take the time to read up on that added parameter and how to use it. That is why I have decided to make that parameter's default value active, hence it can be overridden by turning it explicitly off. So the philosophy is that lazy developers will have the secure option enabled by default, and only when running into troubles because of too strict security policies hampering their application, they can go RTFM and relax the restrictions a bit. This is very much like most (competent) firewall administrators think by specifying a DENY ALL policy and then only specifying ALLOW rules on a case by case basis.

Secure programming is a huge topic and I have only touched on some aspects. There are also big differences in writing secure programs for different operating systems. I recommend that you include security tests in addition to the usual functionality tests once you have completed your application. Tools such as nessus are great aids in finding network exploitable insecurities.