Hashing and Password Security

Today, the most common way of providing security in giving access to data or systems is through the use of passwords. Practically every online site now expects you to create an account with a password, which will let you post comments, order products, conduct business, or just post to social media. The implication is that insisting on passwords provides some level of security. Now, following on our last tutorial we should ask a few questions about just how effective this measure is, since someone posting in your name to Twitter is significantly different from someone accessing your bank account. And since the assets being protected are very different, it would be reasonable to approach the problem of security somewhat differently in these cases. But given the ubiquity of passwords as the authentication for online accounts, we need to look at the security involved. Note that I am approaching this from the standpoint of the owner of the site in question for this tutorial, and will follow up with a look at your own role in this.

The asset you are trying to protect is access to your account on some Web site or IT system. Given that you are not the owner, you don’t get to choose how this is done; the owner chooses, and does so for their own reasons. If they do a bad job there can be consequences, of course, but you will always have to agree to follow their rules if you use their system. Passwords are not the ideal way to protect this access, and I would not be at all surprised to see some major changes over the next 10 years or so as better ways are developed, including biometrics. But for now it is what we have.

So what are the threats? As we saw in the previous tutorial, one threat is you. If you can be tricked into revealing your password, then you have given away the access. And what are called “social engineering” attacks are frequently successful. If you are asked by someone you believe to be legitimate to tell them your password, will you? One approach that often works in large organizations is to call someone on the phone and say something like “I’m so-and-so from the IT department and I need to check your password.” Maybe they use words like verification as part of this. And instead of making a phone call, it could be an e-mail asking you to click a link, and it might look very official and have the right graphics from the company. If you can be tricked like this, they are in.

Generally these kinds of attacks are examples of a targeted attack against a specific company or individual. If the attacker has a good reason to want to get in it is worth putting in the effort, but you should recognize that this kind of attack does require a lot of effort to get a single password, and is only worth doing if the payoff is great. If you are trying to steal the intellectual property of a competitor, for instance, it is worth doing. But the vast majority of cases are not like this. Most attackers are looking for a financial reward, and that usually comes when you can get large numbers of passwords which can then be leveraged to get into bank accounts and the like.

One of the big weaknesses of passwords, as we saw last time, is that really secure passwords are hard to remember. And since every site you go to now requires an account to do pretty much anything, we all end up with hundreds of passwords. And people being what they are, the odds are very good that most people are using the same password on a whole bunch of sites. If that password is used to post a comment on someone’s blog, you probably don’t feel like it is a security risk. And in that context it isn’t, but if you also used that same password for your bank account? Anyone who could hack the blog site and get your password would then try to use it at the bank, and Bingo! they now have access to all of your money.

Of course, most hackers are not targeting blog sites because there are not the large numbers of passwords to be harvested there. They go after the large sites that have millions of accounts and passwords. And when they get them, they can then try to use them at other sites. And if the other sites have a standard way of creating account names, or if you always use the same account name each time, combining an account name and password is a piece of cake. And with computers to automate the process you can try large numbers of them every second. Even if only 10% of them work, that can be a huge payoff for a cyber-criminal. So this is what you need to worry about.

Now, we will get into how you can take steps to protect yourself in the next tutorial, but for this tutorial I want to look at how the site owner should protect your password. The worst case scenario is that they simply store your password “in-the-clear” in a database, i.e., they don’t encrypt it in any way. In that case, anyone who can get into the system has all of the passwords, game over. The owner may try to interpose a password to access the database, but this is not sufficient security. A good clue that the owner is doing something like this is that they limit the length of the password. You see, they may be using a database that sets field length on what they store, and you cannot exceed that field length. If you ever encounter a system that says your password cannot exceed 8, or 12, or whatever number of characters, be extremely suspicious.

And for anything really important, test it. They may let you enter a 20-character password, but try logging in using only the first 19 characters. If that works, you have pretty compelling evidence that they throw away any characters over some limit, and a strong hint that they store passwords in the clear as well. You may not care if it’s someone’s blog, but I would never do online banking that way. I would either change banks or just decide to never use the online function. And if you are not using the online function, you should see if there is any way to instruct the bank to ignore any attempt at online activity.

In the U.S. two recent court cases are pointing towards an interesting standard here. They involved hackers who were able to fraudulently access bank accounts and send wire transfers to accounts outside the U.S. In one case, the bank had urged the customer to adopt better security practices (essentially requiring multiple authorizations for any transfers) and the customer declined in writing to do so. The fraud occurred, the customer sued the bank for negligence, and the court sided with the bank since it had urged better practices on the customer. In a very similar case the customer prevailed in the lawsuit because there was no evidence that the bank had any good security around the process. The moral of the story is that you need to take advantage of any good security practices offered, and that you should turn off anything you don’t need to use. I personally would never allow any wire transfers to be initiated through online banking; I would require the bank to only accept such transfers if I show up in person with good identification. Or maybe even turn down the ability to do wire transfers if you don’t need it.

Back to passwords. At a minimum, passwords should be stored in a protected form to make it difficult for an attacker to get them even if they get access to the database where they are stored. Note that I said difficult, not impossible. A sufficiently motivated attacker has a number of tools available to get even protected passwords, but any time you can put a speedbump in their path, that is good. And that brings us to the concept of a hash. Hashing uses cryptography to generate a scrambled form of the password, called a hash or digest, and employs a one-way function to do so. A one-way function is something we looked at before with Public Key cryptography, and means a function that is easy to compute in one direction, but extremely difficult to compute in reverse. A good hash function would have the following properties (see Wikipedia article):

  • it is easy to compute the hash value for any given message
  • it is infeasible to generate a message that has a given hash
  • it is infeasible to modify a message without changing the hash
  • it is infeasible to find two different messages with the same hash.

Like all other forms of cryptography, advances in computer technology can make older forms vulnerable to brute force attack. As you can see from the Wikipedia article quoted above, the term we use is infeasible, not impossible. For instance, the MD5 algorithm was once used for this purpose. It was developed in 1991 by Ron Rivest (of RSA fame), but a flaw was found in 1996, and it should no longer be used for security. It is still used for verifying file integrity though, since it meets condition three above for accidental changes: if even one bit gets flipped in a file, the MD5 hash will be completely different. So if you ever download ISOs on the Internet, chances are they will come with an MD5 hash that can be used to validate that the file has not been corrupted, and for that purpose it is perfectly good. But for security purposes it should be avoided. (I recently read an alarmist article on how every password was crackable, and noted that the passwords were hashed with MD5. Needless to say, I was skeptical of the conclusions.) The next replacement was SHA1, or Secure Hash Algorithm 1. This was designed by the NSA, and was required in many government applications. In 2005 weaknesses were found, however, which pushed people toward SHA2, a family that had been published a few years earlier. And recently a competition led to a new replacement which will be known as SHA3. No practical attack on SHA2 has been found, but because it shares some features with SHA1 the government decided it wanted an alternative that used a very different approach.
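To make the download check concrete, here is a minimal sketch in Python using the standard hashlib module. The function names file_digest_hex and verify_download are my own inventions for illustration, and I have assumed SHA-256 as the default; you can pass "md5" instead if that is what the download site publishes.

```python
import hashlib

def file_digest_hex(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks so that
    even a multi-gigabyte ISO never has to fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_download(path, published_hex, algorithm="sha256"):
    """Compare the computed digest with the one published by the site.
    Case and surrounding whitespace in the published value are ignored."""
    computed = file_digest_hex(path, algorithm)
    return computed.lower() == published_hex.strip().lower()
```

You would call something like verify_download("some.iso", "the hash from the download page") and only trust the file if it returns True. Keep in mind this catches corruption; it only protects against tampering if the published hash itself came from a source the attacker could not alter.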

So for password security you would want to use either SHA2 or SHA3. Since SHA3 is recent, it is not in much use as yet, so SHA2 is what you would hope to encounter. Note that SHA1 is still in use, but Microsoft, for instance, announced that it would stop accepting SSL certificates signed using SHA1 by 2017, so its days are numbered, as indeed they should be.

So, looking at what a hash should do, what does it all mean? First, creating a hash is supposed to be easy. That is what one-way functions are all about. This is similar to what we saw with PGP (a different technology) where you could generate your key pair on a home computer. Generating a hash should be similarly easy, and require very little computing power. And entropy is not a factor here, so you can generate thousands of hashes without any problem.

The second criterion says that you cannot reverse the process easily. If you have a given hash, there is no way to generate the message that produced it, at least with the current technology. That is the other half of one-way functions. Now technically this is not impossible, in the strict sense. Given enough computing power it is theoretically possible to try every conceivable message, compute the resulting hash, and compare it to the hash in question. This is the principle used in dictionary attacks which we will discuss below. But this also means that hashing is not a good way to encrypt a message to someone, since there would be no way for them to decrypt the message!
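Here is a small Python sketch of what "try every conceivable message" looks like when the space of messages is tiny. I am assuming a 4-digit PIN hashed with SHA-256; the helper names sha256_hex and brute_force_pin are made up for this example.

```python
import hashlib
from itertools import product

def sha256_hex(text):
    """Hash a text string with SHA-256 and return the hex digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def brute_force_pin(target_hex, length=4):
    """Try every numeric PIN of the given length until one hashes to the target."""
    for digits in product("0123456789", repeat=length):
        candidate = "".join(digits)
        if sha256_hex(candidate) == target_hex:
            return candidate
    return None  # nothing of that length matched

# Pretend this hash is all the attacker has
target = sha256_hex("4821")
print(brute_force_pin(target))  # prints 4821
```

With only 10,000 possibilities this finishes almost instantly. Every extra character multiplies the number of candidates, which is why the same approach is hopeless against a long, random password.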

The third criterion says that any change at all to the message, even a single bit changed, would cause the hash to be completely different. And we do mean completely, the resulting hash would look totally different from the original. This makes it excellent for ensuring that the contents have not been altered, which is why hashes are used to validate downloads. Note that this is essentially the same function that digitally signing your e-mail performs. It assures that the message you sent has not been altered in any way.
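You can watch this happen with a few lines of Python. This is just a sketch: the message is arbitrary, I have assumed SHA-256, and bit_difference is a helper I wrote for the example.

```python
import hashlib

def bit_difference(a, b):
    """Count how many bits differ between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

original = b"Pay Alice 100 dollars"
altered = bytearray(original)
altered[0] ^= 0x01  # flip the lowest bit of the first byte

d1 = hashlib.sha256(original).digest()
d2 = hashlib.sha256(bytes(altered)).digest()

print(d1.hex())
print(d2.hex())
print(bit_difference(d1, d2), "of", len(d1) * 8, "bits differ")
```

The two messages differ by a single bit, yet the two hashes have nothing visibly in common. For a good hash function you should expect roughly half of the output bits to change.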

The last criterion says that it should be highly unlikely that two different messages would have the same hash. This is called a collision in cryptography, and would allow an attack. Again it is not impossible, just highly unlikely.

Also, one characteristic of hashing functions that is not on the list, but is worth knowing, is that they generate hashes of the exact same length regardless of the original message length. This turns out to be useful in understanding password hashes.
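A quick Python check shows the fixed length. I am assuming SHA-256 here, which always produces 256 bits, or 64 hexadecimal characters; other hash functions have their own fixed sizes.

```python
import hashlib

# Messages ranging from empty to a million bytes
messages = [b"", b"a", b"password", b"x" * 1_000_000]

digest_lengths = []
for message in messages:
    digest = hashlib.sha256(message).hexdigest()
    digest_lengths.append(len(digest))
    print(len(message), "byte message ->", len(digest), "hex characters")
```

Every line of output reports 64 hex characters, no matter how long the input was.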

Password Hashes in Use

In most cases, you enter a password in a login page for some kind of online site, and your password is transmitted in the clear to the server. That means you could be vulnerable to what is called a man-in-the-middle attack, which is to say that if an attacker can get in between you and the online site, they can see your password. For that reason, it is important to make sure that you have a secure connection, generally one that uses SSL to establish an encrypted connection to the server. This is a whole topic in itself, so I won’t go into it any further here. In any case, your password goes to the server, and the server employs a hashing function (hopefully, SHA2 or better), and stores the hash in its database. Since all hashes are the same length, the Database Administrators can set aside a fixed field length to store the resulting hash, which makes DBAs happy.

When you later try to log in and type in your password, the server repeats the hashing function on the password, and compares the hash it gets with what it stored in the database, and if they match you are allowed in. And given the way that hashes work, any difference at all results in a totally different hash, so there can never be a concept of “close enough”. You and I might accept something that is 95% the same, but for hashed passwords that is not acceptable at all.
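The store-and-compare cycle described above can be sketched in Python like this. The dictionary stored_hashes is a hypothetical stand-in for the site's database table, the function names are mine, and I have assumed plain SHA-256 with no salt, which is exactly the simple scheme described so far (and not yet the scheme you should actually deploy, as the salted hash section explains).

```python
import hashlib
import hmac

stored_hashes = {}  # hypothetical stand-in for the site's database table

def hash_password(password):
    """Hash the password with SHA-256 (unsalted, for illustration only)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def register(username, password):
    """Store only the hash; the password itself is never saved."""
    stored_hashes[username] = hash_password(password)

def login(username, password):
    """Re-hash the submitted password and compare it to the stored hash."""
    expected = stored_hashes.get(username)
    if expected is None:
        return False
    # compare_digest takes the same time whether or not the values match
    return hmac.compare_digest(expected, hash_password(password))

register("alice", "correct horse")
print(login("alice", "correct horse"))  # True
print(login("alice", "correct hors"))   # False: there is no "close enough"
```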

Hashes stored in this way are not susceptible to a direct attack that reverses the hash. There is no way you can take the hash and do a computation that gives you the original password. But because the hashing function is generally well-known and deterministic there is an alternative attack that often does succeed. You can compute a so-called dictionary that contains the hashes for all known dictionary words and all known popular passwords, and then do a lookup of any given hash against this table. An attacker can get a lot of passwords this way because so many people exercise poor judgement. If you use “password” as your password, or “1234”, or “letmein” (all of these are known to be frequently used) they will be found in this kind of attack. And if you try to use “leet-speak” to disguise your words (that is where you use a number in place of a letter, such as a “3” in place of an “e”, or a “1” in place of an “l”), you will get caught. Attackers know all about that and their dictionaries have all of those entries as well. So, in essence, if an attacker can get a database of a million passwords, they can run all of the hashes against a dictionary, and in short order can recover as many as 50% of the passwords. And one thing that helps them is that a lot of people use the same password, and all of their hashes will be identical. You can see this in the periodically issued lists of the most common passwords. There is a counter-measure, however, called a salted hash.
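Here is a toy version of that dictionary lookup in Python. Everything in it is invented for illustration: the five-word wordlist stands in for an attacker's list of millions of entries, the stolen table stands in for a leaked database, and I have assumed unsalted SHA-256.

```python
import hashlib

def sha256_hex(text):
    """Hash a text string with SHA-256 and return the hex digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# The attacker computes this table once and reuses it against every leak.
# Note that the leet-speak variants are in the list too.
wordlist = ["password", "1234", "letmein", "p4ssw0rd", "l3tm31n"]
lookup = {sha256_hex(word): word for word in wordlist}

# Hypothetical stolen table of username -> unsalted password hash
stolen = {
    "alice": sha256_hex("letmein"),
    "bob": sha256_hex("password"),
    "carol": sha256_hex("password"),
    "dave": sha256_hex("correct horse battery staple"),
}

# Each lookup is a single dictionary access, with no hashing at attack time
cracked = {user: lookup[h] for user, h in stolen.items() if h in lookup}
print(cracked)
print(stolen["bob"] == stolen["carol"])  # True: same password, same hash
```

Alice, Bob and Carol fall immediately, while Dave's long passphrase is not in the list and survives. Notice also that Bob and Carol are visibly sharing a password before anything has been cracked at all.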

Salted Hash

The idea here is to add a random element to each password so that the hash is harder to look up, and if any two people use the same password the hashes will be different because their “salt” is different. Of course, the random number has to be recorded so that you can login each time, and that can mean an added field, or perhaps table, in the database. Now, if an attacker gets the database they get the “salt” as well, but the computation gets far more expensive, because a dictionary computed in advance is no longer of any use. Suppose you have a password hash “x”, and a known “salt” of “y” that was used to calculate it. The only way you can recover the password is to create a new dictionary that combines every possible password in your original dictionary with the known random number and computes the resulting hash. And if you succeed in this, all you have is the one password. You would need to repeat this process for every password you have, so a million accounts means a million dictionaries, which is what makes this computationally infeasible for most attackers.

A good description of hashing with a discussion of how to do it right can be found at CodeProject.com. This is an excellent and detailed discussion which I recommend if you are interested. They also bring up another counter-measure that is worth combining with the salting, and that is a technique known as key stretching. Essentially, this means using a hashing scheme that is deliberately slow to execute, for example by repeating the hash many thousands of times. For hashing a single password at the server level when a customer comes calling, the added time is not significant, but when you are trying to compute an entire dictionary of hashes, slowing down the attacker can make for a big difference.
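Salting and key stretching together can be sketched with Python's standard library, which includes PBKDF2, a function that mixes in a salt and repeats an underlying hash many times. This is only one way to do it, not necessarily the one the CodeProject article recommends. The function names are mine, and the 16-byte salt and the ITERATIONS value are assumptions chosen for the example rather than recommendations.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; higher means slower for everyone

def hash_password(password, salt=None):
    """Return (salt, digest). A fresh random salt is generated for a new
    password; pass the stored salt back in when checking a login."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest

def verify_password(password, salt, expected_digest):
    """Recompute the stretched hash with the stored salt and compare."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)

# Two users pick the same password but end up with different hashes
salt_a, hash_a = hash_password("letmein")
salt_b, hash_b = hash_password("letmein")
print(hash_a == hash_b)                            # False
print(verify_password("letmein", salt_a, hash_a))  # True
```

The site stores the salt next to the hash for each account. A single login costs the server one slow computation, while an attacker has to pay that same cost for every dictionary word, for every account.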

So now we have seen how the site owner can make things more secure. Next we need to look at your own responsibility.

Listen to the audio version of this post on Hacker Public Radio!