Archive for April, 2011

Trivial Passwords Are Worse Than Useless: A Simple Case Study in Entropy

Thursday, April 7th, 2011

Apparently an email address I own is similar enough to an Indian surname that I get a fair amount of misdirected business correspondence. Despite protestations that they have the wrong address, one large financial institution however continues to send me account updates (including account numbers, balances and addresses). The documents are sent as password protected PDFs, which might be fine, except that they state in the text of the email that the password is the user’s date of birth in the format DDMMYYYY.

Complexity Fail

Those of you passingly familiar with the concept of entropy no doubt let out a groan there. For the rest, here’s why: using a date of birth reduces the complexity of the password into the realm of “trivially weak”. Entropy is a common measurement of information complexity; how “surprising” a piece of information is, or how “unknown” it is (…stick with me on this). Simply knowing that the password is a date reduces the unknown-ness of that password from a reasonably-secure level to an entirely unacceptable level.

For comparison, if we assume an 8-character password with the 94 standard keyboard symbols, we have an entropy of (8 log2(94) ) = 52.44 bits (or equivalently, just over 6 quadrillion possibilities), which is reasonable for most purposes.

On the other hand, a date isn’t just an 8 character password. It’s not even an 8 character numeric password (with obviously 99,999,999 options, or 26.8 bits of entropy), which would be weak but not laughable. In fact, it’s really a 3 character password: a month, a day, and a year. Those are respectively ~30.44 possibilities  (days per month), 12 possibilities, and 60 possibilities (assuming our account holder was born between 1940 and 2000). In bits, that’s approximate 4.93 + 3.58 + 5.91 = 14.42 bits. An analogous password described in characters we are familiar with would be a three character password made up of: a single number, followed by a single lower-case letter, followed by a single alphanumeric. So, your password options are no different (entropy-wise) than “1aA” or “8q3”, and you didn’t even get to pick your wussy three characters.

Solving 14 bits of Entropy

Let’s put this to work. First, a list of every date between Jan 1, 1940 and Jan 1, 2000. Python is my sketchpad of choice:

from datetime import datetime, timedelta
max_date = datetime(1999, 01, 01)
date = datetime(1940, 01, 01)
day = timedelta(1)
f = open("datelist.txt", "w")
while(date < max_date):
    date = date + day

Now datelist has a properly formatted date for each day in our range. How many possibilities is that?

$ head -n 2 datelist.txt
$ wc -l datelist.txt
21550 datelist.txt

That’s in line with our estimate above. Cool, let’s use that list to break a PDF created with this password scheme. Pdfcrack is a simple open-source password bruteforcing tool that helpfully takes a wordlist.

$ pdfcrack -f SensitiveDoc.pdf -w datelist.txt
PDF version 1.4
Security Handler: Standard
V: 2
R: 3
P: -1028
Length: 128
Encrypted Metadata: True
FileID: 9f86e55a12672dcd9b9a9cd3423303da
U: b89fd170770d5b802423d0ec2ae7ec6d00000000000000000000000000000000
O: 301981f88c00ebdafde32360d24b7ae0f6b8a3e1865ac314cbaec4f7cc7a3f49
found user-password: '13051959'

How long did that take?

$ /usr/bin/time -p pdfcrack -f SensitiveDoc.pdf -w datelist.txt cmd 2>&1  | grep user
found user-password: '13051959'
user 0.20

One fifth of a second. Super secure!

General Advice

So, to wrap up. Less complex passwords are reasonable in a security context where a system can monitor password guessing: web based systems, network logins, etc. Then you can respond with enforced guessing intervals, CAPTCHAs or secondary validation. However, when the attacker can take the data for offline cracking, the required strength of passwords goes way up. Using and trusting weak passwords in this instance caused this company to broadcast sensitive information that it wouldn’t intentionally expose.

The company would be much better off providing users a random 10 character code that they can write down and use to decrypt the account statements (yes, seriously, write down your passwords), or simply asking users to log in for the statement information.