Trivial Passwords Are Worse Than Useless: A Simple Case Study in Entropy

Thursday, April 7th, 2011

Apparently an email address I own is similar enough to an Indian surname that I get a fair amount of misdirected business correspondence. Despite my protestations that they have the wrong address, one large financial institution continues to send me account updates (including account numbers, balances and addresses). The documents are sent as password-protected PDFs, which might be fine, except that the text of the email states that the password is the user’s date of birth in the format DDMMYYYY.

Complexity Fail

Those of you passingly familiar with the concept of entropy no doubt let out a groan there. For the rest, here’s why: using a date of birth reduces the complexity of the password into the realm of “trivially weak”. Entropy is a common measurement of information complexity; how “surprising” a piece of information is, or how “unknown” it is (…stick with me on this). Simply knowing that the password is a date reduces the unknown-ness of that password from a reasonably-secure level to an entirely unacceptable level.

For comparison, if we assume an 8-character password with the 94 standard keyboard symbols, we have an entropy of (8 log2(94) ) = 52.44 bits (or equivalently, just over 6 quadrillion possibilities), which is reasonable for most purposes.
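As a rough sketch of that arithmetic (the helper name `entropy_bits` is my own, not a standard function), the entropy of a password whose symbols are chosen uniformly at random is its length times log2 of the alphabet size:

```python
import math

def entropy_bits(alphabet_size, length):
    """Entropy in bits of `length` symbols drawn uniformly from `alphabet_size` options."""
    return length * math.log2(alphabet_size)

bits = entropy_bits(94, 8)
print(round(bits, 2))   # 52.44
print(94 ** 8)          # 6095689385410816 possible passwords
```

This only holds when every symbol really is picked at random; a human-chosen password has less entropy than the formula suggests.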

On the other hand, a date isn’t just an 8 character password. It’s not even an 8 character numeric password (with 100,000,000 options, or 26.6 bits of entropy), which would be weak but not laughable. In fact, it’s really a 3 character password: a day, a month, and a year. Those are respectively ~30.44 possibilities (average days per month), 12 possibilities, and 60 possibilities (assuming our account holder was born between 1940 and 2000). In bits, that’s approximately 4.93 + 3.58 + 5.91 = 14.42 bits. An analogous password described in characters we are familiar with would be a three character password made up of: a single digit, followed by a single lower-case letter, followed by a single alphanumeric. So, your password options are no different (entropy-wise) than “1aA” or “8q3”, and you didn’t even get to pick your wussy three characters.
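The estimate can be cross-checked by counting real calendar days instead of multiplying averages. This is only a sketch: `count_dates` is a name I made up, and the 1940 to 1999 birth-year range is the same assumption used in the text.

```python
import math
from datetime import date

def count_dates(first_year, last_year):
    """Number of calendar days from Jan 1 of first_year through Dec 31 of last_year."""
    return (date(last_year + 1, 1, 1) - date(first_year, 1, 1)).days

candidates = count_dates(1940, 1999)
print(candidates)                        # 21915
print(round(math.log2(candidates), 2))   # 14.42
```

Counting exactly gives the same 14.42 bits, so the per-field approximation loses nothing that matters here.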

Solving 14 bits of Entropy

Let’s put this to work. First, a list of every date between Jan 1, 1940 and Jan 1, 1999. Python is my sketchpad of choice:

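from datetime import datetime, timedelta
max_date = datetime(1999, 1, 1)  # plain 1, not 01: leading zeros are a syntax error in Python 3
date = datetime(1940, 1, 1)
day = timedelta(days=1)
# Write one DDMMYYYY candidate per line, up to (but not including) max_date
with open("datelist.txt", "w") as f:
    while date < max_date:
        f.write(date.strftime("%d%m%Y") + "\n")
        date = date + day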

Now datelist has a properly formatted date for each day in our range. How many possibilities is that?

$ head -n 2 datelist.txt
$ wc -l datelist.txt
21550 datelist.txt

That’s in line with our estimate above. Cool, let’s use that list to break a PDF created with this password scheme. Pdfcrack is a simple open-source password bruteforcing tool that helpfully takes a wordlist.

$ pdfcrack -f SensitiveDoc.pdf -w datelist.txt
PDF version 1.4
Security Handler: Standard
V: 2
R: 3
P: -1028
Length: 128
Encrypted Metadata: True
FileID: 9f86e55a12672dcd9b9a9cd3423303da
U: b89fd170770d5b802423d0ec2ae7ec6d00000000000000000000000000000000
O: 301981f88c00ebdafde32360d24b7ae0f6b8a3e1865ac314cbaec4f7cc7a3f49
found user-password: '13051959'

How long did that take?

$ /usr/bin/time -p pdfcrack -f SensitiveDoc.pdf -w datelist.txt 2>&1 | grep user
found user-password: '13051959'
user 0.20

One fifth of a second. Super secure!

General Advice

So, to wrap up. Less complex passwords are reasonable in a security context where a system can monitor password guessing: web based systems, network logins, etc. Then you can respond with enforced guessing intervals, CAPTCHAs or secondary validation. However, when the attacker can take the data for offline cracking, the required strength of passwords goes way up. Using and trusting weak passwords in this instance caused this company to broadcast sensitive information that it wouldn’t intentionally expose.
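To put rough numbers on that difference, here is a back-of-the-envelope sketch. Both guessing intervals are hypothetical values I picked for illustration, not measurements; only the 21550 candidate count comes from the wordlist above.

```python
def worst_case_seconds(candidates, seconds_per_guess):
    """Time needed to try every candidate at a fixed interval between guesses."""
    return candidates * seconds_per_guess

CANDIDATES = 21550            # size of the date wordlist above
ONLINE_INTERVAL = 30          # hypothetical: server allows one guess every 30 seconds
OFFLINE_INTERVAL = 1e-5       # hypothetical: 100,000 offline guesses per second

online_days = worst_case_seconds(CANDIDATES, ONLINE_INTERVAL) / 86400
offline_secs = worst_case_seconds(CANDIDATES, OFFLINE_INTERVAL)
print(f"online:  {online_days:.1f} days")
print(f"offline: {offline_secs:.2f} seconds")
```

Under these assumptions the same weak password costs an online attacker about a week of noisy, detectable guessing, while the offline attacker is done almost instantly. An account lockout would widen the gap further.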

The company would be much better off providing users a random 10 character code that they can write down and use to decrypt the account statements (yes, seriously, write down your passwords), or simply asking users to log in for the statement information.
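Generating such a code takes only a few lines. This is a minimal sketch using Python’s standard `secrets` module; the function name `random_code` and the choice of the full 94-symbol alphabet are my assumptions, not anything the company actually does.

```python
import math
import secrets
import string

# 52 letters + 10 digits + 32 punctuation marks = 94 printable symbols
ALPHABET = string.ascii_letters + string.digits + string.punctuation

def random_code(length=10):
    """Return `length` symbols drawn uniformly from ALPHABET using the OS CSPRNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

print(random_code())
print(round(10 * math.log2(len(ALPHABET)), 1))   # 65.5 bits
```

If punctuation is too awkward for users to copy by hand, dropping to letters and digits still leaves roughly 59.5 bits for 10 characters, far beyond the 14 bits of a birth date.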

Fuzzing Comes in from the Cold

Thursday, May 20th, 2010

So, after a couple months of living in webapp security land and having my developer hat on, I finally took a few days to do some good old fashioned vulnerability hunting. These days, that means fuzzing.

I’m going to go ahead and say that fuzzing is ready to come out of the cold, from being primarily thought of as something security researchers and blackhats do, to eventually being something as expected as unit tests (…though I’m probably about 2 years late in saying that). With fuzzing a part of the SDL (gj MS) and Charlie Miller publicly calling on companies to get with the program, it’s well on its way.

Now, unit testing (and code coverage) took a while to be considered expected practice (and depending on where you work, might still raise eyebrows), but by and large they’re generally considered something that helps improve quality and reduce risk in a project. I have hopes that fuzzing will get there too.

The tooling is there, except that I think coming from the security community has hindered it a bit. There’s no standout leader like xUnit (cpp, n, j, etc), and instead we have dozens of tools grown from individual developers, which range from utterly broken to pretty good, and most serious fuzzing undertakings end up having to piece together a solution out of a number of other partial solutions. If you’re Charlie Miller, you have it figured out and built into a fusion powered spaceship, but the rest of us are still getting there (seriously, if you read one thing on fuzzing, check out that presentation from CanSecWest this year… we all can aspire).

Trying to piece together such a solution myself, I started with FileFuzz and the excellent text Fuzzing: Brute Force Vulnerability Detection by Sutton, Greene and Amini.

Get Started Quickly With FileFuzz

If you’re looking to start file-based fuzzing as quickly as possible, FileFuzz is a good bet. It’s a mutational fuzzer so all you need to get started is a single sample file, and it’s “batteries included” (unlike many solutions) in that it incorporates the three big moving pieces of fuzzing: sample creation, test running, and error detection. Crash triage automation is a task that it doesn’t try to address, but if you’re just trying to get started, it’s going to help immensely.
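For anyone new to the idea, this is roughly what “mutational” means: take a known-good sample and corrupt a few bytes to produce each test case. The sketch below is a generic illustration with names of my own choosing; it is not how FileFuzz itself selects or applies mutations.

```python
import random

def mutate(sample, n_mutations=1, seed=None):
    """Return a copy of `sample` with up to `n_mutations` randomly chosen bytes overwritten."""
    data = bytearray(sample)
    if not data:
        return bytes(data)          # nothing to mutate in an empty sample
    rng = random.Random(seed)       # seeded so a crashing case can be regenerated
    for _ in range(n_mutations):
        pos = rng.randrange(len(data))
        data[pos] = rng.randrange(256)
    return bytes(data)

sample = b"%PDF-1.4 example sample bytes"
for i in range(3):
    print(mutate(sample, n_mutations=2, seed=i))
```

Recording the seed alongside each generated file is one simple way to make a crash reproducible later.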

While using it though, I found some bugs. Fuzzing a series of binary files, this pops up:

The output char buffer is too small to contain the decoded characters, encoding 'Unicode (UTF-8)' fallback 'System.Text.DecoderReplacementFallback'. Parameter name: chars.

A bit of googling suggests that PeekChar can’t reliably be used on binary data. I made the following change in the readBinary() function of Read.cs (line numbers are approximate because I made some other changes):

            //while (brSourceFile.PeekChar() != -1)
            while (brSourceFile.BaseStream.Position < brSourceFile.BaseStream.Length)

If you’re running FileFuzz on a modern .NET runtime (or through Visual Studio) you may see problems such as:

InvalidOperationException: Cross-thread operation not valid: Control 'Foo' accessed from a thread other than the thread it was created on

It looks like the FileFuzz UI was written before these cross-thread checks were enforced in .NET, so if you don’t want to spend a lot of time writing threadsafe delegates, you can add one line to revert to the old (unchecked) behavior. Add this at the beginning of InitializeComponent() in Main.cs (again, line numbers are approximate):

            Control.CheckForIllegalCrossThreadCalls = false;

At one point I thought I found that FileFuzz was only generating different files for the first 10 bytes or so, and identical files after that. It may have been some config error on my part and I couldn’t duplicate it later, but you may want to give your files a quick run through md5sum, just to make sure you don’t waste a lot of CPU cycles. (Has anyone else seen this?)
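If md5sum isn’t handy (say, on a Windows fuzzing box), the same check is easy to sketch in Python. The function name `find_duplicates` is hypothetical, and the demo runs against a throwaway directory rather than real FileFuzz output.

```python
import hashlib
import os
import tempfile
from collections import defaultdict

def find_duplicates(directory):
    """Group regular files in `directory` by MD5 digest; return groups of identical files."""
    by_digest = defaultdict(list)
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            with open(path, "rb") as f:
                by_digest[hashlib.md5(f.read()).hexdigest()].append(name)
    return [names for names in by_digest.values() if len(names) > 1]

# Demo: two identical files and one distinct file
with tempfile.TemporaryDirectory() as d:
    for name, payload in [("0", b"AAAA"), ("1", b"AAAA"), ("2", b"AAAB")]:
        with open(os.path.join(d, name), "wb") as f:
            f.write(payload)
    print(find_duplicates(d))   # [['0', '1']]
```

Pointing `find_duplicates` at the fuzzer’s output directory should return an empty list if every generated file is distinct.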

Structured Exception Handling

While running FileFuzz against a particular target, I found a number of hits that didn’t reproduce nicely when run alone. When the target binary was executed via crash.exe (included w/ FileFuzz), it would show an access violation:

[*] "crash.exe" "C:\Program Files\xxx" 1000 C:\fuzzing\xxx\output\136
[*] Access Violation
[*] Exception caught at 1001c06d mov eax,[esi+0x8]
[*] EAX:0011f050 EBX:00000030 ECX:00000000 EDX:00000092
[*] ESI:00000000 EDI:0011f54c ESP:0011f0ec EBP:0011f0f4

When run with the same file from the command line, nothing; just an error message and a clean exit. This was initially puzzling, but it turns out to be a result of Windows Structured Exception Handling: the target catches the access violation in its own exception handler and exits cleanly, so the fault is only visible when a debugger is attached. (Here’s an old but worthwhile read on what really goes on under the hood in SEH) So, hook it up under OllyDbg or IDA and boink, there it is.

When I get a chance I need to get set up with !exploitable (presentation here), but I’ll have to share that in a later post.

Porting Narcissus to Rhino

Friday, February 26th, 2010

I just spent a couple days trying to build SpiderMonkey on Windows with Narcissus, JS_THREADSAFE and JS_HAS_FILE_OBJECT support in order to parse and do some static analysis of malicious JS.

After patching the Makefiles (and cpp in one place) and trying way too hard to make it work, I’ve got to warn others off of going down this route. I repeat: DON’T try to make it work. First, you’ll find that there’s no easily available parsing API in SpiderMonkey itself and that building in extensions is incredibly tedious and error prone (despite the great autoconfig work done in 1.8.1 for the vanilla build, which works perfectly); second, the non-standard File object is no longer maintained and as of Feb 10, 2010 is completely removed.

So with that option DOA, if you had your heart set on using Narcissus for parsing, it can be trivially ported to Rhino. Despite Brendan’s comments in the Narc code and other locations on the web that it relies on several SpiderMonkey-specific extensions, it runs perfectly fine (at least the parser does) with a few minor changes. Here is the code, a diff, and sample tree walker.