Archive for the ‘Coding’ Category

Sort()ing out a Nerdsnipe

Saturday, October 1st, 2016

First, a disclaimer: this is really stupid stuff, and basically the opposite of work worth doing. But still, I got nerdsniped and the resulting rabbit hole went somewhere interesting, so I figured I’d write it up.

Update 10/2: Someone else got bitten by the same bug. @swisshttp took a deeper dive into the implementation details of different scripting engines: https://swisshttp.blogspot.ch/2016/10/latacora-riddle.html


The new security startup Latacora uses a small piece of JavaScript to randomize the order of the founders' names in the first paragraph. It seems like a reasonable & egalitarian thing to do, but after idly hitting F5 a few times (like you do…) I noticed that it's not really random.

[Screenshot: the Latacora homepage, with the founders' names in the first paragraph]

So, I went looking for the code that does the randomizing. It’s at the bottom and looks like:

[Screenshot: the randomizing JavaScript from the page source]

Now, these are the same folks who put together the Matasano Crypto challenges and Starfighters.io, so a deliberately rigged "random" sort would not at all be out of the realm of possibility. And, whether intentional or not (hey, I said this whole rabbit hole was dumb), if we test it (Gist), we find that the results are anything but random.

[Screenshot: the test code]

[Screenshot: the test results]

So, what gives? Is it in the call(s) to Math.random as the comment teases, or elsewhere? The code looks really simple. Now, we hold as an article of faith that you cannot trust JavaScript's Math.random() (thanks @ropnop for the pointer), but there doesn't seem to be any funny business with the seed or anything else that might suggest actually messing with the generator. Plus, we're not looking for a small bias; we're looking for something really significant.

Could the issue be in the anonymous function passed to sort? Nope, not really. It correctly fulfills the contract expected by a custom sort function (return a negative number, zero, or a positive number according to the intended ordering of the compared values), and the body of the function does what it says (implies) on the tin: it returns each of those values close-enough-to-randomly, regardless of input.
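As an aside, here's what a comparator that actually honors that contract looks like, translated into Python terms (functools.cmp_to_key adapts the same negative/zero/positive convention; the list is just a placeholder):

from functools import cmp_to_key

def by_length(a, b):
    # A sane comparator: consistent and transitive.
    # Negative means a sorts first, positive means b does, 0 means tie.
    return len(a) - len(b)

names = ["Jeremy", "Erin", "Thomas"]
print(sorted(names, key=cmp_to_key(by_length)))
# -> ['Erin', 'Jeremy', 'Thomas']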

Could it be something weird in the use of "ceil" to integerize the random number? That could create a small bias, since Math.random returns values in [0, 1), but again we're looking for something that causes a huge bias.
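For what it's worth, we can sanity-check the ceil arithmetic numerically. Assuming the integerizing is the usual ceil(random() * 3) - 2 shape (my reconstruction; it matches the -1/0/1 outputs described above), the three values come out essentially uniform:

import math
import random
from collections import Counter

# Assumed comparator arithmetic: ceil(random()*3) - 2.
# random.random() is in [0, 1), so the result is -1, 0, or 1;
# a -2 is possible only when random() returns exactly 0.0.
tally = Counter(math.ceil(random.random() * 3) - 2 for _ in range(1_000_000))
print(tally)  # roughly equal counts for -1, 0, and 1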

Spoilers below – if you want to reason through this yourself, go ahead and do it, then come back.

No, the actual cause is both more subtle and more interesting: it's in the interaction between the implementation of Array.sort and the anonymous compare function. Several things are going on here:

  • the sort implementation (either quicksort or insertion sort in Chrome – I'm assuming insertion sort here, given how small the array is and the behavior observed below) assumes that the custom sort function passed into it does a sane comparison
  • the sort implementation moves left to right through the array
  • each comparison between two elements has 3 possible values (-1, 0, 1)
  • the sort operation is unstable (edit: this operation is stable. I misinterpreted the docs and how stability interacts with the random comparator. Thanks @fzzfzzfzzz)

So, there is no "correct" ordering of the array: the sorting algorithm doesn't check that the comparisons are transitive or consistent (nor should it). The result is not so much a true "sort" as a series of probabilistic changes of position.
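To see the mechanism without a browser, here's a rough re-creation in Python: an insertion sort in the shape V8 uses for small arrays (my paraphrase of the engine, not its actual code), fed a comparator that ignores its inputs and answers -1, 0, or 1 at random. The starting order of the names is a guess:

import random
from collections import Counter

def random_compare(a, b):
    # Stand-in for the page's comparator: ignores its inputs and
    # returns -1, 0, or 1 uniformly at random.
    return random.choice((-1, 0, 1))

def insertion_sort(items, cmp):
    # Sketch of a small-array insertion sort: each element shifts
    # left past its neighbors for as long as the comparator keeps
    # saying "greater".
    a = list(items)
    for i in range(1, len(a)):
        element = a[i]
        j = i - 1
        while j >= 0 and cmp(a[j], element) > 0:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = element
    return a

names = ["Erin", "Thomas", "Jeremy"]
runs = 100_000
first = Counter(insertion_sort(names, random_compare)[0] for _ in range(runs))
for name, count in first.most_common():
    print(name, count / runs)
# The first name in the source order keeps the top spot well over
# half the time -- nothing like the uniform 1/3 a fair shuffle gives.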

I refactored the code a little bit into a poor man's instrumented version of the sort function (Gist). Here are three runs of it. Notice that the only way Erin *doesn't* get bumped is if she "loses" the first compare. A "win" (compare result of -1) does not result in a swap because of the intent of the sort function, and a "tie" (compare result of 0) is never rechecked because the sort implementation does not guarantee a stable result.

[Screenshot: three runs of the instrumented sort]

In effect: someone starting in a given position has a (1/3)^n chance of being bumped n positions down the array.

So, that explains why Erin gets to be first more often than expected, and why Thomas gets to be second. Here’s the breakdown of that math:

[Screenshot: the test results again, for reference]

(1/3)^1 = .33333 ≈ fraction of the time Erin is bumped to position 1
(1/3)^2 = .11111 ≈ fraction of the time Erin is bumped to position 2

As expected, the pattern holds up if we add a “Dummy” to the end of the list:

[Screenshot: the test results with a fourth "Dummy" entry]

(1/3)^3 = .037 ≈ fraction of the time Erin is bumped to position 3
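(Re-running the Python sketch above with a fourth name appended reproduces the same ~.037 tail for being bumped three positions.)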

 


So… there you have it. It’s a (probably unintended) nerdsnipe that nonetheless took us into some interesting territory. Hope you had fun. Did I miss something or explain it wrong? Lemme know in the comments or on twitter (@coffeetocode).

BlackHat USA Multipath TCP Tool Release & Audience Challenge

Wednesday, August 6th, 2014

(Crossposting & backdating some content from the Neohapsis blog, which will soon be defunct)

We hope everyone found something interesting in our talk today on Multipath TCP. We’ve posted the tools and documents mentioned in the talk at:

https://github.com/Neohapsis/mptcp-abuse

Update (Aug 12, 2014): We’ve now also added the slides from the talk.


At the end we invited participants to explore MPTCP in a little more depth via a PCAP challenge.

Without further ado, here’s the PCAP: neohapsis_mptcp_challenge.pcapng

It’s a simple scenario: one MPTCP-capable machine sending data to another. The challenge is “simply” to reassemble and recover the original data. The data itself is not complex so you should be able to tell if you’re on the right track, but getting it exactly right will require some understanding of how MPTCP works.

If you think you have it, tweet us and follow us (@secvalve and @coffeetocode) and we’ll PM you to check your solution. You can also ask for questions/clarifications on twitter; use #BHMPTCP so others can follow along. Winner snags a $100 Amazon gift card!

Hints #0:

  • The latest version of Wireshark supports decoding mptcp options (see “tcp.options.mptcp”).
  • The scapy version in the git repo is based on Nicolas Maitre’s and supports decoding mptcp options. It will help although you don’t strictly need it.
  • There is an MPTCP option field that tells the receiver how a TCP packet fits into the overall logical MPTCP data flow (what it is and how it works is an exercise for the reader 🙂 )
  • It's possible to get close with techniques that don't fully understand MPTCP (you'll know you're close). However, the full solution should match exactly (we'll use md5sum)

Depending on how people do and questions we get, we’ll update here with a few more hints tonight or tomorrow. Once we’ve got a winner, we’ll post the solution and code examples.

Update: Winners and Solution

We have some winners! Late last night @cozinuzo contacted us with a correct answer, and early this morning @darkfiberiru got it too.

The challenge was created using our fragmenter PoC tool, pushing to a netcat-opened socket on an MPTCP-aware destination host:

python mptcp_fragmenter.py -n 9 --file=MPTCP.jpg --first_src_port 46548 -p 3000 192.168.1.33

The key to this exercise was to look at the mechanism that MPTCP uses to tell how a particular packet fits into the overall data flow. You can see that field in Wireshark as tcp.options.mptcp.dataseqno, or in mptcp-capable scapy as packet[TCPOption_MP].mptcp.dsn.

[Screenshot: Wireshark with a column added for the MPTCP data sequence number]

 

 

The mptcp-capable scapy in our mptcp-abuse git repo can easily do the reassembly across all the streams using this field.

Here’s the code (or as a Gist):

# Uses Nicolas Maitre's MPTCP-capable scapy impl, so that should be
# on the python path, or run this from a directory containing that "scapy" dir
from scapy.all import *

packets = rdpcap("pcaps/neohapsis_mptcp_challenge.pcap")

# Keep only the packets carrying challenge payload: TCP packets between
# the two hosts that have an MPTCP option of subtype 2 (DSS, which
# carries the data sequence number) and actual data bytes.
payload_packets = [p for p in packets if TCP in p
                   and p[IP].src in ("192.168.1.26", "192.168.1.33")
                   and TCPOption_MP in p
                   and p[TCPOption_MP].mptcp.subtype == 2
                   and Raw in p]

# Order the payloads by the MPTCP data sequence number (dsn) rather than
# by subflow, then concatenate. "wb" matters: the output is a binary JPEG.
with open("out.jpg", "wb") as f:
    for p in sorted(payload_packets, key=lambda p: p[TCPOption_MP].mptcp.dsn):
        f.write(p.load)

These reassemble to create this image:

 

[Image: MPTCP.jpg, the reassembled file]

 

 

The md5sum for the image is 4aacab314ee1a7dc5d73a030067ae0f0, so you’ll know you’ve correctly put the stream back together if your file matches that.

Thanks to everyone who took a crack at it, discussed, and asked questions!

What Kickstarter Did Right

Monday, February 17th, 2014

Only a few details have emerged about the recent breach at Kickstarter, but it appears that this one will be a case study in doing things right both before and after the breach.

What Kickstarter has done right:

  • Timely notification
  • Clear messaging
  • Limited sensitive data retention
  • Proper password handling

Timely notification

The hours and days after a breach is discovered are incredibly hectic, and there will be powerful voices both attempting to delay public announcement and attempting to rush it. When users' information may be at risk beyond the immediate breach, organizations should strive to make an announcement as soon as it will do more good than harm. An initial public announcement doesn't have to have all the answers; it just needs to give users an idea of how they are affected and what they can do about it. While it may be tempting to wait for full details, an organization that shows transparency in the early stages of a developing story is going to have more credibility as it goes on.

Clear messaging

Kickstarter explained in clear terms what was and was not affected, and gave straightforward actions for users to follow as a result. The logging and access control groundwork for making these strong, clear statements at the time of a breach needs to be laid far in advance and thoroughly tested. Live penetration testing exercises with detailed post mortems can help companies decide if their systems will be able to capture this critical data.

Limited sensitive data retention

One of the first questions in any breach is "what did they get?", and data handling policies in place before a breach are going to have a huge impact on the answer. Thinking far in advance about how we would like to be able to answer that question can be a driver for getting those policies in place. Kickstarter reported that they do not store full credit card numbers, a choice that is certainly saving them some headaches right now. Not all businesses have quite that luxury, but thinking in general about how to reduce the retention of sensitive data that's not actively used can reduce both the cost of protecting it and the chances of exposure over the long term.

Proper password handling (mostly)

Kickstarter appears to have done a pretty good job in handling user passwords, though not perfect. Password reuse across different websites continues to be one of the most significant threats to users, and a breach like this can often lead to ripple effects against users if attackers are able to obtain account passwords.

In order to protect against this, user passwords should always be stored in a hashed form, a representation that allows a server to verify that a correct password has been provided without ever actually storing the plaintext password. Kickstarter reported that their “passwords were uniquely salted and digested with SHA-1 multiple times. More recent passwords are hashed with bcrypt.” When reading breach reports, the level of detail shared by the organization is often telling and these details show that Kickstarter did their homework beforehand.

A strong password hashing scheme must protect against the two main approaches that attackers can use: hash cracking, and rainbow tables. The details of these approaches have been well-covered elsewhere, so we can focus on what Kickstarter used to make their users’ hashes more resistant to these attacks.

To resist hash cracking, defenders want to massively increase the amount of work an attacker has to do to check each possible password. The problem with hash algorithms like SHA1 and MD5 is that they are too efficient; they were designed to be completed in as few CPU cycles as possible. We want the opposite from a password hash function, so that it is reasonable to check a few possible passwords in normal use but computationally ridiculous to try out large numbers of possible passwords during cracking. Kickstarter indicated that they used "multiple" iterations of the SHA1 hash, which multiplies the attacker effort required for each guess (so 5 iterations of hashing means 5 times more effort). Ideally, we'd like to see a hashing attempt take at least 100 ms, which is a trivial delay during a legitimate login but makes large-scale hash cracking essentially infeasible. Unfortunately, SHA1 is so efficient that it would take more than 100,000 iterations to raise the effort to that level. While Kickstarter probably didn't get to that level (it's safe to assume they would have said so if they did), their use of multiple iterations of SHA1 is an improvement over many practices we see.

To resist rainbow tables, it is important to use a long, random, unique salt for each password. Salting passwords removes the ability of attackers to simply look up hashes in a precomputed rainbow table. Using a random, unique salt on each password also means that an attacker has to perform cracking on each password individually; even if two users have an identical password, it would be impossible to tell from the hashes. There's no word yet on the length of the salt, but Kickstarter appears to have gotten the random and unique parts right.
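To make both of those defenses concrete, here's the general pattern in Python: a per-user random salt plus iterated SHA-1. This is my illustration of the described approach, not Kickstarter's actual code, and the iteration count is arbitrary:

import hashlib
import hmac
import os

ITERATIONS = 5000  # illustrative only; tune until hashing takes ~100 ms

def hash_password(password):
    # A long, random, unique salt per password defeats rainbow tables
    # and forces attackers to crack each hash individually.
    salt = os.urandom(16)
    digest = salt + password.encode("utf-8")
    # Iterating multiplies the work per guess: 5000 iterations means
    # 5000x the attacker effort of a single SHA-1.
    for _ in range(ITERATIONS):
        digest = hashlib.sha1(digest).digest()
    return salt, digest

def check_password(password, salt, expected):
    digest = salt + password.encode("utf-8")
    for _ in range(ITERATIONS):
        digest = hashlib.sha1(digest).digest()
    return hmac.compare_digest(digest, expected)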

Finally, Kickstarter's move to bcrypt for more recent passwords is particularly encouraging. Bcrypt is a modern key derivation function specifically designed for storing password representations. It builds in the idea of strong unique salts and a scalable work factor, so that defenders can easily dial up the amount of computation required to try out a hash as computers get faster. Bcrypt and similar functions such as PBKDF2 and the newer scrypt (which adds memory requirements) are purpose-built to make it easy to get password handling right; they should be the go-to approach for all new development, and a high-priority change for any codebase still using MD5 or SHA1.
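In Python, for instance, the bcrypt package boils this down to a couple of calls (the work factor below is illustrative; pick one that makes a hash take on the order of 100 ms on your hardware):

import bcrypt

password = b"correct horse battery staple"

# gensalt() produces a random salt and embeds the work factor in the
# output; each +1 on rounds doubles the computation required per hash.
hashed = bcrypt.hashpw(password, bcrypt.gensalt(rounds=12))

# checkpw() re-derives the hash using the salt and work factor stored
# in `hashed` and compares the results.
assert bcrypt.checkpw(password, hashed)
assert not bcrypt.checkpw(b"not the password", hashed)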