Setting Up A Malware Analysis Sandnet

A malware analysis sandnet is an environment created to allow relatively safe analysis of malicious software samples, which are generally obtained either via a honeynet or perhaps by reading all the spam one gets and following all the links 😉
I’ve been running one for over a year, and figured it was about time to document how I have it set up.


Currently there are 2 hosts in the sandnet:

    • viktim: A Windows XP SP2 host which is deliberately infected for the purposes of analysis


    • snservices: A Debian Linux host which serves wildcard DNS, as well as running Apache, and providing various other services used to analyze the malware.



viktim is running XP SP2 (no patches other than the SP) and has two hard drives in it.
After installing Windows, I disabled the second hard drive so that the OS doesn’t see it. Then I booted a live Linux distro and used dd to copy the OS drive to the backup one.
The advantage of this is that I can install whatever malware is desired and then simply dd from the backup drive over the infected one to get things back to a clean state.
Note: The dd takes about 15 minutes (they’re 9 GB drives).
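
The restore step is nothing more than a raw, block-for-block copy of one drive over the other. For anyone who wants to see the idea spelled out, here is a minimal Python sketch of it. The function name and block size are my own choices for illustration, and it is not a replacement for dd, which is what I actually use.

```python
#!/usr/bin/env python
def raw_copy(src_path, dst_path, block_size=1024 * 1024):
    """Copy src_path over dst_path block by block, roughly like dd."""
    copied = 0
    # 'r+b' refuses to run if the destination does not already exist,
    # so a mistyped device path cannot silently create a regular file.
    with open(src_path, 'rb') as src, open(dst_path, 'r+b') as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break  # end of the source drive or image
            dst.write(block)
            copied += len(block)
    return copied
```

From a live distro this would be called with the backup drive as the source and the infected drive as the destination (for example raw_copy('/dev/sdb', '/dev/sda'), where both device names are hypothetical and depend on how the live system enumerates the drives).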


snservices is running Debian Linux for an OS, and has been configured with a number of tools, including:

    • BIND


    • Apache


    • Nmap


    • Paketto


    • ettercap


    • SpiderMonkey



The regular network is protected from the sandnet via a firewall device. I’ve configured the firewall to log all traffic (logfiles are sent to a remote host via syslog for analysis).

BIND Configuration

BIND has been configured so that it is authoritative (the SOA) for every domain it is asked about, and it answers every request with the IP address of the snservices host.
viktim has been configured to use snservices as its only DNS server. This allows any DNS calls being made by the viktim host to be observed, and any software that tries to communicate with the internet ends up talking to the snservices box instead.
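
A quick way to sanity-check this arrangement is to resolve a handful of unrelated names and confirm that every one of them comes back with the snservices address. The sketch below assumes Python is available on the machine doing the check. The function name and the `resolve` parameter are made up for illustration; the parameter is there only so the lookup can be swapped out.

```python
#!/usr/bin/env python
import socket

def check_wildcard(names, expected_ip, resolve=socket.gethostbyname):
    """Return {name: answer} for every name that did NOT resolve to expected_ip."""
    wrong = {}
    for name in names:
        try:
            answer = resolve(name)
        except socket.error as err:
            # socket.gaierror (a failed lookup) is a subclass of socket.error
            answer = 'lookup failed: %s' % err
        if answer != expected_ip:
            wrong[name] = answer
    return wrong
```

With the wildcard zone working, a call such as check_wildcard(['example.com', 'update.example.net'], '10.0.0.2') should return an empty dict. The hostnames and the 10.0.0.2 address are placeholders, not the real sandnet values.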


The default zones for localhost and such have been snipped from the text below.

   include "/etc/bind/named.conf.options";
   key "dnskey" {
      algorithm hmac-md5;
      secret "hash";
   };
   controls {
      inet * allow {; } keys { "dnskey"; };
   };
   zone "." IN {
      type master;
      file "/etc/bind/db.wildcard";
   };




   $TTL   60M
   @   IN   SOA   localhost.  root.localhost (
                           2008022002   ; serial
                                604800  ; refresh
                                 86400  ; retry
                               2419200  ; expire
                                604800) ; negative cache ttl
                  IN          NS         localhost.
   *              IN           A




   options {
      directory "/var/cache/bind";
      allow-transfer { none; };
   //    logging {
   //       channel query_log {
   //          severity info;
   //          print-time yes;
   //          file "query.log" versions 5 size 50M;
   //       };
   //       category queries {
   //          query_log;
   //       }
   //   }
      listen-on-v6 { any; };
   };




JavaScript De-Obfuscation

There are a couple of different methods for de-obfuscating JavaScript downloader code. Info on this can be found at this SANS diary entry.
The SpiderMonkey method described there is fairly simple and generally works well. For this reason, SpiderMonkey has been installed and configured on the snservices host.

Monitoring and Logging


On snservices

For analyzing network communication from the viktim host, iptables logging can be used. Apache logs can be used to view HTTP requests. Additionally, a netcat listener can be established on whatever port the malware on viktim is attempting to connect to so the conversation can be monitored and/or logged.
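
Where netcat isn’t handy, the listener idea is easy to approximate. The following is a rough Python sketch of a one-shot listener that accepts a single connection and appends whatever the client sends to a log file. The function name, its parameters, and the log format are all my own invention for illustration; it only records the client’s side and never talks back.

```python
#!/usr/bin/env python
import socket

def listen_once(port, logfile, host='0.0.0.0', max_bytes=65536, timeout=30.0):
    """Accept one TCP connection on port and append what the client sends to logfile."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    try:
        conn, peer = server.accept()  # blocks until something connects
        conn.settimeout(timeout)      # give up on a client that goes quiet
        received = 0
        with open(logfile, 'ab') as log:
            log.write(('connection from %s:%d\n' % peer).encode('ascii'))
            try:
                # stop after roughly max_bytes so a chatty sample cannot fill the disk
                while received < max_bytes:
                    data = conn.recv(4096)
                    if not data:
                        break  # client closed the connection
                    log.write(data)
                    received += len(data)
            except socket.timeout:
                pass
            finally:
                conn.close()
        return received
    finally:
        server.close()
```

Since the wildcard DNS already points everything at snservices, running this on the port the sample is trying to reach is enough to capture its opening request.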

On viktim

For analyzing the binaries and behaviors that occur upon infecting the Windows host, a combination of the following is typically used:

    • strings


    • wget


    • Wireshark


    • PEBrowse Pro


    • Immunity Debugger


    • SysAnalyzer


    • iDefense MAP


    • netstat


    • ipconfig
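
Of the tools above, strings is the simplest to picture: it just pulls runs of printable characters out of a binary. A minimal Python sketch of the same idea follows. The function name and the four-character minimum are assumptions for illustration, and unlike some real strings implementations it only looks for plain ASCII, so UTF-16 strings will be missed.

```python
#!/usr/bin/env python
import re

def extract_strings(path, min_len=4):
    """Return runs of at least min_len printable ASCII characters found in a file."""
    # 0x20 through 0x7e covers space through tilde, i.e. printable ASCII
    pattern = re.compile(('[\\x20-\\x7e]{%d,}' % min_len).encode('ascii'))
    with open(path, 'rb') as handle:
        data = handle.read()
    return [match.decode('ascii') for match in pattern.findall(data)]
```

Run against a downloader, this is often enough to surface hard-coded URLs, file names, and command lines before any deeper analysis.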


Virtual Machine Environment

For a quick analysis of things that don’t appear to require such a complicated setup, a virtual environment can be used.

VM Software

For the purposes of evading detection, the Innotek VirtualBox software was chosen. Most malware at the time of this writing does not check for this particular VM software when determining whether it is running inside a virtual machine (whereas a number of samples do check for VMware). VirtualBox also consumes fewer resources on the host OS than VMware. Both VMware and VirtualBox are freely downloadable.

Guest OS

A Windows XP SP2 image (again, unpatched except for the SP) is run as the guest inside VirtualBox.


The tools used in this environment are largely the same as those on the viktim host described above. Because the snservices host is not available to the virtual machine, however, some functionality is lost.
The SysAnalyzer tool and the iDefense MAP suite make up for this somewhat, but they are not as robust as the tools available via the snservices host.
And there you have it. Maybe not the best sandnet ever made, but I find it fairly sufficient, and flexible enough to do what I need it to do and be easily maintained.
[update 2010-03-02]
I turned this into a talk and presented it at BlackHat DC 2010!

Handy Python Snippets

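Obtaining the local IP address:

#!/usr/bin/env python
# Save as getip.py; the combined script imports it by that name.
from socket import gethostbyaddr, gethostname

def getip():
    # gethostbyaddr returns (hostname, aliases, ip_list); take the first IP
    theip = gethostbyaddr(gethostname())[2][0]
    return theip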

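Obtaining the local MAC address:

#!/usr/bin/env python
# Save as getmac.py; the combined script imports it by that name.
import os
import sys

def getmac():
    mac = None  # stays None if no adapter line is found
    if sys.platform == 'win32':
        for line in os.popen("ipconfig /all"):
            if line.lstrip().startswith('Physical Address'):
                mac = line.split(':')[1].strip().replace('-', ':')
    else:
        # assumes the older ifconfig layout: "... Link encap:Ethernet  HWaddr <mac>"
        for line in os.popen("/sbin/ifconfig"):
            if line.find('Ether') > -1:
                mac = line.split()[4]
    return mac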

Putting these together:

#!/usr/bin/env python
import getmac, getip
myip = getip.getip()
mymac = getmac.getmac()
# %-formatting copes with getmac() returning None
print("%s has address: %s" % (mymac, myip))

About Disclosure

Let me start off by saying that I wish I had time to sit down and write this in a concise, coherent manner. Unfortunately, I don’t, so instead of a well-written post, here’s a rapid brain dump.
A couple of researchers (Robert E. Lee and Jack C. Louis) have recently been getting a great deal of press for discovering a new vulnerability in TCP (see this blog post for a starting point).
The researchers are fairly well respected (among other things, they authored unicornscan, which is a tool that I am quite fond of).
Like Dan Kaminsky and the DNS fiasco not too long ago, they have decided to go with what a colleague of mine accurately referred to as “dribble disclosure”. That is, they’ve said there’s a problem, and they’ve given a large number of interviews giving out bits and pieces of what it may be, how they found it, etc., but they have not come out and said precisely what the issue is.
However, unlike Dan Kaminsky, they’ve done this *before* any patch of any kind has been released. It was bad enough trying to deal with this type of disclosure *after* vendors had already had a chance to patch; trying to do it without that benefit is insane.
The problem with this type of disclosure is that it leads to a gigantic circus of FUD, both in the media and otherwise. For example, there’s some debate in various technical circles as to whether or not they have actually discovered anything new, or whether they’ve rediscovered older known issues.
I’m giving them the benefit of the doubt and presuming that they have in fact found something new, but without information, who knows? It’s all guesswork.
As for the media, I wish it were only the uninformed “mass” media spreading unrest and FUD. Unfortunately, even security researchers are contributing to the festivities.
For example, Robert Hansen (or RSnake as he is known) makes the following statement in his take:

I feel winter slowly coming, and it would be a shame if entire power grids could be taken offline with a few keystrokes, or if supply chains could be interrupted. I hear it gets awfully cold in Scandinavia.

Are you kidding me? We’ve gone from no details at all to suddenly power grids being knocked offline. Never mind the fact that it’s extremely unlikely (read: not gonna happen) that a device which controls the power grid of an area is directly connected to the internet. Devices that display power consumption/usage maybe, but not devices that control where that power is going and whether or not a given path is online.
Fyodor (of nmap fame) has posted his guess at the details of this new vulnerability (and an echo of my frustration at this type of disclosure as well). However, Robert E. Lee replies that while Fyodor makes very valid points and explains a bit of how their tool works, he doesn’t quite explain the attack they’ve found.
That’s one of the points of this rant: Smart people *are* going to figure out what the problem is. They may be “good guys”, or they may be “bad guys” (in my opinion it is likely that both sides will figure it out). Either way, there are certainly enough clues given in the various reports/podcasts to enable an individual that is clueful about the protocol to figure out a likely scenario.
To make matters worse, this time there are at least five unique vulnerabilities which have been documented by Robert and Jack. This of course increases the odds that the exploit will be found (that is, someone will figure out at least one of the five, if not all of them).
So what really is the point of disclosing this way?
It isn’t helping anyone except the media and the researchers (because they get to revel in the media circus while it lasts).
More specifically:

    • It doesn’t protect end users


    • It doesn’t help administrators


    • It doesn’t even help security researchers other than those doing the dribbling, because rather than allowing one to try to find ways to fix the problem, or even new ways to apply the problem to other areas, it forces them to try to recreate what’s already been done using a disjointed trail of clues.

So, why do it this way?
Disclosure is simple, really: either do it or don’t.
Personally, I think “full disclosure” (i.e. ‘do it’) is best.
Whether you do so before or after “responsible” vendor notification, I don’t really care. But get all the information out there when you do it, or keep your mouth shut until you’re ready to do so.
I’m disgusted with this “new way” of doing things, and I’ve decided to coin a term for this method: discloscharades
Just like the game charades, this “half informed” nonsense ends up making the person dribbling clues out look silly or worse, and it leaves the people doing the guesswork frustrated and annoyed.