I’ve been running https://sshpot.com/ for a while now and decided it needed to be revamped and overhauled, so I made a presentation and wrote up some details on the process as well. If you’d like to just view the slides, hop over here.
If you’re just looking for the source code:
- Web application
- Honeypot Server
Table of Contents
- Design Goals
- Handling Requests
- Simulating Commands
- Persistence Layer
- Analyzing Dropped Files
Design Goals
I wanted to make sure this was an improvement over the previous iteration, so I laid out several goals for the rewrite:
- Appear more ‘vulnerable’
- Correlate commands/sessions (old version just logged data)
- Proxy requests and capture data
- Better statistics
- Redesigned command simulation using interfaces instead of simple string matching
Some important steps the honeypot must perform:
- Generate a new private key for the server on every boot (to appear like a fresh server)
- Advertise a vulnerable version. Check past CVEs if you want to target a specific one. Importantly, the banner must start with SSH-2.0 or the client won’t handshake
- Listen on port 22, so you should move the actual sshd to another port (2222, for example)
- Do the SSH handshake
- Handle requests
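As a rough illustration of the first step, here is a minimal sketch that generates a throwaway RSA host key in memory on every start, using only the standard library. The function name newHostKeyPEM and the 2048-bit size are my own choices for the example, not taken from the sshpot source. The resulting PEM bytes are in a format that ssh.ParsePrivateKey from golang.org/x/crypto/ssh accepts.

```go
package main

import (
	"crypto/rand"
	"crypto/rsa"
	"crypto/x509"
	"encoding/pem"
	"fmt"
)

// newHostKeyPEM generates a throwaway RSA key and returns it PEM-encoded.
// Nothing is written to disk, so every boot presents a different host key.
// (Hypothetical helper name; the key size is an assumption.)
func newHostKeyPEM(bits int) ([]byte, error) {
	key, err := rsa.GenerateKey(rand.Reader, bits)
	if err != nil {
		return nil, fmt.Errorf("generate host key: %w", err)
	}
	block := &pem.Block{
		Type:  "RSA PRIVATE KEY",
		Bytes: x509.MarshalPKCS1PrivateKey(key),
	}
	return pem.EncodeToMemory(block), nil
}

func main() {
	pemBytes, err := newHostKeyPEM(2048)
	if err != nil {
		panic(err)
	}
	// These bytes could then be parsed into a signer and registered as
	// the host key on the server config.
	fmt.Printf("generated %d bytes of PEM-encoded host key\n", len(pemBytes))
}
```

Keeping the key in memory only is a design choice for the sketch: a honeypot has no reason to keep a stable host identity between restarts.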
Appearing More Vulnerable
We need to create an SSH config for the ssh package to use when constructing a new server connection. An important thing to note here is the SSH-2.0 prefix on the ServerVersion: if that is missing, the client will do weird things (including not connecting).
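To make the prefix requirement concrete, here is a small sketch that checks a banner before it is used as the server version. It is not part of the honeypot itself: the helper name checkServerVersion and both example banners are made up for illustration. It checks for "SSH-2.0-" because the SSH identification string has the form SSH-protoversion-softwareversion.

```go
package main

import (
	"fmt"
	"strings"
)

// checkServerVersion reports whether a banner is usable as an SSH
// identification string: it must start with "SSH-2.0-" and carry a
// non-empty software version after the prefix.
func checkServerVersion(banner string) error {
	const prefix = "SSH-2.0-"
	if !strings.HasPrefix(banner, prefix) {
		return fmt.Errorf("banner %q must start with %q", banner, prefix)
	}
	if strings.TrimSpace(strings.TrimPrefix(banner, prefix)) == "" {
		return fmt.Errorf("banner %q has no software version", banner)
	}
	return nil
}

func main() {
	// Both banners are made-up examples, not the values sshpot uses.
	for _, b := range []string{"SSH-2.0-OpenSSH_5.1p1 Debian-5", "OpenSSH_5.1p1"} {
		fmt.Printf("%q usable=%v\n", b, checkServerVersion(b) == nil)
	}
}
```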
Correlating Sessions and Commands
In order to correlate requests we can use the permission extension in the ssh package to store a map of data - in our case, a simple GUID to keep state across requests. This could also be used to store a user id or some other type of session identifier, for instance, if you were trying to write your own replacement ssh daemon to do things like serving up git requests.
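Here is a minimal sketch of that idea using only the standard library. The permissions struct is a local stand-in that mirrors the Extensions map on ssh.Permissions so the example runs without the ssh package. The "guid" and "user" keys and the helper names are assumptions for illustration, not necessarily what sshpot uses.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// permissions is a local stand-in mirroring the Extensions field of
// ssh.Permissions, so this sketch has no external dependencies.
type permissions struct {
	Extensions map[string]string
}

// newGUID returns a random version 4 UUID as a string.
func newGUID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// newSession builds the value an auth callback would return, tagging
// the connection with a GUID that later requests can read back.
func newSession(user string) (*permissions, error) {
	id, err := newGUID()
	if err != nil {
		return nil, err
	}
	return &permissions{Extensions: map[string]string{
		"guid": id,   // hypothetical key name
		"user": user, // hypothetical key name
	}}, nil
}

func main() {
	perms, err := newSession("root")
	if err != nil {
		panic(err)
	}
	fmt.Println("session guid:", perms.Extensions["guid"])
}
```

Every command or proxied request logged later can carry the same GUID, which is what lets the web application group them into one session.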
We want to capture as much metadata about the connection as possible, as well as capture public keys that are available when the attacker is using an ssh-agent - this can help us in the future to possibly identify bad actors. Here we marshal the public key and capture the key type being sent.
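To show what the marshaled key contains, here is a sketch that works on the raw bytes alone. An SSH public key in wire format starts with a length-prefixed algorithm name, so the key type can be read straight from the blob, and a SHA256 fingerprint is just a hash of the same bytes. The helper names and the all-zero dummy key are assumptions for the example; with the ssh package the same information comes from the public key’s Type() and Marshal() methods.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"encoding/binary"
	"errors"
	"fmt"
)

// keyType reads the algorithm name from an SSH wire-format public key,
// which starts with a uint32 length followed by the name itself.
func keyType(blob []byte) (string, error) {
	if len(blob) < 4 {
		return "", errors.New("key blob too short")
	}
	n := binary.BigEndian.Uint32(blob[:4])
	if uint64(n) > uint64(len(blob)-4) {
		return "", errors.New("truncated key type")
	}
	return string(blob[4 : 4+n]), nil
}

// fingerprint returns the SHA256 fingerprint in the style OpenSSH prints:
// "SHA256:" followed by unpadded base64 of the hash of the key blob.
func fingerprint(blob []byte) string {
	sum := sha256.Sum256(blob)
	return "SHA256:" + base64.RawStdEncoding.EncodeToString(sum[:])
}

// sshString encodes s as an SSH string (length prefix plus bytes).
func sshString(s []byte) []byte {
	out := make([]byte, 4, 4+len(s))
	binary.BigEndian.PutUint32(out, uint32(len(s)))
	return append(out, s...)
}

func main() {
	// Dummy all-zero ed25519 key, standing in for the marshaled bytes
	// of a key offered by an attacker's ssh-agent.
	blob := append(sshString([]byte("ssh-ed25519")), sshString(make([]byte, 32))...)
	typ, err := keyType(blob)
	if err != nil {
		panic(err)
	}
	fmt.Println(typ, fingerprint(blob))
}
```

Storing the fingerprint next to the key type makes it easy to spot the same key showing up from different source addresses.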
Proxying Requests and Capturing Data
Now we can talk about proxying requests. I’m going to throw some code at you then explain below:
On line 5 we read directly from the TCP connection, up to a maximum of 1MB of data; if we get an EOF we return. Next, on lines 12 and 13, we use a nice part of the http package that lets us take a raw stream of TCP bytes and convert it into the HTTP request it represents (like GET /foobar), handling all the other headers and POST params for us.
After getting the TCP request into something we can work with more easily, we parse out any form params on line 19, and then we reconstruct the url to visit on line 24.
Line 26 is using our persistence struct to save everything that has come in so far.
Line 25 and line 54 can be interchanged. For my honeypots I’m actually making the raw requests that they’re asking for (only GETs); the other option is to use the httpHandler struct to create dummy responses for various websites. After we make the raw request we store the response in our persistence struct and save it to the API on lines 46 and 47.
Finally on line 59 we close the channel to tell the client that data has been returned and is done.
Handling Requests
The biggest portion of handling requests is accepting them and sending them off to the channel handler, which deals with incoming commands and TCP connections. We loop to handle connections and perform the handshake for each new request.
On line 18 we just log out-of-band requests that aren’t what we want. Line 20 handles the meat and potatoes of our program, which is incoming commands and SOCKS proxy requests. We do both of these in goroutines so multiple clients can connect at once.
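The accept loop itself can be sketched with just the net package. This is a stand-in rather than the honeypot code: greet only sends a made-up banner where the real server would perform the SSH handshake, and all function names here are hypothetical. The point is the shape of the loop, with one goroutine per connection.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
)

// banner is a made-up version string for the sketch.
const banner = "SSH-2.0-OpenSSH_5.1p1 Debian-5"

// serve accepts connections until the listener is closed and hands each
// one to its own goroutine so a slow client cannot block the others.
func serve(ln net.Listener, handle func(net.Conn)) {
	for {
		conn, err := ln.Accept()
		if err != nil {
			return // listener closed
		}
		go handle(conn)
	}
}

// greet stands in for the SSH handshake: it only sends the banner.
func greet(conn net.Conn) {
	defer conn.Close()
	fmt.Fprintf(conn, "%s\r\n", banner)
}

// startLocal listens on a random loopback port and starts serving.
// A real honeypot would listen on port 22 instead.
func startLocal() (net.Listener, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, err
	}
	go serve(ln, greet)
	return ln, nil
}

// readBanner connects like a client would and returns the first line.
func readBanner(addr string) (string, error) {
	conn, err := net.Dial("tcp", addr)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	line, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil {
		return "", err
	}
	return strings.TrimRight(line, "\r\n"), nil
}

func main() {
	ln, err := startLocal()
	if err != nil {
		panic(err)
	}
	defer ln.Close()
	got, err := readBanner(ln.Addr().String())
	if err != nil {
		panic(err)
	}
	fmt.Println(got)
}
```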
The next important step is handling both out-of-band requests and incoming TCP connections - jump to the end of the code block for an explanation.
On line 16 we’re sending the TCP reading off to another function to get handled - the rest of the requests coming in out-of-band will be handled in the function on line 22.
On line 24 we use the terminal package to create a new terminal for reading input and sending output back to the user. Line 25 is our command handler, which will do the regex and pattern matching.
Our case/switch statement is doing the heavy lifting here starting on line 39.
Now we need to understand the different types of SSH connections and requests that can be made:
- direct-tcpip is what happens when you use your SSH connection to proxy TCP connections (like a SOCKS proxy).
- exec is what happens when you run commands like ssh someuser@host ‘ls -al’
- shell is what happens when you actually log in and start executing commands: a PTY gets launched and you have an interactive command prompt.
- pty-req is the client asking for a pseudo-terminal; accepting it lets the SSH client know that the server is ready to accept input (works in conjunction with shell).
These are all the command types we care about for now.
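For the direct-tcpip case, the channel open request carries extra data saying where the client wants to connect. RFC 4254 section 7.2 gives the layout as host to connect, port to connect, originator address and originator port. Here is a sketch that decodes it by hand with the standard library. The struct and helper names are hypothetical, and the encode function exists only to build a demo payload.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// directTCPIP holds the extra data of a "direct-tcpip" channel open
// request, in the field order given by RFC 4254 section 7.2.
type directTCPIP struct {
	DestHost string
	DestPort uint32
	OrigHost string
	OrigPort uint32
}

var errShort = errors.New("payload too short")

// readUint32 consumes a big-endian uint32 from the front of b.
func readUint32(b []byte) (uint32, []byte, error) {
	if len(b) < 4 {
		return 0, nil, errShort
	}
	return binary.BigEndian.Uint32(b), b[4:], nil
}

// readString consumes a length-prefixed SSH string from the front of b.
func readString(b []byte) (string, []byte, error) {
	n, rest, err := readUint32(b)
	if err != nil {
		return "", nil, err
	}
	if uint64(n) > uint64(len(rest)) {
		return "", nil, errShort
	}
	return string(rest[:n]), rest[n:], nil
}

// parseDirectTCPIP decodes the four fields in order.
func parseDirectTCPIP(p []byte) (d directTCPIP, err error) {
	if d.DestHost, p, err = readString(p); err != nil {
		return d, err
	}
	if d.DestPort, p, err = readUint32(p); err != nil {
		return d, err
	}
	if d.OrigHost, p, err = readString(p); err != nil {
		return d, err
	}
	d.OrigPort, _, err = readUint32(p)
	return d, err
}

// encode builds a payload the way a client would (demo and tests only).
func encode(d directTCPIP) []byte {
	u32 := func(v uint32) []byte {
		b := make([]byte, 4)
		binary.BigEndian.PutUint32(b, v)
		return b
	}
	str := func(s string) []byte { return append(u32(uint32(len(s))), s...) }
	out := append(str(d.DestHost), u32(d.DestPort)...)
	out = append(out, str(d.OrigHost)...)
	return append(out, u32(d.OrigPort)...)
}

func main() {
	payload := encode(directTCPIP{"example.com", 80, "127.0.0.1", 51234})
	d, err := parseDirectTCPIP(payload)
	if err != nil {
		panic(err)
	}
	fmt.Printf("client wants %s:%d\n", d.DestHost, d.DestPort)
}
```

With golang.org/x/crypto/ssh you would normally pass the channel’s extra data and a struct like this to ssh.Unmarshal instead of decoding by hand.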
Simulating Commands
A few things we need to keep in mind for this part of the honeypot:
- Need to simulate commands that are run by attackers
- Go’s interface pattern fits well here
- Need to understand command return values
  - Does the command return a new line?
  - If output doesn’t match (including new lines) it may throw off bots/scripts that check for exact output matching
- Need to be able to match commands based on regex or equality
  - Needs to handle commands like:
    - echo -n test
    - echo test
    - echo foo bar baz
  - Don’t want to write a handler for each variation of a command with flags - would never cover all cases
So with all that in mind let’s lay out a framework for the command handler. We need something to register all of our commands, and then a structure for our commands to be run consistently.
On lines 1 and 6 we’re creating our CommandHandler, which is where all commands get registered and where we store the terminal to write to.
Line 10 lets us register 1...n commands at once using Go’s variadic argument syntax.
Line 16 is our runner, which takes the input from the attacker and runs it through our registered commands. We return the command output and also whether or not it will output a newline at the end. If no registered command matches, we return the generic bash command not found.
Line 25 is our interface definition. We have two functions, Match and Run. Run follows the MatchAndRun pattern where we return the command output and whether or not a newline is needed. The meat of this command can do whatever you want it to do - in this case we’re checking for some specific flags that I’ve seen used on the honeypot and parsing them out.
The Match portion in this case is just a simple Contains check - you can do whatever you want in this portion - it just needs to return a boolean. Go nuts with regexes or just do a simple equality check.
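Here is a compact sketch of a framework with that shape. It is a reconstruction from the description, not the original listing. CommandHandler and MatchAndRun are named in the text, while the Command interface name, the Register method name and the Echo command are my guesses. The terminal field is left out, and Match compares the first word instead of using a Contains check so that input merely containing "echo" is not matched.

```go
package main

import (
	"fmt"
	"strings"
)

// Command is implemented by every simulated command.
type Command interface {
	Match(input string) bool
	Run(input string) (output string, newline bool)
}

// CommandHandler holds the registered commands. The real handler also
// stores the terminal to write to; that is left out of this sketch.
type CommandHandler struct {
	commands []Command
}

// Register adds one or more commands using variadic arguments.
func (h *CommandHandler) Register(cmds ...Command) {
	h.commands = append(h.commands, cmds...)
}

// MatchAndRun runs the first command that matches the input. If none
// matches it answers the way bash does for an unknown command.
func (h *CommandHandler) MatchAndRun(input string) (string, bool) {
	for _, c := range h.commands {
		if c.Match(input) {
			return c.Run(input)
		}
	}
	fields := strings.Fields(input)
	if len(fields) == 0 {
		return "", false
	}
	return fmt.Sprintf("bash: %s: command not found", fields[0]), true
}

// Echo simulates echo, including the -n flag that suppresses the newline.
type Echo struct{}

// Match reports whether the first word of the input is "echo".
func (Echo) Match(input string) bool {
	f := strings.Fields(input)
	return len(f) > 0 && f[0] == "echo"
}

// Run returns the arguments joined by spaces and whether a trailing
// newline should be written.
func (Echo) Run(input string) (string, bool) {
	args := strings.Fields(input)[1:]
	newline := true
	if len(args) > 0 && args[0] == "-n" {
		newline = false
		args = args[1:]
	}
	return strings.Join(args, " "), newline
}

func main() {
	h := &CommandHandler{}
	h.Register(Echo{})
	for _, in := range []string{"echo -n test", "echo foo bar baz", "wget http://example.com/x"} {
		out, nl := h.MatchAndRun(in)
		fmt.Printf("%q -> %q newline=%v\n", in, out, nl)
	}
}
```

Because each command decides for itself how to match, one Echo handler covers every flag variation instead of needing a separate handler per exact string.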
Persistence Layer
For our persistence layer there isn’t anything special going on. We have a few options configurable via ENV vars: the ability to skip sending commands to the remote, and the ability to provide a SERVER_URL to develop against the local Rails application.
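A sketch of what reading that configuration could look like. SERVER_URL comes from the description above; everything else is assumed, including the SKIP_COMMANDS variable name, the config struct and the placeholder default URL. The loader takes a lookup function with the same signature as os.LookupEnv so it can be exercised without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// defaultServerURL is a placeholder, not the real API endpoint.
const defaultServerURL = "https://example.invalid"

// config holds the persistence settings (hypothetical struct).
type config struct {
	ServerURL    string // where sessions and commands are sent
	SkipCommands bool   // when true, nothing is sent to the remote
}

// loadConfig reads settings through lookup, which has the same
// signature as os.LookupEnv so it can be swapped out in tests.
func loadConfig(lookup func(string) (string, bool)) config {
	cfg := config{ServerURL: defaultServerURL}
	if v, ok := lookup("SERVER_URL"); ok && v != "" {
		cfg.ServerURL = v
	}
	// SKIP_COMMANDS is a hypothetical variable name.
	if v, ok := lookup("SKIP_COMMANDS"); ok {
		if skip, err := strconv.ParseBool(v); err == nil {
			cfg.SkipCommands = skip
		}
	}
	return cfg
}

func main() {
	cfg := loadConfig(os.LookupEnv)
	fmt.Printf("%+v\n", cfg)
}
```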
Now that we have a working honeypot that is able to accept logins, simulate and record commands, we can start analyzing dropped files.
Analyzing Dropped Files
Tools we’ll be using from OS X (please note this is not exhaustive):
- Docker (docker-machine on OS X)
We’re going to use Wireshark to process PCAP files generated by VirtualBox, so we’ll need to tell VirtualBox to create the network capture:
We can exec into the container to check running processes and see other commands being run.
We’ll be using docker to dump a clean image and an infected image, in order to see what files were modified and/or dropped onto the file system:
What you want to do in order to get as clean a dump as possible:
- Run a plain ubuntu container with docker run -it ubuntu bash to get into it and at the console
- Run any commands you want (like apt-get install wget or whatever other tools you think you will need)
- Save the “comparison” container using docker export, then extract it into a folder
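Once both a clean and an infected container have been exported and extracted, the two folders can be compared file by file. Here is a sketch of one way to do that by hashing every regular file with SHA-256. The function names and the demo file names are made up, and the demo builds two tiny temporary trees in place of real exports.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"sort"
)

// hashTree maps every regular file under root (by relative path) to the
// hex SHA-256 of its contents.
func hashTree(root string) (map[string]string, error) {
	sums := make(map[string]string)
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.Type().IsRegular() {
			return nil // skip directories, symlinks and device nodes
		}
		rel, err := filepath.Rel(root, path)
		if err != nil {
			return err
		}
		data, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		sum := sha256.Sum256(data)
		sums[filepath.ToSlash(rel)] = hex.EncodeToString(sum[:])
		return nil
	})
	return sums, err
}

// diff lists paths that are new or modified in after, sorted by name.
func diff(before, after map[string]string) (added, changed []string) {
	for path, sum := range after {
		old, ok := before[path]
		switch {
		case !ok:
			added = append(added, path)
		case old != sum:
			changed = append(changed, path)
		}
	}
	sort.Strings(added)
	sort.Strings(changed)
	return added, changed
}

// writeTree creates a temp directory holding the given files (demo only).
func writeTree(files map[string]string) (string, error) {
	root, err := os.MkdirTemp("", "tree")
	if err != nil {
		return "", err
	}
	for name, body := range files {
		full := filepath.Join(root, filepath.FromSlash(name))
		if err := os.MkdirAll(filepath.Dir(full), 0o755); err != nil {
			return "", err
		}
		if err := os.WriteFile(full, []byte(body), 0o644); err != nil {
			return "", err
		}
	}
	return root, nil
}

func main() {
	// Hypothetical file names standing in for two extracted exports.
	clean, err1 := writeTree(map[string]string{"etc/crontab": "a", "bin/ls": "x"})
	infected, err2 := writeTree(map[string]string{"etc/crontab": "b", "bin/ls": "x", "tmp/dropped": "m"})
	if err1 != nil || err2 != nil {
		panic("could not create demo trees")
	}
	defer os.RemoveAll(clean)
	defer os.RemoveAll(infected)
	before, _ := hashTree(clean)
	after, _ := hashTree(infected)
	added, changed := diff(before, after)
	fmt.Println("added:", added, "changed:", changed)
}
```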
Now when running the malware (assuming it’s in the current directory):
Repeat the same commands you ran above in the “comparison” container (or save it as an image and spawn from that).
Now you can exec into your container and run the malware. Depending on how comfortable you are with monitoring your own network, you should be doing this on an isolated network. Now on to the two malware samples we’ll be going over today:
Common name: HEUR:Trojan-DDoS.Linux.Xarcen.a
- C&C trojan
- Drops multiple copies of itself
- Will randomly spawn processes and change process names
Two binaries were dropped onto the file system, which looked to be copies of itself:
- sha256: 3657bd42fef97343c78a199d3611285e6fe7b88cc91e127d17ebbbbb2fd2f292
  - Common name: HEUR:Trojan-DDoS.Linux.Xarcen.a
- sha256: acbccef76341af012bdf8f0022b302c08c72c6911631f98de8d9f694b3460e25
  - Common name: HEUR:Trojan-DDoS.Linux.Xarcen.a
Several other plain text files were dropped in order to ensure startup:
So we can see several things going on: it drops files, downloads additional payloads, tries to persist itself in as many places as possible, and masks its presence with different process names. These two pieces of malware were the first ones I’ve analyzed (ever), so I don’t have much in-depth analysis other than what I was able to gather from network traffic and observations through docker. For more detail, Kaspersky has a great writeup, including reversed source code.
DDOS.Flood / DnsAmp
Common name: DDOS.Flood / DnsAmp
- Initial payload is a bash script
- Another DDOS type malware
- Drops several binaries that cover as many architectures as possible (ARM/MIPS/etc)
- Masks itself as system interrupt process (irq)
- Uses IRC + HTTP for communication
- Connects to 2 IPs:
  - Running on AWS, most likely compromised.
  - Running somewhere in Australia.
I can see several things happening here. The most interesting, I think, is the multiple architecture support - perhaps trying to compromise routers and other smaller IoT devices that are running ARM or other mobile processors. It tries to mask itself as a pty process and then installs itself in crontab. Finally it does some communication over IRC in plain text. Based on the limited network communication I saw, I’m guessing this might belong to a Spanish hacker group (from the YouTube link) - or it might just be a coincidence of what I saw while executing the malware.
Building the Honeypot Network
Finding cheap hosts is important. We don’t care about a lot of the niceties we’d normally want in a VPS or cloud provider (like DO or AWS) - what we want is a cheap isolated environment to run our honeypots in (either cheaper KVM/Xen, or even cheaper OpenVZ). To that end, serverbear.com is great for simple price comparison shopping.
I’m still trying to find the best locations and providers to use, but have a mix of OpenVZ and KVM instances running. The main OS is Ubuntu, however any flavor of Linux will do since the Go binary will be compatible with any of them.
And finally, in order to get the best representation of activity, it’s best to spread the servers out globally so you can get a wide geographic coverage (Europe, America, Asia, etc).
- Payload downloading
  - Download the payloads as they get ‘executed’ on the honeypot
- Automate analysis
  - Automate using docker and other tools to produce reliable analysis output
- More honeypots
  - Right now I’m only running 6 - each is about $2-5/month
- Automate WHOIS lookups
  - Automate sending abuse complaints and track which providers are actually monitoring and care about what their network is used for
- More services
  - Expand honeypot protocols to FTP, HTTP Proxies (Polipo, Squid, etc), etc