Kubernetes On Bare Metal

Apr 1, 2018 - 17 minutes

If you’ve been following Kubernetes, you’ll know there’s a myriad of options available… I’ll cover a few of them briefly and explain why I didn’t choose them. Don’t know what Kubernetes is? Minikube is the best way to get going locally.

This guide will take you from nothing to a 2 node cluster, automatic SSL for deployed apps, a custom PVC/PV storage class using NFS, and a private docker registry. Helpful tips and bugs I ran into are sprinkled throughout their respective sections.

But first the goals for this cluster:
  • First-class SSL support with LetsEncrypt so we can easily deploy new apps with SSL using just annotations.
  • Bare metal for this conversation means a regular VM/VPS provider or a regular private provider like Proxmox with no special services - or actual hardware.
  • Not require anything fancy (like BIOS control)
  • Be reasonably priced (<$50/month)
  • Be reasonably production-y (this is for side projects, not a huge business critical app). Production-y for this case means a single master with backups being taken of the node.
  • Works with Ubuntu 16.04
  • Works on Vultr (and others like Digital Ocean - providers that are (mostly) generic VM hosts and don’t have specialized APIs and services like AWS/GCE)
  • I also recommend making sure your VM provider supports a software defined firewall and a private network - however, this is not a hard requirement.

Overview of Options
  • OpenShift: Owned by RedHat - uses its own special tooling around oc. Minimum requirements were too high for a small cluster. Pretty high vendor lock-in.
  • KubeSpray: Unstable in my experience. It worked pretty consistently around 1.6, but it was unable to finish when I tried spinning up a 1.9 and a 1.10 cluster. I am a fan of Ansible, and if you are as well, this is probably the project to follow.
  • Google Kubernetes Engine: Attempting to stay away from cloud-y providers so outside of the scope of this. If you want a managed offering and are okay with GKE pricing, this is the way to go.
  • AWS: Staying away from cloud-y providers. Cost is also a big factor here since this is a side-project cluster.
  • Tectonic: Requirements are too much for a small cloud provider/installation (PXE boot setup, Matchbox, F5 LB).
  • Kops: Only supports AWS and GCE.
  • Canonical Juju: Requires MAAS; I attempted to use it but kept getting errors around lxc. Seems to favor cloud provider deploys (AWS/GCE/Azure).
  • Kubicorn: No bare metal support, needs cloud provider APIs to work.
  • Rancher: Rancher is pretty awesome, unfortunately it’s incredibly easy to break the cluster and break things inside Rancher that make the cluster unstable. It does provide a very simple way to play with kubernetes on whatever platform you want.

… And the winner is… Kubeadm. It’s not in any incubator stage and is documented as one of the official ways to get a cluster set up.

Servers we’ll need:
  • $5 (+$5 for 50G block storage) - NFS Pod storage server ( 1 CPU / 1GB RAM / block storage )
  • $5 - 1 Master node ( 1 CPU / 1G RAM )
  • $20 - 1 Worker node ( 2 CPU / 4G RAM - you can choose what you want for this )
  • $5 - (optional) DB server - due to bugs I’ve run into in production environments with docker, various smart people saying not to do it, and issues you can run into with file system consistency, I run a separate DB server for my apps to connect to if they need it.

Total cost: $40.00

Base Worker + Master init-script
 1 #!/bin/sh
 2 apt-get update
 3 apt-get upgrade -y
 4 apt-get -y install python
 5 IP_ADDR=$(echo 10.99.0.$(ip route get 8.8.8.8 | awk '{print $NF; exit}' | cut -d. -f4))
 6 cat <<- EOF >> /etc/network/interfaces
 7 auto ens7
 8 iface ens7 inet static
 9     address $IP_ADDR
10     netmask 255.255.0.0
11     mtu 1450
12 EOF
13 ifup ens7
14 
15 apt-get install -y apt-transport-https
16 apt -y install docker.io
17 systemctl start docker
18 systemctl enable docker
19 curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
20 echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" >/etc/apt/sources.list.d/kubernetes.list
21 apt-get update
22 apt-get install -y kubelet kubeadm kubectl kubernetes-cni nfs-common
23 
24 reboot

Lines 2-13 run on first boot: they update and upgrade everything, install python (used so Ansible can connect and do things later), and then configure the private network address. Since Vultr gives you a true private network, I’m cheating a bit and just using the last octet of the public IP to define my internal LAN IP.
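For example (using 203.0.113.57, a documentation address, as a stand-in public IP - yours will differ), line 5 works out like this:

ip route get 8.8.8.8 | awk '{print $NF; exit}' | cut -d. -f4                  # -> 57 (last octet of the public IP)
echo 10.99.0.$(ip route get 8.8.8.8 | awk '{print $NF; exit}' | cut -d. -f4)  # -> 10.99.0.57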

On line 16 we’re installing the Ubuntu-packaged version of docker – this is important. A lot of tools don’t bundle the proper docker version to go along with their k8s installation, and that can cause all kinds of issues, including everything not working due to version mismatches.

Lines 15-22 add the kubernetes apt repository and install kubeadm, kubelet, kubectl, and the other pieces we’ll need (CNI plugins and nfs-common).

Setting up the NFS Server

I’m not going to go in depth on setting up an NFS server; there are a million guides. I will however mention the exports section, which I’ve cobbled together after a few experiments and reading the OpenShift docs. There’s also a good amount of documentation if you want to go the Ceph storage route, but NFS was the simplest solution to get set up.
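For completeness, a minimal sketch of the server side on Ubuntu 16.04 (assuming /srv/kubestorage as the export path used below - adjust to taste):

apt-get update && apt-get install -y nfs-kernel-server
mkdir -p /srv/kubestorage
chown nobody:nogroup /srv/kubestorage
# add the export line below to /etc/exports, then reload:
exportfs -ra
systemctl restart nfs-kernel-server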

Remember to lock down your server with a firewall so everything is locked down except internal network traffic to your VMs.
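For example, with ufw on Ubuntu (just a sketch - adjust the SSH port and private subnet to your setup):

ufw allow 22/tcp                 # keep SSH reachable before enabling the firewall
ufw allow from 10.99.0.0/24      # allow the private network (NFS and kubernetes traffic)
ufw default deny incoming
ufw enable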

/etc/exports

/srv/kubestorage 10.99.0.0/255.255.255.0(rw,no_root_squash,no_wdelay,no_subtree_check)

Export options:

  • no_root_squash - this shouldn’t be used for shared services, but if it’s for your own use and not accessed anywhere else this is fine. This lets the docker containers work with whatever user they’re booting as without conflicting with permissions on the NFS server.
  • no_subtree_check - prevents issues with files being open and renamed at the same time
  • no_wdelay - generally prevents NFS from trying to be smart about when to write, and forces it to write to the disk ASAP.

Setting up the master node

On the master node run kubeadm to init the cluster and start kubernetes services:

kubeadm init --pod-network-cidr=10.244.0.0/16

This will start the cluster and setup a pod network on 10.244.0.0/16 for internal pods to use.

Next you’ll notice that the node is in a NotReady state when you do a kubectl get nodes. We need to setup our worker node next.

You can either continue using kubectl on the master node or copy the config to your workstation (depending on how your network permissions are setup):

mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config

Setting up the worker node

You’ll get a join command with a token to run on workers from the previous step. If you need to generate new tokens later on when you’re expanding your cluster, you can use kubeadm token list and kubeadm token create to get a new token.
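The join command kubeadm prints looks roughly like the following - the master IP, token, and CA hash here are placeholders, use the ones from your own output:

kubeadm join 10.99.0.10:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>
# if you lose the original token later:
kubeadm token list
kubeadm token create --print-join-command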

Important Note: Your worker nodes must have unique hostnames, otherwise they will join the cluster and overwrite each other (the first node will disappear and things will get rebalanced to the node you just joined). If this happens to you and you want to reset a node, you can run kubeadm reset on that worker node to wipe it.

Setting up pod networking (Flannel)

Back on the master node we can add our Flannel network overlay. This lets pods residing on different worker nodes communicate with each other over internal DNS and IPs.

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml

After a few seconds you should see output from kubectl get nodes similar to this (depending on hostnames):

root@k8s-master:~# kubectl get nodes
NAME           STATUS    ROLES     AGE       VERSION
k8s-master     Ready     master    4d        v1.10.0
k8s-worker     Ready     <none>    4d        v1.10.0

Deploying the Kubernetes Dashboard

If you need more thorough documentation, head on over to the dashboard repo. We’re going to follow a vanilla installation:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml

Once that is installed you need to setup a ServiceAccount that can request tokens and use the dashboard, so save this to dashboard-user.yaml:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kube-system

and then apply it

kubectl apply -f dashboard-user.yaml

Next you’ll need to grab the service token for the dashboard authentication and fire up kubectl proxy:

kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep admin-user | cut -f1 -d ' ') | grep -E '^token' | cut -f2 -d':' | tr -d '\t'
kubectl proxy

Now you can access the dashboard at http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/#!/login.

Setting up our NFS storage class

When using a cloud provider you normally get a default storage class provided for you (like on GKE). With our bare metal installation if we want PersistentVolumes (PVs) and PersistentVolumeClaims (PVCs) to work, we need to set up our own private storage class.

We’ll be using nfs-client from the incubator for this.

The best way to do this is to clone the repo, go to the nfs-client directory, and edit the following files:

  • deploy/class.yaml: This is the name your storage class will show up as when requesting storage and in kubectl get sc:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-nfs-storage
provisioner: joshrendek.com/nfs # or choose another name, must match deployment's env PROVISIONER_NAME
  • deploy/deployment.yaml: make sure your provisioner name matches here, that your NFS server IP is set properly, and that the mount you’re exporting is set properly (see the quick check below).
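A quick way to eyeball that everything lines up (env names and file layout here are as found in the incubator repo at the time of writing - treat this as a sketch):

# PROVISIONER_NAME must match the provisioner in deploy/class.yaml; NFS_SERVER / NFS_PATH
# (and the nfs: server/path under volumes:) must point at your NFS export
grep -nE 'PROVISIONER_NAME|NFS_SERVER|NFS_PATH|server:|path:' deploy/deployment.yaml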

Create a file called nfs-test.yaml:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test-claim
  annotations:
    volume.beta.kubernetes.io/storage-class: "managed-nfs-storage"
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi
---
kind: Pod
apiVersion: v1
metadata:
  name: test-pod
spec:
  containers:
  - name: test-pod
    image: gcr.io/google_containers/busybox:1.24
    command:
      - "/bin/sh"
    args:
      - "-c"
      - "touch /mnt/SUCCESS && exit 0 || exit 1"
    volumeMounts:
      - name: nfs-pvc
        mountPath: "/mnt"
  restartPolicy: "Never"
  volumes:
    - name: nfs-pvc
      persistentVolumeClaim:
        claimName: test-claim

Next just follow the repository instructions:

kubectl apply -f deploy/deployment.yaml
kubectl apply -f deploy/class.yaml
kubectl create -f deploy/auth/serviceaccount.yaml
kubectl create -f deploy/auth/clusterrole.yaml
kubectl create -f deploy/auth/clusterrolebinding.yaml
kubectl patch deployment nfs-client-provisioner -p '{"spec":{"template":{"spec":{"serviceAccount":"nfs-client-provisioner"}}}}'
kubectl patch storageclass managed-nfs-storage -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

This creates all the RBAC permissions, adds them to the deployment, and then sets the default storage class provider in your cluster. You should see something similar when running kubectl get sc now:

NAME                            PROVISIONER           AGE
managed-nfs-storage (default)   joshrendek.com/nfs   4d

Now let’s test our deployment and check the NFS share for the SUCCESS file:

kubectl apply -f nfs-test.yaml

If everything is working, move on to the next section - you’ve gotten NFS working! The only problem I ran into at this point was mistyping my NFS server IP. You can debug this by running kubectl get events -w, watching the mount command output, and trying to replicate it on the command line from a worker node.
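To replicate the mount by hand from a worker node (the IP here is a placeholder for your NFS server’s private address):

mkdir -p /mnt/nfs-test
mount -t nfs 10.99.0.5:/srv/kubestorage /mnt/nfs-test
ls /mnt/nfs-test        # the provisioner's directories / SUCCESS file should show up here
umount /mnt/nfs-test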

Installing Helm

Up until this point we’ve just been using kubectl apply and kubectl create to install apps. Going forward we’ll mostly be using helm to manage our applications and install things.

If you don’t already have helm installed (and are on macOS): brew install kubernetes-helm; otherwise hop on over to the helm website for installation instructions.
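On Linux, the same install script the CI job at the end of this post uses works fine (read it first if piping curl to bash bothers you):

curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get | bash
helm version --client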

First we’re going to create a helm-rbac.yaml:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: tiller-clusterrolebinding
subjects:
- kind: ServiceAccount
  name: tiller
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: ""

Now we can apply everything:

kubectl create -f helm-rbac.yaml
kubectl create serviceaccount --namespace kube-system tiller
kubectl create clusterrolebinding tiller-cluster-rule --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
helm init --upgrade
kubectl patch deploy --namespace kube-system tiller-deploy -p '{"spec":{"template":{"spec":{"serviceAccount":"tiller"}}}}'

First we install the RBAC permissions, service accounts, and role bindings. Next we install helm and initialize tiller on the server. Tiller keeps track of which apps are deployed where and when they need updates. Finally we tell the tiller deployment about its new ServiceAccount.

You can verify things are working with a helm ls. Next we can install our first application, Heapster.

Important Helm Note: Helm is great, but sometimes it breaks. If your deployments/upgrades/deletes are hanging, try bouncing the tiller pod:

kubectl delete po -n kube-system -l name=tiller

Installing Heapster

Heapster provides in-cluster metrics and health information:

helm install stable/heapster --name heapster --set rbac.create=true

You should see it installed with a helm ls.

Installing Traefik (LoadBalancer)

First let’s create a traefik.yaml values file:

serviceType: NodePort
externalTrafficPolicy: Cluster
replicas: 2
cpuRequest: 10m
memoryRequest: 20Mi
cpuLimit: 100m
memoryLimit: 30Mi
debug:
  enabled: false
ssl:
  enabled: true
acme:
  enabled: true
  email: # YOUR EMAIL HERE
  staging: false
  logging: true
  challengeType: http-01
  persistence:
    enabled: true
    annotations: {}
    accessMode: ReadWriteOnce
    size: 1Gi
dashboard:
  enabled: true
  domain: # YOUR DOMAIN HERE
  service:
    annotations:
      kubernetes.io/ingress.class: traefik
  auth:
    basic:
      admin: # FILL THIS IN WITH A HTPASSWD VALUE
gzip:
  enabled: true
accessLogs:
  enabled: false
  ## Path to the access logs file. If not provided, Traefik defaults it to stdout.
  # filePath: ""
  format: common  # choices are: common, json
rbac:
  enabled: true
## Enable the /metrics endpoint, for now only supports prometheus
## set to true to enable metric collection by prometheus

deployment:
  hostPort:
    httpEnabled: true
    httpsEnabled: true

Important things to note here: the hostPort setting - with multiple worker nodes this lets us specify multiple A records for some level of redundancy, and it binds Traefik to host ports 80 and 443 so it can receive HTTP and HTTPS traffic. The other important setting is serviceType: NodePort, so we expose Traefik on the worker node’s IP (on something like GKE or AWS we would instead be registering with an ELB, and that ELB would talk to our k8s cluster).
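For the admin: value under dashboard.auth.basic above, one way to generate an htpasswd entry is with apache2-utils (an assumption - use whatever htpasswd tool you prefer):

htpasswd -nb admin 'some-strong-password'
# prints something like admin:$apr1$... - use the hash portion after "admin:" as the value
# for the docker registry later, use bcrypt instead: htpasswd -nbB youruser 'some-strong-password'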

Now let’s install traefik and the dashboard:

helm install stable/traefik --name traefik -f traefik.yaml --namespace kube-system

You can check the progress of this with kubectl get po -n kube-system -w. Once everything is registered you should be able to go to https://traefik.sub.yourdomain.com and log in to the dashboard with the basic auth you configured.

Private Docker Registry

Provided you got everything working in the previous step (HTTPS works and LetsEncrypt got automatically setup for your traefik dashboard) you can continue on.

First we’ll be making a registry.yaml file with our custom values and enabling traefik for our ingress:

replicaCount: 1
ingress:
  enabled: true
  # Used to create an Ingress record.
  hosts:
    - registry.svc.bluescripts.net
  annotations:
    kubernetes.io/ingress.class: traefik
persistence:
  accessMode: 'ReadWriteOnce'
  enabled: true
  size: 10Gi
  # storageClass: '-'
# set the type of filesystem to use: filesystem, s3
storage: filesystem
secrets:
  haSharedSecret: ""
  htpasswd: "YOUR_DOCKER_USERNAME:GENERATE_YOUR_OWN_HTPASSWD_FOR_HERE"

And putting it all together:

helm install -f registry.yaml --name registry stable/docker-registry

Provided all that worked, you should now be able to push and pull images and log in to your registry at registry.sub.yourdomain.com.
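A quick smoke test from your workstation (the registry hostname and image name here are placeholders):

docker login registry.sub.yourdomain.com     # the user/pass from your htpasswd value
docker pull alpine:3.7
docker tag alpine:3.7 registry.sub.yourdomain.com/test/alpine:3.7
docker push registry.sub.yourdomain.com/test/alpine:3.7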

Configuring docker credentials (per namespace)

There are several ways you can set up docker auth, like ServiceAccounts or ImagePullSecrets - I’m going to show the latter.

Take your docker config that should look something like this:

{
        "auths": {
                "registry.sub.yourdomain.com": {
                        "auth": "BASE64 ENCODED user:pass"
                }
        }
}

and base64 encode that whole file/string, making it all one line.
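One way to do that, assuming your docker config lives in the usual ~/.docker/config.json (the tr is there because BSD/macOS base64 lacks GNU’s -w0 flag):

base64 < ~/.docker/config.json | tr -d '\n'

Then create a registry-creds.yaml file: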

apiVersion: v1
kind: Secret
metadata:
  name: regcred
  namespace: your_app_namespace
data:
  .dockerconfigjson: BASE64_ENCODED_CREDENTIALS
type: kubernetes.io/dockerconfigjson

Create your app namespace: kubectl create namespace your_app_namespace and apply it.

kubectl apply -f registry-creds.yaml

You can now delete this file (or encrypt it with GPG, etc) - just don’t commit it anywhere. Base64 encoding a string won’t protect your credentials.

You would then specify it in your helm deployment.yaml like:

spec:
  replicas: {{ .Values.replicaCount }}
  template:
    metadata:
      labels:
        app: {{ template "fullname" . }}
    spec:
      imagePullSecrets:
        - name: regcred

Deploying your own applications

I generally make a deployments folder then do a helm create app_name in there. You’ll want to edit the values.yaml file to match your docker image names and vars.
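In other words, something like this (app_name is just an example chart name):

mkdir -p deployments && cd deployments
helm create app_name
# then edit app_name/values.yaml: image repository, image tag, service ports, env vars, etc.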

You’ll need to edit the templates/ingress.yaml file and make sure you have a traefik annotation:

  annotations:
    kubernetes.io/ingress.class: traefik

And finally here is an example deployment.yaml that has a few extra things from the default:

 1 apiVersion: extensions/v1beta1
 2 kind: Deployment
 3 metadata:
 4   name: {{ template "fullname" . }}
 5   labels:
 6     chart: "{{ .Chart.Name }}-{{ .Chart.Version | replace "+" "_" }}"
 7 spec:
 8   replicas: {{ .Values.replicaCount }}
 9   template:
10     metadata:
11       labels:
12         app: {{ template "fullname" . }}
13     spec:
14       imagePullSecrets:
15         - name: regcred
16       affinity:
17         podAntiAffinity:
18           requiredDuringSchedulingIgnoredDuringExecution:
19           - topologyKey: "kubernetes.io/hostname"
20             labelSelector:
21               matchLabels:
22                 app:  {{ template "fullname" . }}
23       containers:
24       - name: {{ .Chart.Name }}
25         image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
26         imagePullPolicy: {{ .Values.image.pullPolicy }}
27         ports:
28         - containerPort: {{ .Values.service.internalPort }}
29         livenessProbe:
30           httpGet:
31             path: /
32             port: {{ .Values.service.internalPort }}
33           initialDelaySeconds: 5
34           periodSeconds: 30
35           timeoutSeconds: 5
36         readinessProbe:
37           httpGet:
38             path: /
39             port: {{ .Values.service.internalPort }}
40           initialDelaySeconds: 5
41           timeoutSeconds: 5
42         resources:
43 {{ toYaml .Values.resources | indent 10 }}

On lines 14-15 we’re specifying the registry credentials we created in the previous step.

Assuming a replica count >= 2, lines 16-22 tell kubernetes to schedule the pods on different worker nodes. This prevents both web servers (for instance) from landing on the same node, so a single node crash doesn’t take them both down.

Lines 29-41 will depend on your app - if your server is slow to start up, these values may not make sense and can cause your app to constantly flip between a Running and an Error state, with its containers getting reaped by the liveness checks.

And provided you just have configuration changes to try out (container is already built and in a registry), you can iterate locally:

helm upgrade your_app_name . -i --namespace your_app_name --wait --debug

Integrating with GitLab / CICD Pipelines

Here is a sample .gitlab-ci.yml that you can use for deploying - this builds a small go binary for ifcfg.net:

 1 stages:
 2     - build
 3     - deploy
 4 
 5 variables:
 6   DOCKER_DRIVER: overlay2
 7   IMAGE: registry.sub.yourdomain.com/project/web
 8 
 9 services:
10 - docker:dind
11 
12 build:
13   image: docker:latest
14   stage: build
15   script:
16     - docker build -t $IMAGE:$CI_COMMIT_SHA .
17     - docker tag $IMAGE:$CI_COMMIT_SHA $IMAGE:latest
18     - docker push $IMAGE:$CI_COMMIT_SHA
19     - docker push $IMAGE:latest
20 
21 deploy:
22   image: ubuntu
23   stage: deploy
24   before_script:
25     - apt-get update && apt-get install -y curl
26     - curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get | bash
27   script:
28     - cd deployment/ifcfg
29     - KUBECONFIG=<(echo "$KUBECONFIG") helm upgrade ifcfg . -i --namespace ifcfg --wait --debug --set "image.tag=$CI_COMMIT_SHA"

Lines 16-19 are where we tag latest and our committed SHA and push both to the registry we made in the previous steps.

Line 26 is installing helm.

Line 29 is doing a few things. The first thing to note is we have our ~/.kube/config file set as an environment variable in gitlab. The <(echo ...) bit is a little shell trick that makes the variable look like a file on the file system (that way we don’t have to write it out in a separate step). upgrade -i says to upgrade our app, and if it doesn’t exist yet, to install it. The last important bit is image.tag=$CI_COMMIT_SHA - this sets you up for deploying tagged releases instead of always deploying the latest from your repository.
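If the <( ... ) process substitution is new to you, you can try the same trick locally (bash only - plain sh doesn’t support it; this assumes your kubeconfig is in the usual spot):

KUBECONFIG_CONTENTS=$(cat ~/.kube/config)
KUBECONFIG=<(echo "$KUBECONFIG_CONTENTS") kubectl get nodes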

That’s it - you should now have an automated build pipeline set up for a project on your working kubernetes cluster.
